Maybe what you propose would allow FST.cachedRootArcs to be eliminated? Basically your algorithm could decide that the root would always be direct access.
~ David Smiley Apache Lucene/Solr Search Developer http://www.linkedin.com/in/davidwsmiley On Thu, Apr 25, 2019 at 5:18 PM Dawid Weiss <dawid.we...@gmail.com> wrote: > > I'm curious how the hot cache you describe would be maintained and > > accessed. > > The only gain I was successful at was with static precomputed cache. > Calculated from a priori data (hot paths) or, in the extreme, the > entire FST is converted to a hash map. :) This allows very fast > lookups of any node-arc pair, but is not usable for enumerating > outgoing arcs (for example)... It's really something we did to > accelerate millions and millions of lookups over the FST (in an > otherwise not even Lucene-related data structure). I don't think it'll > be generally useful. > > > I know eg that the current implementation has a cache of the > > first 128 "root" arcs, which I guess are presumed to be > > highly-visited, but I think what you are describing is a more dynamic > > cache based on usage? > > I don't think it's an assumption of high-visited ratio... it's just a > limit so that we don't cache very extreme root node fan-outs. The > choice of this limit is very arbitrary I'd say... > > > Were you thinking that one would maintain an LRU > > cache say? Or was this some offline analysis you did based on access > > patterns? > > On patterns. So not generally useful. > > Don't get me wrong -- try to experiment with that constant-expanded > array... This can be useful especially for nodes close to the root (if > they're dense)... So definitely worth looking into. > > Dawid > > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > >