Maybe what you propose would allow FST.cachedRootArcs to be eliminated?
Basically your algorithm could decide that the root would always be direct
access.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Thu, Apr 25, 2019 at 5:18 PM Dawid Weiss <dawid.we...@gmail.com> wrote:

> > I'm curious how the hot cache you describe would be maintained and
> > accessed.
>
> The only gain I was successful at was with static precomputed cache.
> Calculated from a priori data (hot paths) or, in the extreme, the
> entire FST is converted to a hash map. :) This allows very fast
> lookups of any node-arc pair, but is not usable for enumerating
> outgoing arcs (for example)... It's really something we did to
> accelerate millions and millions of lookups over the FST (in an
> otherwise not even Lucene-related data structure). I don't think it'll
> be generally useful.
>
> > I know eg that the current implementation has a cache of the
> > first 128 "root" arcs, which I guess are presumed to be
> > highly-visited, but I think what you are describing is a more dynamic
> > cache based on usage?
>
> I don't think it's an assumption of high-visited ratio... it's just a
> limit so that we don't cache very extreme root node fan-outs. The
> choice of this limit is very arbitrary I'd say...
>
> > Were you thinking that one would maintain an LRU
> > cache say? Or was this some offline analysis you did based on access
> > patterns?
>
> On patterns. So not generally useful.
>
> Don't get me wrong -- try to experiment with that constant-expanded
> array... This can be useful especially for nodes close to the root (if
> they're dense)... So definitely worth looking into.
>
> Dawid
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>

Reply via email to