Hi Dawid,

The heuristic I used was to encode with the direct-array approach
when more than 1/4 of the array indices would be filled (i.e.,
(max-label - min-label) / num-labels < 4), and otherwise to fall back
to the existing packed-array encoding. I only applied the direct
encoding when we previously would have used the fixed-size arc arrays
*and* this density heuristic was met. I experimented with other "load
factors," and it's true that the sweet spot varies, but that value
seemed to be a good compromise across a variety of test cases.
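To make the decision concrete, here's a minimal sketch of that density
check (the names are mine, not actual Lucene identifiers):

```java
/**
 * Sketch of the density heuristic described above. Hypothetical names,
 * not real Lucene code: decide direct-array vs. packed-array encoding
 * for a node's outgoing arcs.
 */
class DirectArrayHeuristic {

    // Use the direct-array encoding only when more than 1/4 of the
    // slots in the label range would be filled, i.e. when
    // (maxLabel - minLabel) / numArcs < 4.
    static boolean useDirectArray(int minLabel, int maxLabel, int numArcs) {
        return (maxLabel - minLabel) < 4 * numArcs;
    }

    public static void main(String[] args) {
        // Dense node: labels 0..3 with 4 arcs -> every slot filled.
        System.out.println(useDirectArray(0, 3, 4));    // true
        // Sparse node: only 10 arcs spread over labels 0..100.
        System.out.println(useDirectArray(0, 100, 10)); // false
    }
}
```

The direct array trades wasted slots for O(1) arc lookup by label, so
the check is just whether the waste stays under the chosen load factor.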

I'm curious how the hot cache you describe would be maintained and
accessed. I know, e.g., that the current implementation has a cache of
the first 128 "root" arcs, which I guess are presumed to be highly
visited, but I think what you're describing is a more dynamic cache
based on usage? Were you thinking one would maintain, say, an LRU
cache? Or was this some offline analysis you did based on access
patterns?
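For the sake of discussion, here is one way an LRU version of the
[node, arc label] => node cache could look. This is purely a
hypothetical sketch (the class, key packing, and sizes are my own
assumptions, not anything in Lucene), built on LinkedHashMap's
access-order mode:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch of a "hot path" LRU cache mapping
 * (node, arc label) -> target node. Not actual Lucene code.
 */
class HotArcCache {
    private final int capacity;
    private final Map<Long, Long> lru;

    HotArcCache(int capacity) {
        this.capacity = capacity;
        // accessOrder=true makes LinkedHashMap reorder entries on get(),
        // and removeEldestEntry evicts the least-recently-used one.
        this.lru = new LinkedHashMap<Long, Long>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, Long> eldest) {
                return size() > HotArcCache.this.capacity;
            }
        };
    }

    // Pack (node, label) into one long key; assumes labels fit in 21 bits.
    private static long key(long node, int label) {
        return (node << 21) | (label & 0x1FFFFF);
    }

    /** Returns the cached target node, or null on a miss. */
    Long lookup(long node, int label) {
        return lru.get(key(node, label));
    }

    /** Records a traversed arc so hot paths stay resident. */
    void record(long node, int label, long target) {
        lru.put(key(node, label), target);
    }
}
```

The appeal would be that frequently traversed arcs skip the binary
search entirely, while cold arcs age out instead of being expanded
statically.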

On Thu, Apr 25, 2019 at 4:32 PM Dawid Weiss <dawid.we...@gmail.com> wrote:
>
> Hi Mike,
>
> My experience tells me that in practice it's really difficult to tell
> which nodes should be expanded (where this "cost" of binary lookup
> would significantly outweigh a direct offset jump). I had some luck
> in speeding up (very intensive) lookups by creating a hash of [node,
> arc label] => node for those paths which were frequently accessed...
> perhaps such a "hot path" cache would be better (compared to static
> expansion of all outgoing arcs)?
>
> Dawid
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
