Hi Dawid,

The heuristic I used was to encode with the direct-array approach when more than 1/4 of the array indices would be filled (i.e. (maxLabel - minLabel) / numLabels < 4), and otherwise to use the existing packed-array encoding. I only applied the direct encoding when we previously would have used the fixed-size arc arrays *and* this density heuristic was met. I experimented with other "load factors," and it's true that the sweet spot varies, but that threshold seemed to give a good compromise across a variety of test cases.
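To make the heuristic concrete, here is a minimal sketch (names like useDirectAddressing are hypothetical, not the actual Lucene code): the direct-array encoding is chosen when the label range is less than 4x the number of arcs, i.e. more than 1/4 of the slots would be occupied.

```java
// Sketch of the density heuristic described above; the method and class
// names are illustrative, not Lucene's actual implementation.
public class DirectArcHeuristic {

    // Use direct addressing when more than 1/4 of the slots in the label
    // range [minLabel, maxLabel] would hold real arcs, i.e. when
    // (maxLabel - minLabel + 1) / numArcs < 4.
    static boolean useDirectAddressing(int minLabel, int maxLabel, int numArcs) {
        int range = maxLabel - minLabel + 1;
        // Compare via multiplication to avoid integer-division rounding.
        return numArcs * 4 > range;
    }

    public static void main(String[] args) {
        // Dense: labels 'a'..'e' all present -> direct array wins.
        System.out.println(useDirectAddressing('a', 'e', 5));  // true
        // Sparse: 3 arcs spread over a range of 100 -> packed encoding.
        System.out.println(useDirectAddressing(0, 99, 3));     // false
    }
}
```

With this threshold, a fully dense array always qualifies, while a direct array would waste at most 3 empty slots per real arc.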
I'm curious how the hot cache you describe would be maintained and accessed. I know, e.g., that the current implementation has a cache of the first 128 "root" arcs, which I guess are presumed to be highly visited, but I think what you are describing is a more dynamic cache based on usage? Were you thinking that one would maintain an LRU cache, say? Or was this some offline analysis you did based on access patterns?

On Thu, Apr 25, 2019 at 4:32 PM Dawid Weiss <dawid.we...@gmail.com> wrote:
>
> Hi Mike,
>
> My experience tells me that in practice it's really difficult to tell
> which nodes should be expanded (where this "cost" of binary lookup
> would significantly outweigh a direct offset jump). I had some luck
> in speeding up (very intensive) lookups by creating a hash of [node,
> arc label] => node for those paths which were frequently accessed...
> perhaps such a "hot path" cache would be better (compared to static
> expansion of all outgoing arcs)?
>
> Dawid
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
> ---------------------------------------------------------------------