> We should add this (lookup by value, when value is guaranteed to > monotonically increase as the key increases) to our core FST APIs? > It's generically useful in many places ;) I'll open an issue.
The data structure itself should sort of "build itself" if you create an FST with increasing integers because the "shared suffix" should be pushed towards the root anyway, so the only thing would be to correct values on all outgoing arcs (they need to contain the count of leaves on the subtree).... but then, this may be tricky if arc values are vcoded... I'd have to think how to do this. > While FST w/ lookup-by-monotonic-value would work here, I would be > worried about the per hit of that representation vs what There are actually two things: a) performance; you need to descend in the automaton and some bookkeeping to maintain the count of nodes; this adds overhead, b) size; the procedure for storing/ calculating perfect hashes I described requires leaf counts on each arc and these are usually large integers. Even vcoded they bloat the resulting data structure. Dawid --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org