[ https://issues.apache.org/jira/browse/LUCENE-2948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless updated LUCENE-2948:
---------------------------------------

    Attachment: Results.png

Graph showing perf results.

> Make var gap terms index a partial prefix trie
> ----------------------------------------------
>
>                 Key: LUCENE-2948
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2948
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 4.0
>
>         Attachments: LUCENE-2948.patch, LUCENE-2948.patch, LUCENE-2948.patch,
>                      LUCENE-2948_automaton.patch, Results.png
>
>
> Var gap stores (in an FST) the indexed terms (every 32nd term, by
> default), minus their non-distinguishing suffixes.
> However, the resulting FST is often "close" to a prefix trie in
> some portion of the terms space.
> By allowing some nodes of the FST to store all outgoing edges,
> including ones that do not lead to an indexed term, and by recording
> that such a node is then "authoritative" as to what terms exist in the
> terms dict from that prefix, we can get some important benefits:
>   * It becomes possible to know that a certain term prefix cannot
>     exist in the terms dict, which means we can save a disk seek in
>     some cases (like PK lookup, docFreq, etc.)
>   * We can query for the next possible prefix in the index, allowing
>     some MTQs (e.g. FuzzyQuery) to save disk seeks.
> Basically, the terms index is able to answer questions that previously
> required seeking/scanning in the terms dict file.
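
For readers who want the "authoritative node" idea in concrete form, here is a
minimal sketch in plain Java. It is not the attached patch and does not use
Lucene's FST classes; the class PartialPrefixTrieSketch, the Answer enum, and
the addPrefix/markAuthoritative/lookup methods are all hypothetical names made
up for illustration. The only point it shows is the first benefit listed in
the description: a missing edge proves that a term is absent only when the
node it is missing from has been marked authoritative.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy terms index with optional "authoritative" nodes. Hypothetical; not Lucene's FST API. */
class PartialPrefixTrieSketch {

    /** What the in-memory index alone can say about a term. */
    enum Answer { ABSENT, MUST_CHECK_TERMS_DICT }

    static final class Node {
        final Map<Character, Node> edges = new HashMap<>();
        // When true, edges holds every character that follows this prefix
        // in the terms dict, not only those leading to an indexed term.
        boolean authoritative;
    }

    private final Node root = new Node();

    /** Adds the path for a prefix, creating nodes as needed. */
    Node addPrefix(String prefix) {
        Node node = root;
        for (char c : prefix.toCharArray()) {
            node = node.edges.computeIfAbsent(c, k -> new Node());
        }
        return node;
    }

    /**
     * Marks the node for a prefix as authoritative. The caller must already
     * have added every outgoing edge for that prefix (assumption of this sketch).
     */
    void markAuthoritative(String prefix) {
        addPrefix(prefix).authoritative = true;
    }

    /** Walks the term through the trie and reports what the index can conclude. */
    Answer lookup(String term) {
        Node node = root;
        for (char c : term.toCharArray()) {
            Node next = node.edges.get(c);
            if (next == null) {
                // A missing edge rules the term out only at an authoritative node.
                return node.authoritative ? Answer.ABSENT : Answer.MUST_CHECK_TERMS_DICT;
            }
            node = next;
        }
        // The index holds prefixes, not full terms, so a complete walk proves nothing.
        return Answer.MUST_CHECK_TERMS_DICT;
    }

    public static void main(String[] args) {
        // Suppose the terms dict holds "apple", "apply" and "apt" under "ap".
        PartialPrefixTrieSketch index = new PartialPrefixTrieSketch();
        index.addPrefix("app");
        index.addPrefix("apt");
        index.markAuthoritative("ap");

        System.out.println("apex  -> " + index.lookup("apex"));
        System.out.println("apple -> " + index.lookup("apple"));
        System.out.println("aqua  -> " + index.lookup("aqua"));
    }
}
```

In this toy setup, "apex" is ruled out by the index alone because the node for
"ap" is authoritative and has no 'e' edge, so no seek into the terms dict would
be needed. "apple" and "aqua" still require checking the terms dict, since the
nodes where their walks stop are not authoritative.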