[ https://issues.apache.org/jira/browse/LUCENE-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless updated LUCENE-3289:
---------------------------------------

    Attachment: LUCENE-3289.patch

Patch, I think it's ready to commit!

Separately we should think about how suggest module should set these... I left it at "costly but perfect minimization".

> FST should allow controlling how hard builder tries to share suffixes
> ---------------------------------------------------------------------
>
>                 Key: LUCENE-3289
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3289
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 3.4, 4.0
>
>         Attachments: LUCENE-3289.patch, LUCENE-3289.patch
>
>
> Today we have a boolean option to the FST builder telling it whether
> it should share suffixes.
>
> If you turn this off, building is much faster, uses much less RAM, and
> the resulting FST is a prefix trie. But, the FST is larger than it
> needs to be. When it's on, the builder maintains a node hash holding
> every node seen so far in the FST -- this uses up RAM and slows things
> down.
>
> On a dataset that Elmer (see java-user thread "Autocompletion on large
> index" on Jul 6 2011) provided (thank you!), which is 1.32 M titles
> avg 67.3 chars per title, building with suffix sharing on took 22.5
> seconds, required 1.25 GB heap, and produced 91.6 MB FST. With suffix
> sharing off, it was 8.2 seconds, 450 MB heap and 129 MB FST.
>
> I think we should allow this boolean to be shade-of-gray instead:
> usually, how well suffixes can share is a function of how far they are
> from the end of the string, so, by adding a tunable N to only share
> when suffix length < N, we can let caller make reasonable tradeoffs.

--
This message is automatically generated by JIRA.
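
To make the "shade-of-gray" tradeoff in the issue description concrete, here is a minimal, self-contained sketch. It is not the attached patch and does not use Lucene's FST classes: the class SuffixShareSketch, the parameter maxSuffixLen (standing in for the tunable N), and the choice of node height (longest remaining suffix below a node) as the measure of "suffix length" are all assumptions made for illustration. It builds a plain trie, then deduplicates nodes bottom-up through a node hash, but only registers nodes whose height is below maxSuffixLen, so the hash stays small when N is small.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: a toy trie minimizer, not Lucene's FST builder.
final class SuffixShareSketch {

    static final class Node {
        final TreeMap<Character, Node> arcs = new TreeMap<>();
        boolean isFinal;
        int id = -1;   // assigned once the node is kept in the output
        int height;    // length of the longest suffix reachable from here
    }

    private final int maxSuffixLen;                            // the tunable N (assumed name)
    private final Map<String, Node> nodeHash = new HashMap<>(); // only holds shareable nodes
    private int nextId = 0;                                     // number of nodes kept so far

    SuffixShareSketch(int maxSuffixLen) {
        this.maxSuffixLen = maxSuffixLen;
    }

    // Builds an unminimized prefix trie (what you get with sharing off).
    static Node buildTrie(List<String> words) {
        Node root = new Node();
        for (String w : words) {
            Node cur = root;
            for (char c : w.toCharArray()) {
                cur = cur.arcs.computeIfAbsent(c, k -> new Node());
            }
            cur.isFinal = true;
        }
        return root;
    }

    // Post-order pass: children are frozen first, then this node is either
    // replaced by an equivalent node from the hash or kept as a new node.
    Node freeze(Node node) {
        StringBuilder sig = new StringBuilder(node.isFinal ? "F" : "N");
        int height = 0;
        for (Map.Entry<Character, Node> e : node.arcs.entrySet()) {
            Node child = freeze(e.getValue());
            e.setValue(child); // point the arc at the canonical child
            height = Math.max(height, child.height + 1);
            sig.append(e.getKey()).append(child.id).append(';');
        }
        node.height = height;
        if (height < maxSuffixLen) { // only share when suffix length < N
            Node existing = nodeHash.putIfAbsent(sig.toString(), node);
            if (existing != null) {
                return existing;
            }
        }
        node.id = nextId++;
        return node;
    }

    // Number of nodes left after sharing suffixes shorter than maxSuffixLen.
    static int countNodes(List<String> words, int maxSuffixLen) {
        SuffixShareSketch builder = new SuffixShareSketch(maxSuffixLen);
        builder.freeze(buildTrie(words));
        return builder.nextId;
    }

    public static void main(String[] args) {
        List<String> words = List.of("walking", "talking", "walked", "talked");
        System.out.println(countNodes(words, 0));                 // 19: plain prefix trie
        System.out.println(countNodes(words, 1));                 // 16: only leaves shared
        System.out.println(countNodes(words, 2));                 // 14
        System.out.println(countNodes(words, Integer.MAX_VALUE)); // 9: full minimization
    }
}
```

With N = 0 the node hash is never touched and the result is the prefix trie; raising N trades more hash entries (RAM and time) for a smaller automaton, which is the knob the issue asks for. A real builder would add nodes incrementally from sorted input rather than materializing the whole trie first; that part is omitted here for brevity.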