[ https://issues.apache.org/jira/browse/LUCENE-7639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15827721#comment-15827721 ]
Dawid Weiss commented on LUCENE-7639: ------------------------------------- Nothing is wrong with the FST, really. A suffix array is functionally equivalent to a suffix tree, which you could build and encode as an FST. Then any infix matching would be done similarly to suffix array-based lookups. These are all interchangeable concepts. The way we use FSTs in Lucene is just one particular application to solve one particular problem (essentially implement an efficient partial prefix trie), hence my remark on the possibility of using two automata to make an intersection of wildcards possible. As for your patch, I really don't know how much of a problem (in typical use) prefix and infix wildcards are. While leading wildcards are heavily used for many things (suggestions, etc.) I can't quite remember when I needed a prefix or infix wildcard. In other words -- don't know whether there'll be much interest in maintaining this code if it went into Lucene, especially considering the implementation details you mentioned (amount of memory it requires at runtime, separate construction process, etc.). Let's see what others think though. > Use Suffix Arrays for fast search with leading asterisks > -------------------------------------------------------- > > Key: LUCENE-7639 > URL: https://issues.apache.org/jira/browse/LUCENE-7639 > Project: Lucene - Core > Issue Type: Improvement > Reporter: Yakov Sirotkin > Attachments: suffix-array.patch > > > If query term starts with asterisks FST checks all words in the dictionary so > request processing speed falls down. This problem can be solved with Suffix > Array approach. Luckily, Suffix Array can be constructed after Lucene start > from existing index. Unfortunately, Suffix Arrays requires a lot of RAM so we > can use it only when special flag is set: > -Dsolr.suffixArray.enable=true > It is possible to speed up Suffix Array initialization using several > threads, so we can control number of threads with > -Dsolr.suffixArray.initialization_treads_count=5 > This system property can be omitted, the default value is 5. > Attached patch is the suggested implementation for SuffixArray support, it > works for all terms starting with asterisks with at least 3 consequent > non-wildcard characters. This patch do not change search results and affects > only performance issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org