[jira] [Commented] (LUCENE-7639) Use Suffix Arrays for fast search with leading asterisks

Dawid Weiss (JIRA) Wed, 18 Jan 2017 01:37:01 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-7639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15827721#comment-15827721
 ]


Dawid Weiss commented on LUCENE-7639:
-------------------------------------

Nothing is wrong with the FST, really. A suffix array is functionally 
equivalent to a suffix tree, which you could build and encode as an FST. Then 
any infix matching would be done similarly to suffix array-based lookups. These 
are all interchangeable concepts. The way we use FSTs in Lucene is just one 
particular application to solve one particular problem (essentially implement 
an efficient partial prefix trie), hence my remark on the possibility of using 
two automata to make an intersection of wildcards possible.

As for your patch, I really don't know how much of a problem (in typical use) 
prefix and infix wildcards are. While leading wildcards are heavily used for 
many things (suggestions, etc.) I can't quite remember when I needed a prefix 
or infix wildcard. 

In other words -- don't know whether there'll be much interest in maintaining 
this code if it went into Lucene, especially considering the implementation 
details you mentioned (amount of memory it requires at runtime, separate 
construction process, etc.). Let's see what others think though.

> Use Suffix Arrays for fast search with leading asterisks
> --------------------------------------------------------
>
>                 Key: LUCENE-7639
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7639
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Yakov Sirotkin
>         Attachments: suffix-array.patch
>
>
> If query term starts with asterisks FST checks all words in the dictionary so 
> request processing speed falls down. This problem can be solved with Suffix 
> Array approach. Luckily, Suffix Array can be constructed after Lucene start 
> from existing index. Unfortunately, Suffix Arrays requires a lot of RAM so we 
> can use it only when special flag is set:
> -Dsolr.suffixArray.enable=true
> It is possible to  speed up Suffix Array initialization using several 
> threads, so we can control number of threads with 
> -Dsolr.suffixArray.initialization_treads_count=5
> This system property can be omitted, the default value is 5.  
> Attached patch is the suggested implementation for SuffixArray support, it 
> works for all terms starting with asterisks with at least 3 consequent 
> non-wildcard characters. This patch do not change search results and  affects 
> only performance issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-7639) Use Suffix Arrays for fast search with leading asterisks

Reply via email to