[ 
https://issues.apache.org/jira/browse/LUCENE-7639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15826115#comment-15826115
 ] 

Dawid Weiss commented on LUCENE-7639:
-------------------------------------

bq. [...] and it requires much bigger index

Why is this the case? The inverted FST should be smaller than the suffix array 
and intersecting both should be a viable option to get {{*abc*}} wildcard 
matches?


> Use Suffix Arrays for fast search with leading asterisks
> --------------------------------------------------------
>
>                 Key: LUCENE-7639
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7639
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Yakov Sirotkin
>         Attachments: suffix-array.patch
>
>
> If query term starts with asterisks FST checks all words in the dictionary so 
> request processing speed falls down. This problem can be solved with Suffix 
> Array approach. Luckily, Suffix Array can be constructed after Lucene start 
> from existing index. Unfortunately, Suffix Arrays requires a lot of RAM so we 
> can use it only when special flag is set:
> -Dsolr.suffixArray.enable=true
> It is possible to  speed up Suffix Array initialization using several 
> threads, so we can control number of threads with 
> -Dsolr.suffixArray.initialization_treads_count=5
> This system property can be omitted, the default value is 5.  
> Attached patch is the suggested implementation for SuffixArray support, it 
> works for all terms starting with asterisks with at least 3 consequent 
> non-wildcard characters. This patch do not change search results and  affects 
> only performance issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to