[jira] [Commented] (LUCENE-7639) Use Suffix Arrays for fast search with leading asterisks

Yakov Sirotkin (JIRA) Tue, 17 Jan 2017 06:16:59 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-7639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15826108#comment-15826108
 ]


Yakov Sirotkin commented on LUCENE-7639:
----------------------------------------

Of cause, index for reversed terms is popular solution for this problem, but it 
doesn't supports cases when we have asterisks at both ends and it requires much 
bigger index. Imagine situation when you have 1 read-only storage for index 
that is used by 10 servers. You can enable suffix array support only at 1 
server with bigger RAM size and route all requests with asterisks to it, 
additional cost for such solution will be pretty small.

Fast construction of suffix array is the most challenging part of  this 
approach and I spent a lot of time trying various algorithms. Unfortunately, 
linear-time algorithms are too complex, usually its initialization takes longer 
than 3-way radix quicksort full run. Also performance of suffix array creation 
is not very important, because current approach transparently uses FST for the 
segments where suffix array construction hasn't been completed yet. And using 
multiple threads for suffix arrays construction allows to reduce initialization 
time to acceptable values. Also 3-way radix quicksort is perfect from RAM usage 
point of view.

> Use Suffix Arrays for fast search with leading asterisks
> --------------------------------------------------------
>
>                 Key: LUCENE-7639
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7639
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Yakov Sirotkin
>         Attachments: suffix-array.patch
>
>
> If query term starts with asterisks FST checks all words in the dictionary so 
> request processing speed falls down. This problem can be solved with Suffix 
> Array approach. Luckily, Suffix Array can be constructed after Lucene start 
> from existing index. Unfortunately, Suffix Arrays requires a lot of RAM so we 
> can use it only when special flag is set:
> -Dsolr.suffixArray.enable=true
> It is possible to  speed up Suffix Array initialization using several 
> threads, so we can control number of threads with 
> -Dsolr.suffixArray.initialization_treads_count=5
> This system property can be omitted, the default value is 5.  
> Attached patch is the suggested implementation for SuffixArray support, it 
> works for all terms starting with asterisks with at least 3 consequent 
> non-wildcard characters. This patch do not change search results and  affects 
> only performance issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (LUCENE-7639) Use Suffix Arrays for fast search with leading asterisks

Reply via email to