I have never used fuzzy search, but from the documentation it seems very 
expensive, and if you do it on 10 terms and 1M documents it seems far more 
expensive still.

Are you using the default 'fuzziness' parameter (0.5)? It might end up 
exploring a lot of documents. Did you try to play with that parameter? 

Have you tried to see how the performance changes if you do not use fuzzy (just 
to see whether fuzzy is what introduces the slowdown)? 
Or what happens to performance if you do fuzzy with 1, 2, or 5 terms instead of 10?
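
A minimal sketch of why the cost grows with the allowed edit distance, assuming a toy in-memory term list rather than a real Lucene index. The class name FuzzyExpansionSketch, the helpers editDistance and expand, and the dictionary are all hypothetical; Lucene itself matches fuzzy terms with automata rather than the brute-force comparison used here. The point is only that each fuzzy term can match several indexed terms, and a 10-term fuzzy query repeats that expansion 10 times.

```java
import java.util.ArrayList;
import java.util.List;

public class FuzzyExpansionSketch {

    // Classic dynamic-programming Levenshtein distance
    // (insertions, deletions, substitutions; no transpositions).
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Brute-force stand-in for fuzzy term expansion: every dictionary term
    // within maxEdits of the query term becomes a term to search for.
    static List<String> expand(String queryTerm, List<String> dictionary, int maxEdits) {
        List<String> matches = new ArrayList<>();
        for (String term : dictionary) {
            if (editDistance(queryTerm, term) <= maxEdits) {
                matches.add(term);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Hypothetical dictionary; a real index holds far more terms.
        List<String> dictionary =
                List.of("lucene", "lucent", "lucerne", "lucine", "license", "luge", "index");
        for (int maxEdits = 0; maxEdits <= 2; maxEdits++) {
            System.out.println("maxEdits=" + maxEdits + " -> "
                    + expand("lucene", dictionary, maxEdits));
        }
    }
}
```

Running the same comparison with 1, 2, 5 and 10 query terms, and with fuzzy matching switched off, is the kind of experiment suggested above for locating the slowdown.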


From: java-user@lucene.apache.org At: 06/09/21 18:56:31
To: java-user@lucene.apache.org, baris.ka...@oracle.com
Subject: Re: Potential bug

I can't reveal those details, I am very sorry, but it is more than 1 million.

Let me say that I have a lot of code that processes results from Lucene, 
but the bottleneck is the Lucene fuzzy search.

Best regards


On 6/9/21 1:53 PM, Diego Ceccarelli (BLOOMBERG/ LONDON) wrote:
> How many documents do you have in the index?
> and can you show an example of query?
>
>
> From: java-user@lucene.apache.org At: 06/09/21 18:33:25
> To: java-user@lucene.apache.org, baris.ka...@oracle.com
> Subject: Re: Potential bug
>
> I have only two fields: one is a string, the other is a number (stored as
> a string). I guess you can't go simpler than this.
>
> I retrieve the hits, and my major bottleneck is the Lucene fuzzy search.
>
>
> I take each word from the string, which is usually at most around 10 words.
>
> I build a fuzzy boolean query out of them.
>
>
> By a simple query I mean a 10-word query like this.
>
>
> By limit I mean that I want to stop the Lucene search at around 20 hits; I
> don't want thousands of hits.
>
>
> Best regards
>
>
> On 6/9/21 1:25 PM, Diego Ceccarelli (BLOOMBERG/ LONDON) wrote:
>
>> Hi Baris,
>>
>>> what if the user needs to limit the search process?
>> What do you mean by 'limit'?
>>
>>> there should be a way to speedup lucene then if this is not possible,
>>> since for some simple queries it takes half a second which is too long.
>> What do you mean by 'simple' query? there might be multiple reasons behind
>> slowness of a query that are unrelated to the search (for example, if you
>> retrieve many documents and for each document you are extracting the
>> content of many fields) - would you like to tell us a bit more about your
>> use case?
>> Regards,
>> Diego
>>
>> From: java-user@lucene.apache.org At: 06/09/21 18:18:01
>> To: java-user@lucene.apache.org
>> Cc:  baris.ka...@oracle.com
>> Subject: Re: Potential bug
>>
>> Thanks Adrien, but the difference is too large.
>>
>> I think the algorithm needs to be revised.
>>
>>
>> what if the user needs to limit the search process?
>>
>> that leaves no control.
>>
>> there should be a way to speedup lucene then if this is not possible,
>>
>> since for some simple queries it takes half a second which is too long.
>>
>> Best regards
>>
>>
>> On 6/9/21 1:13 PM, Adrien Grand wrote:
>>> Hi Baris,
>>>
>>> totalHitsThreshold is actually a minimum threshold, not a maximum threshold.
>>>
>>> The problem is that Lucene cannot directly identify the top matching
>>> documents for a given query. The strategy it adopts is to start collecting
>>> hits naively in doc ID order and to progressively raise the bar about the
>>> minimum score that is required for a hit to be competitive in order to skip
>>> non-competitive documents. So it's expected that Lucene still collects 100s
>>> or 1000s of hits, even though the collector is configured to only compute
>>> the top 10 hits.
>>>
>>> On Wed, Jun 9, 2021 at 7:07 PM <baris.ka...@oracle.com> wrote:
>>>
>>>> Hi,-
>>>>
>>>>      I think this is a potential bug.
>>>>
>>>>
>>>> This time I set totalHitsThreshold to 10 and I get totalHits reported as
>>>> 1655, but I get 10 results in total.
>>>>
>>>> I think this suggests that there might be a bug with
>>>> TopScoreDocCollector algorithm.
>>>>
>>>>
>>>> Best regards
>>>>
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>>>
>>>>

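The collection strategy Adrien describes in the quoted message above can be illustrated with a much simplified simulation. This is a sketch under stated assumptions, not Lucene's TopScoreDocCollector: the class TopKCollectionSketch, the method collect and the Outcome type are hypothetical, and the sketch compares exact scores where real Lucene has to decide what to skip from score upper bounds kept in the index. Hits are visited in doc ID order, every hit is counted until the threshold is reached, and after that only hits that beat the current weakest top-k score are counted.

```java
import java.util.PriorityQueue;
import java.util.Random;

public class TopKCollectionSketch {

    /** Outcome of one simulated search (hypothetical type for this sketch). */
    static final class Outcome {
        final int countedHits;   // hits that were visited and counted
        final float[] topScores; // the k best scores, highest first

        Outcome(int countedHits, float[] topScores) {
            this.countedHits = countedHits;
            this.topScores = topScores;
        }
    }

    static Outcome collect(float[] scoresInDocIdOrder, int k, int totalHitsThreshold) {
        if (k < 1) {
            throw new IllegalArgumentException("k must be at least 1");
        }
        // Min-heap: the head is the weakest of the current top k, i.e. the bar.
        PriorityQueue<Float> heap = new PriorityQueue<>();
        int counted = 0;
        for (float score : scoresInDocIdOrder) {
            // Skipping only starts once the top k is full AND the threshold
            // has been reached, so the threshold acts as a minimum.
            boolean barActive = heap.size() == k && counted >= totalHitsThreshold;
            if (barActive && score <= heap.peek()) {
                continue; // non-competitive hit: skipped and not counted
            }
            counted++;
            if (heap.size() < k) {
                heap.add(score);
            } else if (score > heap.peek()) {
                heap.poll();      // drop the weakest hit
                heap.add(score);  // the bar rises
            }
        }
        float[] top = new float[heap.size()];
        for (int i = top.length - 1; i >= 0; i--) {
            top[i] = heap.poll(); // poll() returns ascending, so fill from the end
        }
        return new Outcome(counted, top);
    }

    public static void main(String[] args) {
        // Hypothetical scores; in a real index they come from the query.
        Random random = new Random(42);
        float[] scores = new float[100_000];
        for (int i = 0; i < scores.length; i++) {
            scores[i] = random.nextFloat();
        }
        Outcome outcome = collect(scores, 10, 10);
        System.out.println("returned=" + outcome.topScores.length
                + ", counted=" + outcome.countedHits);
    }
}
```

Because the bar only rises as better hits are found, hits that arrive early in doc ID order are counted even if they do not end up in the final top k. That is why the reported count in the thread (1655) can be much larger than both the threshold and the 10 results returned.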