Re: Monitoring decisions taken by IndexOrDocValuesQuery
FWIW a related PR was just merged that allows introspecting query execution: https://issues.apache.org/jira/browse/LUCENE-9965. It's different from your use-case though, in that it provides debugging information for a single query rather than statistical information across lots of user queries (and the approach on that other issue makes things much slower, so you wouldn't want to enable it in production).

Out of curiosity, what are you doing with this information about which execution path is chosen?

On Wed, Jun 9, 2021 at 2:14 PM Egor Moraru wrote:

> Hi,
>
> At my current project we wanted to monitor for a specific field the
> fraction of indexed vs doc values queries executed by
> IndexOrDocValuesQuery.
>
> We ended up forking IndexOrDocValuesQuery and passing a listener that
> is notified when the query execution path is decided.
>
> Do you think this is something the community might be interested in?
>
> Kind regards,
> Egor Moraru.

--
Adrien
Re: Potential bug
Yes, I did those, and I believe I am at the best level of performance now. It is not bad at all, but I want to make it much better.

I see something like a linear drop in timings when I go to a lower number of words, but let me do that quick study again.

Fuzzy search is always expensive, but it seems to suit my needs best.

Thanks, Diego, for these great questions; I have already explored them. But thanks again.

Best regards
Re: Potential bug
I have never used fuzzy search, but from the documentation it seems very expensive, and if you do it on 10 terms and 1M documents it seems very, very expensive.

Are you using the default 'fuzziness' parameter (0.5)? It might end up exploring a lot of documents. Did you try to play with that parameter?

Have you tried to see how the performance changes if you do not use fuzzy (just to see whether it is fuzzy that introduces the slowdown)? Or what happens to performance if you do fuzzy with 1, 2, or 5 terms instead of 10?
Re: Potential bug
I can't reveal those details, I am very sorry, but it is more than 1 million.

Let me say that I have a lot of code that processes results from Lucene, but the bottleneck is Lucene fuzzy search.

Best regards
Re: Potential bug
How many documents do you have in the index? And can you show an example of a query?
Re: Potential bug
I have only two fields: one is a string, the other is a number (stored as a string). I guess you can't go simpler than this.

I retrieve the hits, and my major bottleneck is Lucene fuzzy search.

I take each word from the string, which is usually at most around 10 words, and I build a fuzzy boolean query out of them.

A simple query is like this 10-word query.

"Limit" means I want to stop the Lucene search at around 20 hits; I don't want thousands of hits.

Best regards
Re: Potential bug
Hi Baris,

> what if the user needs to limit the search process?

What do you mean by 'limit'?

> there should be a way to speedup lucene then if this is not possible,
> since for some simple queries it takes half a second which is too long.

What do you mean by a 'simple' query? There might be multiple reasons behind the slowness of a query that are unrelated to the search (for example, if you retrieve many documents and for each document you are extracting the content of many fields). Would you like to tell us a bit more about your use case?

Regards,
Diego
Re: Potential bug
Thanks Adrien, but the difference is too large. I think the algorithm needs to be revised.

What if the user needs to limit the search process? That leaves no control.

There should be a way to speed up Lucene if this is not possible, since for some simple queries it takes half a second, which is too long.

Best regards
Re: Potential bug
Hi Baris,

totalHitsThreshold is actually a minimum threshold, not a maximum threshold.

The problem is that Lucene cannot directly identify the top matching documents for a given query. The strategy it adopts is to start collecting hits naively in doc ID order and to progressively raise the bar on the minimum score that is required for a hit to be competitive, in order to skip non-competitive documents. So it's expected that Lucene still collects 100s or 1000s of hits, even though the collector is configured to only compute the top 10 hits.

--
Adrien
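The strategy described above can be illustrated with a toy model in plain Java. This is only a sketch under stated assumptions, not Lucene's implementation: `ThresholdSketch`, `collect`, and `Result` are hypothetical names, hits are just an array of scores in doc ID order, and skipping is done one document at a time. Lucene skips whole ranges of documents, so in practice it tends to count more hits than this model does.

```java
import java.util.PriorityQueue;

// Toy model (hypothetical, not Lucene code) of top-k collection with a
// hit-count threshold acting as a minimum, not a maximum.
final class ThresholdSketch {

    /** Outcome of one simulated search. */
    static final class Result {
        final int countedHits;   // hits actually visited and counted
        final boolean exact;     // false once any hit was skipped
        final float[] topScores; // best scores, highest first

        Result(int countedHits, boolean exact, float[] topScores) {
            this.countedHits = countedHits;
            this.exact = exact;
            this.topScores = topScores;
        }
    }

    static Result collect(float[] scoresInDocIdOrder, int numHits, int totalHitsThreshold) {
        PriorityQueue<Float> top = new PriorityQueue<>(); // min-heap of the best scores so far
        int counted = 0;
        boolean exact = true;
        float minCompetitive = Float.NEGATIVE_INFINITY;

        for (float score : scoresInDocIdOrder) {
            if (score <= minCompetitive) {
                exact = false; // skipped: never counted, so the count becomes a lower bound
                continue;
            }
            counted++;
            if (top.size() < numHits) {
                top.add(score);
            } else if (score > top.peek()) {
                top.poll();
                top.add(score);
            }
            // Only after the threshold is passed may the bar be raised.
            if (counted > totalHitsThreshold && top.size() == numHits) {
                minCompetitive = top.peek();
            }
        }

        float[] best = new float[top.size()];
        for (int i = best.length - 1; i >= 0; i--) {
            best[i] = top.poll(); // heap yields ascending order, so fill from the end
        }
        return new Result(counted, exact, best);
    }

    public static void main(String[] args) {
        float[] scores = {1f, 5f, 2f, 4f, 3f, 0.5f, 6f, 0.1f, 2.5f};
        Result r = collect(scores, 2, 3);
        System.out.println(r.countedHits + (r.exact ? " hits (exact)" : "+ hits (lower bound)")
                + ", top hits kept: " + r.topScores.length);
    }
}
```

With 9 matching documents, `numHits = 2`, and a threshold of 3, this model counts 5 hits and keeps 2: the count exceeds the threshold because the threshold only says when skipping may begin, and every competitive hit seen afterwards is still counted.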
Potential bug
Hi,

I think this is a potential bug.

This time I set totalHitsThreshold to 10 and I get totalHits reported as 1655, but I get 10 results in total.

I think this suggests that there might be a bug in the TopScoreDocCollector algorithm.

Best regards
Re: TopScoreDocCollector class usage
OK, I found it: 300 times the number of words in the search string. But this needs to be precisely documented in the Javadocs. I don't want to rely on trial and error, and I guess nobody else wants that either.

Best regards
TopScoreDocCollector class usage
Hi,

I now use this class before calling the IndexSearcher.search API (with the collector as the 2nd argument). (Please see the "an interesting case" thread before this question.)

But this time I see very weird behavior: I used to get 4000+ hits with the default TopScoreDocCollector.create(int numHits, ScoreDoc after, int totalHitsThreshold) that the IndexSearcher.search API uses internally (where the default value is 1000), and I set after to null here.

Now, when I set totalHitsThreshold and numHits in TopScoreDocCollector.create to 300, I get 12200+ hits from the totalHits object.

Something is not right here, right? How can it jump to 3 times as many when I set totalHitsThreshold and numHits to about 1/3 of their default value?

Best regards

P.S. NOTE: The search(org.apache.lucene.search.Query, int) and searchAfter(org.apache.lucene.search.ScoreDoc, org.apache.lucene.search.Query, int) methods are configured to only count top hits accurately up to 1,000 and may return a lower bound of the hit count if the hit count is greater than or equal to 1,000. On queries that match lots of documents, counting the number of hits may take much longer than computing the top hits so this trade-off allows to get some minimal information about the hit count without slowing down search too much. The TopDocs.scoreDocs array is always accurate however. If this behavior doesn't suit your needs, you should create collectors manually with either TopScoreDocCollector.create(int, int) or TopFieldCollector.create(org.apache.lucene.search.Sort, int, int) and call search(Query, Collector).

at https://lucene.apache.org/core/8_5_2/core/org/apache/lucene/search/IndexSearcher.html#searchAfter-org.apache.lucene.search.ScoreDoc-org.apache.lucene.search.Query-int-
Monitoring decisions taken by IndexOrDocValuesQuery
Hi,

At my current project we wanted to monitor for a specific field the fraction of indexed vs doc values queries executed by IndexOrDocValuesQuery.

We ended up forking IndexOrDocValuesQuery and passing a listener that is notified when the query execution path is decided.

Do you think this is something the community might be interested in?

Kind regards,
Egor Moraru.
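For readers wondering what such a listener might look like, here is a minimal sketch in plain Java. Everything in it is an assumption made for illustration: `ExecutionPathStats`, `Path`, `Listener`, `Counter`, and the field name `price` are hypothetical and are not part of Lucene or of the fork described above. It shows only the counting side; the forked query would be responsible for calling the listener whenever it picks a path.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch of a listener that tracks which execution path was chosen.
final class ExecutionPathStats {

    /** The two execution paths a forked query could report (names are assumptions). */
    enum Path { INDEX, DOC_VALUES }

    /** Callback the forked query would invoke each time it decides on a path. */
    interface Listener {
        void onPathChosen(String field, Path path);
    }

    /** Thread-safe counter for a single field. */
    static final class Counter implements Listener {
        private final String field;
        private final LongAdder index = new LongAdder();
        private final LongAdder docValues = new LongAdder();

        Counter(String field) {
            this.field = field;
        }

        @Override
        public void onPathChosen(String reportedField, Path path) {
            if (!field.equals(reportedField)) {
                return; // only the monitored field is counted
            }
            if (path == Path.INDEX) {
                index.increment();
            } else {
                docValues.increment();
            }
        }

        /** Fraction of decisions that used the index; 0.0 if nothing was recorded. */
        double indexFraction() {
            long i = index.sum();
            long d = docValues.sum();
            return (i + d) == 0 ? 0.0 : (double) i / (i + d);
        }
    }

    public static void main(String[] args) {
        Counter counter = new Counter("price");
        counter.onPathChosen("price", Path.INDEX);
        counter.onPathChosen("price", Path.DOC_VALUES);
        System.out.println("index fraction: " + counter.indexFraction());
    }
}
```

`LongAdder` is used here because the listener could be called concurrently from several search threads; if the decision is made more than once per query (for example once per segment), the fraction would be per decision rather than per query.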