Yes, I sometimes see 4000+ and sometimes 3000+ hits in totalHits.
So TopScoreDocCollector is what works underneath the IndexSearcher.search API,
right?
In other words, using TopScoreDocCollector directly will save time, right?
Thanks
On 6/8/21 1:27 PM, Adrien Grand wrote:
Yes, for instance if you care about the top 10 hits only, you could
call TopScoreDocCollector.create(10, null, 10). By default,
IndexSearcher is configured to count at least 1,000 hits, and creates
its top docs collector with TopScoreDocCollector.create(10, null, 1000).
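
To make the roles of numHits and totalHitsThreshold concrete, here is a
minimal, simplified model in plain Java. It is not Lucene's code: the class
TopNModel, its nested types, and the per-document skipping rule are
assumptions made for illustration (real Lucene skips documents inside the
scorer, using score upper bounds). It only sketches why the reported total
becomes a lower bound once the threshold has been passed, and why a small
threshold leaves fewer hits to visit and count.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Toy model of top-n collection with a hit-count threshold (hypothetical, not Lucene code). */
final class TopNModel {

    /** One matching document: its id and its score. */
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    /** Outcome of a search: best hits, counted hits, and whether the count is exact. */
    static final class Result {
        final List<Hit> top;    // best hits, highest score first
        final int countedHits;  // hits that were actually visited and counted
        final boolean exact;    // false means "countedHits or more"
        Result(List<Hit> top, int countedHits, boolean exact) {
            this.top = top;
            this.countedHits = countedHits;
            this.exact = exact;
        }
    }

    static Result search(float[] scores, int numHits, int totalHitsThreshold) {
        if (numHits <= 0) {
            throw new IllegalArgumentException("numHits must be > 0");
        }
        // Min-heap on score: the head is the weakest hit currently kept.
        PriorityQueue<Hit> heap =
            new PriorityQueue<>(Comparator.comparingDouble((Hit h) -> h.score));
        int counted = 0;
        boolean maySkip = false;
        for (int doc = 0; doc < scores.length; doc++) {
            float score = scores[doc];
            if (maySkip && score <= heap.peek().score) {
                continue; // non-competitive hit: skipped, hence never counted
            }
            counted++;
            if (heap.size() < numHits) {
                heap.add(new Hit(doc, score));
            } else if (score > heap.peek().score) {
                heap.poll();
                heap.add(new Hit(doc, score));
            }
            // Skipping may start once more than totalHitsThreshold hits
            // have been counted and the heap holds numHits entries.
            if (counted > totalHitsThreshold && heap.size() == numHits) {
                maySkip = true;
            }
        }
        List<Hit> top = new ArrayList<>(heap);
        top.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return new Result(top, counted, !maySkip);
    }

    public static void main(String[] args) {
        float[] scores = {5f, 1f, 3f, 2f, 4f, 0.5f, 6f};
        Result small = search(scores, 2, 2);
        Result large = search(scores, 2, 1000);
        System.out.println(small.countedHits + (small.exact ? "" : "+") + " hits counted, threshold 2");
        System.out.println(large.countedHits + (large.exact ? "" : "+") + " hits counted, threshold 1000");
    }
}
```

With the seven sample scores, a threshold of 2 reports 5+ hits and a
threshold of 1000 reports exactly 7. The two best hits are the same in both
runs; only the amount of counting differs.
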
On Tue, Jun 8, 2021 at 7:19 PM <baris.ka...@oracle.com> wrote:
Ok, I think you meant something else here.
You are not referring to the total hit count calculation or the
mismatch, right?
So to make Lucene do the minimum work to reach the matched docs,
TopScoreDocCollector should be used, right?
Let me check this class.
Thanks
On 6/8/21 1:16 PM, baris.ka...@oracle.com wrote:
> Adrien, my concern is not actually the number mismatch;
> as I mentioned, it is the performance.
>
> Seeing those numbers mismatch, it seems that Lucene is still doing the
> same amount of work to get results no matter how many results you ask
> for in the IndexSearcher search API.
>
> I thought I was clear on that.
>
> Lucene should not spend any energy on the count, as scoreDocs already
> has that.
>
> But seeing a high totalHits number worries me, as I explained above.
>
> Best regards
>
> On 6/8/21 1:12 PM, Adrien Grand wrote:
>> If you don't need any information about the total hit count, you could
>> create a TopScoreDocCollector that has the same value for numHits
>> and totalHitsThreshold. This way Lucene will spend as little energy as
>> possible computing the number of matches of the query.
>>
>> On Tue, Jun 8, 2021 at 6:28 PM <baris.ka...@oracle.com> wrote:
>>
>>> I am currently happy with Lucene's performance, but I want to understand
>>> it and speed it up further by limiting the results concretely. So I
>>> still do not know why totalHits and scoreDocs report different numbers
>>> of hits.
>>>
>>> Best regards
>>>
>>> On 6/8/21 2:52 AM, Baris Kazar wrote:
>>>> My worry is actually about Lucene's performance.
>>>>
>>>> If Lucene collects thousands of hits instead of just n hits (where n is
>>>> much smaller than a couple of thousand), then this creates a
>>>> performance issue.
>>>>
>>>> The ScoreDoc array is OK, as I mentioned; i.e., it has size n.
>>>> I will check the count API.
>>>>
>>>> Best regards
>>>>
>>>> ------------------------------------------------------------------------
>>>>
>>>> *From:* Adrien Grand <jpou...@gmail.com>
>>>> *Sent:* Tuesday, June 8, 2021 2:46 AM
>>>> *To:* Lucene Users Mailing List
>>>> *Cc:* Baris Kazar
>>>> *Subject:* Re: An interesting case
>>>> When you call IndexSearcher#search(Query query, int n), there are two
>>>> cases:
>>>>  - either your query matches n hits or more, and the TopDocs object
>>>>    will have a ScoreDoc[] array that contains the n best-scoring hits
>>>>    sorted by descending score,
>>>>  - or your query matches fewer than n hits, and then the TopDocs object
>>>>    will have all matches in the ScoreDoc[] array, sorted by descending
>>>>    score.
>>>>
>>>> In both cases, TopDocs#totalHits gives information about the total
>>>> number of matches of the query. On older versions of Lucene (< 8.0)
>>>> this is an integer that is always accurate, while on more recent
>>>> versions of Lucene (>= 8.0) it is a lower bound of the total number of
>>>> matches. It typically returns the number of collected documents
>>>> indeed, though this is an implementation detail that might change in
>>>> the future.
>>>>
>>>> If you want to count the number of matches of a Query precisely, you
>>>> can use IndexSearcher#count.
>>>>
>>>> On Tue, Jun 8, 2021 at 7:51 AM <baris.ka...@oracle.com> wrote:
>>>>
>>>> https://stackoverflow.com/questions/50368313/relation-between-topdocs-totalhits-and-parameter-n-of-indexsearcher-search
>>>>
>>>> Looks like someone else had this problem, too.
>>>>
>>>> Any suggestions please?
>>>>
>>>> Best regards
>>>>
>>>>
>>>> On 6/8/21 1:36 AM, baris.ka...@oracle.com wrote:
>>>> > Hi,
>>>> >
>>>> > I use the IndexSearcher.search API with two parameters, a Query and
>>>> > an int number (which I set to 20).
>>>> >
>>>> > However, when I look at the TopDocs object that this API call
>>>> > returns, I see thousands of hits in totalHits. Is this inaccurate,
>>>> > or is Lucene actually doing the search based on that many results?
>>>> >
>>>> > But when I iterate over the scoreDocs of the result, I get exactly
>>>> > that number of hits (i.e., 20 hits).
>>>> >
>>>> > I am trying to find out why org.apache.lucene.search.TopDocs.totalHits
>>>> > reports a larger number of collected results than the actual number
>>>> > of results. I see on the order of a couple of thousand vs. 20.
>>>> >
>>>> > Best regards
>>>> >
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>>>
>>>>
>>>>
>>>> --
>>>> Adrien
>>
--
Adrien