OK, I think you meant something else here.

You are not referring to the total hit count calculation or the mismatch, right?

So, to make Lucene do the minimum work needed to reach the matched docs,
TopScoreDocCollector should be used, right?

Let me check this class.

Thanks


On 6/8/21 1:16 PM, baris.ka...@oracle.com wrote:
Adrien, my concern is not actually the number mismatch; as I mentioned, it is the performance.

Seeing those numbers mismatch, it seems that Lucene still does the same
amount of work to get results no matter how many results you ask for in the IndexSearcher search API.

I thought I was clear on that.

Lucene should not spend any energy on the count, as scoreDocs already has that.

But seeing a high totalHits number worries me, as I explained above.


Best regards


On 6/8/21 1:12 PM, Adrien Grand wrote:
If you don't need any information about the total hit count, you could
create a TopScoreDocCollector that has the same value for numHits
and totalHitsThreshold. This way Lucene will spend as little energy as
possible computing the number of matches of the query.
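
As a minimal sketch of the suggestion above, assuming Lucene 8.x on the classpath: the class name TopNSearch and the helper topN are hypothetical names introduced only for illustration, while TopScoreDocCollector.create(numHits, totalHitsThreshold) and IndexSearcher#search(Query, Collector) are the library calls it relies on.

```java
import java.io.IOException;

import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.TopScoreDocCollector;

public final class TopNSearch {
    private TopNSearch() {}

    /**
     * Hypothetical helper: returns the n best-scoring hits and passes the
     * same value for numHits and totalHitsThreshold, so Lucene is not asked
     * to count matches accurately beyond n.
     */
    public static TopDocs topN(IndexSearcher searcher, Query query, int n) throws IOException {
        // numHits == totalHitsThreshold, as suggested in the message above.
        TopScoreDocCollector collector = TopScoreDocCollector.create(n, n);
        searcher.search(query, collector);
        return collector.topDocs();
    }
}
```

With this setup, the ScoreDoc[] array still holds at most n hits, and totalHits should be treated as a lower bound rather than an exact count.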

On Tue, Jun 8, 2021 at 6:28 PM <baris.ka...@oracle.com> wrote:

I am currently happy with Lucene's performance, but I want to understand
it and speed it up further by limiting the results concretely. So I still
do not know why totalHits and scoreDocs report different numbers of hits.


Best regards


On 6/8/21 2:52 AM, Baris Kazar wrote:
My worry is actually about Lucene's performance.

If Lucene collects thousands of hits instead of just n hits (where n is
far smaller than a couple of thousand), then this creates a performance issue.

The ScoreDoc array is OK, as I mentioned, i.e., it has size n.
I will check the count API.

Best regards
------------------------------------------------------------------------
*From:* Adrien Grand <jpou...@gmail.com>
*Sent:* Tuesday, June 8, 2021 2:46 AM
*To:* Lucene Users Mailing List
*Cc:* Baris Kazar
*Subject:* Re: An interesting case
When you call IndexSearcher#search(Query query, int n), there are two
cases:
  - either your query matches n hits or more, and the TopDocs object
will have a ScoreDoc[] array that contains the n best scoring hits
sorted by descending score,
  - or your query matches fewer than n hits and then the TopDocs object
will have all matches in the ScoreDoc[] array, sorted by descending
score.
In both cases, TopDocs#totalHits gives information about the total
number of matches of the query. On older versions of Lucene (<7.0)
this is an integer that is always accurate, while on more recent
versions of Lucene (>= 8.0) it is a lower bound of the total number of
matches. It typically returns the number of collected documents
indeed, though this is an implementation detail that might change in
the future.

If you want to count the number of matches of a Query precisely, you
can use IndexSearcher#count.
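
To illustrate why the hit count can be a lower bound while the ScoreDoc[] array still holds the n best hits, here is a toy model in plain Java. It is not Lucene's implementation: ToyTopN, Result, and the skipping rule are assumptions made for illustration. The model counts every match until totalHitsThreshold is reached, then ignores matches that cannot enter the top n, which is why the reported count may undercount.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

public final class ToyTopN {
    /** Toy result: the best scores plus a hit count that may be only a lower bound. */
    public static final class Result {
        public final List<Double> topScores; // at most n scores, descending
        public final long countedHits;       // matches that were actually counted
        public final boolean countIsExact;   // false if any match was skipped

        Result(List<Double> topScores, long countedHits, boolean countIsExact) {
            this.topScores = topScores;
            this.countedHits = countedHits;
            this.countIsExact = countIsExact;
        }
    }

    /** matchScores holds one score per matching document, in index order. */
    public static Result search(double[] matchScores, int n, int totalHitsThreshold) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        PriorityQueue<Double> heap = new PriorityQueue<>(); // min-heap of the current top n
        long counted = 0;
        boolean exact = true;
        for (double score : matchScores) {
            // Skipping is allowed only once the threshold is reached and the heap is full.
            boolean maySkip = counted >= totalHitsThreshold && heap.size() == n;
            if (maySkip && score <= heap.peek()) {
                exact = false; // a match existed but was never counted
                continue;
            }
            counted++;
            if (heap.size() < n) {
                heap.add(score);
            } else if (score > heap.peek()) {
                heap.poll();
                heap.add(score);
            }
        }
        List<Double> top = new ArrayList<>(heap);
        top.sort(Collections.reverseOrder());
        return new Result(top, counted, exact);
    }
}
```

In this model, a large threshold makes the count exact at the cost of visiting every match, while a threshold equal to n lets the loop skip non-competitive matches as early as possible.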

On Tue, Jun 8, 2021 at 7:51 AM <baris.ka...@oracle.com> wrote:


     https://stackoverflow.com/questions/50368313/relation-between-topdocs-totalhits-and-parameter-n-of-indexsearcher-search

     Looks like someone else had this problem, too.

     Any suggestions please?

     Best regards


     On 6/8/21 1:36 AM, baris.ka...@oracle.com wrote:
     > Hi,
     >
     > I use the IndexSearcher.search API with two parameters: a Query and
     > an int number (which I set to 20).
     >
     > However, when I look at the TopDocs object that this API call
     > returns, I see thousands of hits in totalHits. Is this inaccurate, or
     > is Lucene actually doing the search based on that many results?
     >
     > But when I iterate over the scoreDocs of that result, I get exactly
     > the int number of hits (i.e., 20 hits).
     >
     >
     > I am trying to find out why
     > org.apache.lucene.search.TopDocs.totalHits reports a higher number of
     > collected results than the actual number of results. I see on the
     > order of a couple of thousand vs. 20.
     >
     >
     > Best regards
     >
     >
     >

---------------------------------------------------------------------
     To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
     <mailto:java-user-unsubscr...@lucene.apache.org>
     For additional commands, e-mail: java-user-h...@lucene.apache.org
     <mailto:java-user-h...@lucene.apache.org>



--
Adrien

