Re: org.apache.lucene.search.BooleanWeight.bulkScorer() and BulkScorer.score()

Baris Kazar Tue, 05 Oct 2021 12:56:54 -0700

Hi Adrien,
Thanks for taking a look at it, and sure, it would be very nice to fix those 
accessors.
The speed is OK, but I would like it to be faster.
Is there anything else I should look at to help make it faster?
Best regards

________________________________
From: Adrien Grand <[email protected]>
Sent: Tuesday, October 5, 2021 3:18 PM
To: Lucene Users Mailing List <[email protected]>
Cc: Baris Kazar <[email protected]>
Subject: Re: org.apache.lucene.search.BooleanWeight.bulkScorer() and 
BulkScorer.score()

Hmm we should fix these access$ accessors by fixing the visibility of some 
fields.
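
For context, the access$NNN methods in the profile are synthetic accessors that javac (before Java 11) generates when an inner or anonymous class touches a private member of its enclosing class. The sketch below is a minimal illustration, not Lucene's code: PrivateFieldScorer, PackagePrivateFieldScorer, and AccessorDemo are hypothetical names. Relaxing a field from private to package-private lets the inner class access it directly, which is presumably the kind of visibility fix meant here.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a scorer; this is NOT Lucene's WANDScorer.
class PrivateFieldScorer {
    // private: a pre-Java 11 javac routes inner-class access through access$NNN methods
    private int doc = -1;

    final Runnable advancer = new Runnable() {
        @Override
        public void run() {
            doc++; // reads and writes the enclosing instance's private field
        }
    };

    int doc() {
        return doc;
    }
}

// Same logic, but the field is package-private, so no accessor is needed.
class PackagePrivateFieldScorer {
    int doc = -1;

    final Runnable advancer = new Runnable() {
        @Override
        public void run() {
            doc++; // direct field access from the anonymous class
        }
    };

    int doc() {
        return doc;
    }
}

class AccessorDemo {
    /** Returns the names of compiler-generated access$NNN methods declared on c. */
    static List<String> syntheticAccessors(Class<?> c) {
        List<String> names = new ArrayList<>();
        for (Method m : c.getDeclaredMethods()) {
            if (m.isSynthetic() && m.getName().startsWith("access$")) {
                names.add(m.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("private field: "
                + syntheticAccessors(PrivateFieldScorer.class));
        System.out.println("package-private field: "
                + syntheticAccessors(PackagePrivateFieldScorer.class));
    }
}
```

When compiled for Java 11 or later, nest-based access control removes these accessors, so the first list may be empty there as well; the second list is empty on any compiler.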

These breakdowns do not necessarily signal that something is wrong. Is the 
query executing fast overall?

On Mon, Oct 4, 2021 at 11:57 PM Baris Kazar <[email protected]> wrote:
Hi,
I ran more experiments, and this time I looked into these methods:
org.apache.lucene.search.BooleanWeight.bulkScorer() and BulkScorer.score()

Let's start with BooleanWeight.bulkScorer(), its call tree, and the time spent:

BooleanWeight.bulkScorer()
-->> Weight.bulkScorer()
-->>-->> BooleanWeight.scorer()
-->>-->>-->> BooleanWeight.scorerSupplier()
-->>-->>-->>-->> Weight.scorerSupplier()
-->>-->>-->>-->>-->> TermQuery$TermWeight.scorer()
-->>-->>-->>-->>-->>-->> org.apache.lucene.codecs.blocktree.SegmentTermsEnum.impacts()
-->>-->>-->>-->>-->>-->>-->> org.apache.lucene.codecs.lucene84.Lucene84PostingsReader.impacts()
-->>-->>-->>-->>-->>-->>-->>-->> org.apache.lucene.codecs.lucene84.Lucene84PostingsReader$BlockImpactsDocsEnum.init()
-->>-->>-->>-->>-->>-->>-->>-->>-->> org.apache.lucene.codecs.lucene84.Lucene84SkipReader.init()
-->>-->>-->>-->>-->>-->>-->>-->>-->>-->> org.apache.lucene.codecs.MultiLevelSkipListReader.init()
-->>-->>-->>-->>-->>-->>-->>-->>-->>-->>-->> org.apache.lucene.codecs.MultiLevelSkipListReader.loadSkipLevels()
-->>-->>-->>-->>-->>-->>-->>-->>-->>-->>-->>-->> org.apache.lucene.store.DataInput.readVLong() (constitutes 100% of BooleanWeight.bulkScorer() time here)

Next: BulkScorer.score() with its call tree and time spent:

BulkScorer.score()
-->> Weight$DefaultBulkScorer.score()
-->>-->> Weight$DefaultBulkScorer.scoreAll()
-->>-->>-->> WANDScorer$1.nextDoc()
-->>-->>-->>-->> WANDScorer$1.advance()
-->>-->>-->>-->>-->> WANDScorer.access$300() (constitutes 65% of BulkScorer.score() time here)
-->>-->>-->>-->>-->> WANDScorer.access$100() (constitutes 30% of BulkScorer.score() time here)
-->>-->>-->>-->>-->> WANDScorer.access$400() (constitutes 5% of BulkScorer.score() time here)

Best regards

________________________________
From: Baris Kazar <[email protected]>
Sent: Saturday, October 2, 2021 3:14 PM
To: Adrien Grand <[email protected]>; Lucene Users Mailing List <[email protected]>
Cc: Baris Kazar <[email protected]>
Subject: Re: org.apache.lucene.search.BooleanWeight.bulkScorer() and 
BulkScorer.score()

Hi Adrien,
Thanks. Next week I will look at the components (units, methods) within 
BulkScorer#score to see which of its called methods takes the most time.

jvisualvm reports inclusive time for each method, that is, the total including 
the time spent in the methods it calls, and the execution tree can be followed 
down to the very last called method.

Regarding the second paragraph of your reply:
At what point does a Lucene index have too many segments? I have 1 text field 
and 1 stored (non-indexed) field.

Most of the time I get a couple of thousand hits and ask for the top 20 of them. 
Could this be causing BooleanWeight#bulkScorer to take so much time?

Both of these units, BooleanWeight#bulkScorer and BulkScorer#score, take equal 
amounts of time and together make up 75% of IndexSearcher#search, as I 
mentioned before.

Thanks for the swift reply; I appreciate it very much.

Best regards
________________________________
From: Adrien Grand <[email protected]>
Sent: Saturday, October 2, 2021 1:44:40 AM
To: Lucene Users Mailing List <[email protected]>
Cc: Baris Kazar <[email protected]>
Subject: Re: org.apache.lucene.search.BooleanWeight.bulkScorer() and 
BulkScorer.score()

Is your profiler reporting inclusive or exclusive costs for each function? I.e., 
does it exclude time spent in functions that are called within a function? I'm 
asking because it makes total sense for IndexSearcher#search to spend most of 
its time in BulkScorer#score, which coordinates the whole matching+scoring 
process.
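
The distinction can be made concrete with a small sketch. Inclusive time covers a method plus everything it calls; exclusive (self) time subtracts the time spent in its callees. ProfileNode is a hypothetical helper, and the millisecond values are made-up illustrative numbers, not measurements from this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical call-tree node; not part of any profiler's API.
class ProfileNode {
    final String name;
    final long inclusiveMillis; // time in this method plus all of its callees
    final List<ProfileNode> children = new ArrayList<>();

    ProfileNode(String name, long inclusiveMillis) {
        this.name = name;
        this.inclusiveMillis = inclusiveMillis;
    }

    ProfileNode add(ProfileNode child) {
        children.add(child);
        return this;
    }

    /** Self time: inclusive time minus the inclusive time of direct callees. */
    long exclusiveMillis() {
        long inCallees = 0;
        for (ProfileNode child : children) {
            inCallees += child.inclusiveMillis;
        }
        return inclusiveMillis - inCallees;
    }

    public static void main(String[] args) {
        // Illustrative numbers only.
        ProfileNode nextDoc = new ProfileNode("WANDScorer$1.nextDoc", 90);
        ProfileNode score = new ProfileNode("BulkScorer.score", 100).add(nextDoc);
        System.out.println(score.name + " inclusive=" + score.inclusiveMillis
                + "ms exclusive=" + score.exclusiveMillis() + "ms");
    }
}
```

A method near the top of the tree can therefore show a large inclusive share while doing almost no work itself.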

Having so much time spent in BooleanWeight#bulkScorer is a bit surprising, 
however. This suggests that you have too many segments in your index (since the 
bulk scorer needs to be recreated for every segment) or that your average query 
matches a very low number of documents (so that Lucene spends more time 
figuring out how best to find the matches than actually finding these 
matches).
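
As rough arithmetic for this reasoning, a toy cost model follows. It assumes a fixed setup cost per segment and a fixed cost per matching document, which is a deliberate simplification and not how Lucene accounts for time; SegmentCostModel and every number in it are hypothetical.

```java
// Toy model only: real per-segment and per-document costs vary with the query and index.
class SegmentCostModel {
    /**
     * Fraction of total search cost spent creating bulk scorers, assuming one
     * scorer is created per segment and each matching document costs the same.
     */
    static double setupShare(int segments, long matchingDocs,
                             double setupCostPerSegment, double costPerMatch) {
        double setup = segments * setupCostPerSegment;
        double scoring = matchingDocs * costPerMatch;
        return setup / (setup + scoring);
    }

    public static void main(String[] args) {
        // Hypothetical cost units: 40 per segment setup, 1 per matching document.
        System.out.println("50 segments, 2000 matches: "
                + setupShare(50, 2000, 40.0, 1.0));
        System.out.println(" 1 segment,  2000 matches: "
                + setupShare(1, 2000, 40.0, 1.0));
    }
}
```

Under these assumed costs, the setup share grows with the segment count and shrinks as the number of matches grows, which mirrors the two explanations given above.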

On Sat, Oct 2, 2021 at 5:57 AM Baris Kazar <[email protected]> wrote:
Hi,
I profiled my application's performance with jvisualvm on Java
and saw that 75% of the search time in
org.apache.lucene.search.IndexSearcher.search() is spent in
these units:
org.apache.lucene.search.BooleanWeight.bulkScorer() and BulkScorer.score()
Is there any study or project to speed these up?

Best regards

--
Adrien

