Lucene cpu utilization & scoring

Varun Sharma Fri, 20 Aug 2021 11:02:30 -0700

Hi,

We have a large index that we divide into X Lucene indices (we use Lucene
6.5.0). Each of our serving machines serves 8 of these indices in parallel,
and each of the 8 indices receives realtime updates. We are seeing a
couple of things:


a) When we turn off realtime updates, performance is significantly better.
When we turn on realtime updates, segments accumulate and CPU utilization
by Lucene goes up by at least *3X* [based on profiling].

b) A profile shows that the vast majority of time is being spent in
scoring methods even though *needsScores() returns false* in our
collectors.

We commit our indices frequently and are at roughly 25 segments per
index, so a total of about 8 * 25 = 200 segments across all 8 indices.

Reducing the number of indices per machine below 8 in order to cut the
number of segments would be a significant effort. So we would like to know
whether there are ways to improve performance with respect to a) and b):

i) We have tried tuning some parameters of the merge policy and
NRTCachingDirectory, and they did not help significantly.
ii) Since we don't care about Lucene-level scores, is there a way to
completely disable scoring? Should returning false from needsScores() in
our collectors do the trick? Should we create our own dummy weight/scorer
and inject it into the Query classes?

Thanks
Varun
