Hi,

We have a large index that we divide into X Lucene indices; we use Lucene 6.5.0. Each of our serving machines serves 8 Lucene indices in parallel, and each of these 8 indices receives realtime updates. We are seeing a couple of things:
a) When we turn off realtime updates, performance is significantly better. When we turn them on, segments accumulate and CPU utilization by Lucene goes up by at least *3X* [based on profiling].

b) A profile shows that the vast majority of time is spent in scoring methods, even though we are setting *needsScores() to false* in our collectors.

We commit our index frequently and are at roughly 25 segments per index, so about 8 * 25 = 200 segments in total across the 8 indices. Changing the number of indices per machine (currently 8) to reduce the number of segments is a significant effort.

So we would like to know if there are ways to improve performance w.r.t. a) and b):

i) We have tried some parameters of the merge policy and NRTCachingDirectory, and they did not help significantly.

ii) Since we don't care about Lucene-level scores, is there a way to completely disable scoring? Should setting needsScores() to false in our collectors do the trick? Should we create our own dummy weight/scorer and inject it into the Query classes?

Thanks,
Varun
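
P.S. To make ii) concrete, here is a minimal sketch of the kind of setup we mean. It is not our production code: CountingCollector and NoScoreQueries are hypothetical names used only for illustration, and it assumes the Lucene 6.x Collector API. Wrapping the query in ConstantScoreQuery is one option we are wondering about as a lighter alternative to a dummy weight/scorer; we have not confirmed that it removes the scoring cost we see in the profile.

```java
import org.apache.lucene.search.ConstantScoreQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.SimpleCollector;

/** Hypothetical collector: counts matching docs and never asks for scores. */
class CountingCollector extends SimpleCollector {
    private int hits = 0;

    @Override
    public void collect(int doc) {
        // Only the doc id is used; setScorer() is left as the no-op default.
        hits++;
    }

    @Override
    public boolean needsScores() {
        // Tells the searcher that scores are not required (Lucene 6.x API).
        return false;
    }

    int hits() {
        return hits;
    }
}

/** Hypothetical helper: wraps a query so its matches get a constant score. */
class NoScoreQueries {
    static Query withoutScoring(Query query) {
        return new ConstantScoreQuery(query);
    }
}
```

With an existing IndexSearcher this would be used as searcher.search(NoScoreQueries.withoutScoring(query), new CountingCollector()).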
