On 11/23/22 11:49, Patson Luk wrote:
We are testing a multi-replica setup here (1 NRT + 1 PULL) and noticed
that CPU consumption for replication is unreasonably high. Profiling shows
that `SolrCore#openNewSearcher` triggered from `IndexFetcher` takes much
more CPU time than the same method triggered from regular commits.

NB: A collection with 1 NRT replica and 1 PULL replica is not fault tolerant.  In the event you lose the NRT replica, the PULL replica cannot become leader, so even if the collection stays online for queries (and I am not sure that it would), it will not be possible to update it.  If you want full fault tolerance, you need at least two replicas that are either NRT or TLOG, and the rest can be PULL.

Debugging shows that when `SolrCore#openNewSearcher` is triggered from
`IndexFetcher`, it opens a new `SegmentReader` for every single fragment
of the updated collection. This is because a new `IndexWriter`, which keeps
a `ReaderPool`, is instantiated for each replication. That pool is not
reused, and previous segment readers are not carried over.

I suspect that in the case of a commit on NRT, Lucene can re-use SegmentReader instances for segments that did not change, because all of that is entirely at the Lucene level, so the new Lucene searcher knows about existing segments in the old Lucene searcher.

But with replication (which PULL replicas and non-leader TLOG replicas use), the files are handled at the Solr level, and then the index is passed to the Lucene level.  Solr does not know that file x is related to an existing segment, because all that is handled at the Lucene level.  Replication *CAN* involve replacing every single file in the index ... so I am pretty sure that Solr must ask Lucene to load the newly replicated index from scratch and not use the existing Lucene searcher, and that means that Lucene must create all new SegmentReader instances.  I don't think Solr can safely ask Lucene to re-use its old searcher on a replicated index.  Even if that is possible, I imagine that implementing it would take some very involved code that might break with every new Lucene version that Solr upgrades to.
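
To make the difference between the two paths concrete, here is a toy model in plain Java.  It is only a sketch of the reasoning above, not Lucene or Solr code: `ToySegmentReader`, `ToySearcher`, `reopenFrom` and `openFromScratch` are hypothetical names invented for the illustration, and the count of constructed readers stands in for the CPU cost seen in profiling.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only: these classes are hypothetical stand-ins, not Lucene or Solr APIs. */
class SegmentReuseSketch {

    /** Stand-in for an expensive per-segment reader. */
    static final class ToySegmentReader {
        final String segmentName;
        ToySegmentReader(String segmentName) { this.segmentName = segmentName; }
    }

    /** Stand-in for a searcher: one reader per segment, keyed by segment name. */
    static final class ToySearcher {
        final Map<String, ToySegmentReader> readers = new HashMap<>();
    }

    /** Number of readers constructed so far, used as a proxy for the work done. */
    static int readersOpened = 0;

    static ToySegmentReader openReader(String name) {
        readersOpened++;
        return new ToySegmentReader(name);
    }

    /** Commit-style reopen: the previous searcher is known, so unchanged segments keep their reader. */
    static ToySearcher reopenFrom(ToySearcher previous, List<String> segments) {
        ToySearcher next = new ToySearcher();
        for (String name : segments) {
            ToySegmentReader existing = previous.readers.get(name);
            next.readers.put(name, existing != null ? existing : openReader(name));
        }
        return next;
    }

    /** Replication-style open: nothing is carried over, so every segment gets a new reader. */
    static ToySearcher openFromScratch(List<String> segments) {
        ToySearcher next = new ToySearcher();
        for (String name : segments) {
            next.readers.put(name, openReader(name));
        }
        return next;
    }

    public static void main(String[] args) {
        ToySearcher first = openFromScratch(List.of("_0", "_1", "_2"));

        // One segment added by a commit: only that segment needs a new reader.
        readersOpened = 0;
        reopenFrom(first, List.of("_0", "_1", "_2", "_3"));
        System.out.println("reopen from previous searcher: " + readersOpened + " new reader(s)");

        // Same segment list arriving without a link to the old searcher: all readers are new.
        readersOpened = 0;
        openFromScratch(List.of("_0", "_1", "_2", "_3"));
        System.out.println("open from scratch: " + readersOpened + " new reader(s)");
    }
}
```

In this model the reopen path constructs one reader and the from-scratch path constructs four, even though three of the four segments are unchanged.  Whether a real implementation could safely match replicated files to existing readers is exactly the open question; the sketch assumes segment names alone identify unchanged segments, which may not hold when replication replaces files.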

Details in this ticket https://issues.apache.org/jira/browse/SOLR-16560.

This should have been discussed on the users mailing list before creating the issue.  I know you are alluding to potential changes to Solr code, but it's not yet time for a dev list discussion on this.

Thanks,
Shawn


