[ https://issues.apache.org/jira/browse/SOLR-11652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16265208#comment-16265208 ]
Amrit Sarkar commented on SOLR-11652:
-------------------------------------

I had a chance to chat with [~erickerickson] and [~varunthacker] to discuss the significance of "buffering" in CDC replication.

Motivation for buffering in CDCR, as listed on SOLR-11069 by Renaud:

_The original goal of the buffer on cdcr is to indeed keep indefinitely the tlogs until the buffer is deactivated (https://lucene.apache.org/solr/guide/7_1/cross-data-center-replication-cdcr.html#the-buffer-element). This was useful for example during maintenance operations, to ensure that the source cluster will keep all the tlogs until the target cluster is properly initialised. In this scenario, one will activate the buffer on the source. The source will start to store all the tlogs (and does not purge them). Once the target cluster is initialised, and has registered a tlog pointer on the source, one can deactivate the buffer on the source and the tlogs will start to be purged once they are read by the target cluster._

What I understood from looking at the code, besides what Renaud explained:

_The buffer is always enabled on non-leader nodes of the source. In the source DC, sync between leaders and followers is maintained by the buffer. If the leader goes down and another node takes over, it uses bufferLog to determine the current version point._

Essentially, buffering was introduced to remind the source that no updates have been sent over, because the target is not ready or CDCR is not started. The LastProcessedVersion for the source is -1 when the buffer is enabled, suggesting that no updates have been forwarded and that it has to keep track of all tlogs. Once the buffer is disabled, it starts to show the correct version that has been replicated to the target.

Bootstrapping was introduced in Solr 6.2 and handles the above use case well, i.e. the source is up and running and has already received a bunch of updates / documents, and either CDCR has not been started or the target has not been available until now.
Whenever CDC replication is started (action=START invoked), Bootstrap is called implicitly, which copies the entire index folder (not tlogs) to the target. This is much faster and more effective than the earlier setup, where all the updates from the beginning were sent to the target linearly, in the batch size defined in the cdcr config. That earlier setup was achieved by buffering (the tlogs from the beginning).

Today, if we look at the current CDCR documentation page, buffering is "disabled" by default on both source and target.

We don't see any purpose served by CDCR buffering, and it is quite an overhead: when enabled, it can take a lot of heap space (tlog pointers) and retains tlogs on disk forever. Also, today, even if we disable the buffer from the API on the source after it was enabled at startup, tlogs are never purged on the leader nodes of the source's shards; refer to jira: SOLR-11652

We propose to make the default Buffer state "DISABLED" in the code (CdcrBufferManager) and to deprecate its APIs (ENABLE / DISABLE buffer). It will still run implicitly for non-leader nodes on the source, and no user intervention is required whatsoever.

> Cdcr TLogs doesn't get purged for Source collection Leader when Buffer is
> disabled from CDCR API
> ------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-11652
>                 URL: https://issues.apache.org/jira/browse/SOLR-11652
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public)
>            Reporter: Amrit Sarkar
>
> CDCR transaction logs never get purged on the leader when the Buffer is
> DISABLED from the CDCR API.
> Steps to reproduce:
> 1. Set up source and target collection clusters and START CDCR, BUFFER ENABLED.
> 2. Index a bunch of documents into the source; make sure a decent number of
> tlogs (>20) have been generated.
> 3. Disable BUFFER via the API on the source and keep on indexing.
> 4. Tlogs start to get purged on follower nodes of the source, but the leader
> keeps accumulating them forever.
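
For anyone who wants to script the reproduction steps, here is a minimal sketch in Python that only builds the CDCR API request URLs involved (START, ENABLEBUFFER, DISABLEBUFFER, and QUEUES to inspect the tlog state afterwards). The base URL and the collection name `source_collection` are hypothetical placeholders, and the helper names are made up for illustration; nothing here is taken from the Solr code base.

```python
from urllib.parse import urlencode

# CDCR API actions relevant to this issue, as named in the Solr reference guide.
CDCR_ACTIONS = {"START", "STOP", "STATUS", "ENABLEBUFFER", "DISABLEBUFFER", "QUEUES"}


def cdcr_url(base_url, collection, action):
    """Build the request URL for one CDCR API action on a collection.

    base_url and collection are supplied by the caller; the values used
    below are placeholders, not real hosts.
    """
    if action not in CDCR_ACTIONS:
        raise ValueError("unsupported CDCR action: %s" % action)
    query = urlencode({"action": action, "wt": "json"})
    return "%s/%s/cdcr?%s" % (base_url.rstrip("/"), collection, query)


def repro_urls(base_url, collection):
    """Return the URLs for the reproduction steps, in the order they are called."""
    steps = ("START", "ENABLEBUFFER", "DISABLEBUFFER", "QUEUES")
    return [cdcr_url(base_url, collection, step) for step in steps]


if __name__ == "__main__":
    # Hypothetical source cluster node and collection name.
    for url in repro_urls("http://localhost:8983/solr", "source_collection"):
        print(url)
```

Indexing between the ENABLEBUFFER and DISABLEBUFFER calls (steps 2 and 3) is left out; the sketch only covers the CDCR API side and issues no requests itself.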