[ 
https://issues.apache.org/jira/browse/SOLR-11652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16265208#comment-16265208
 ] 

Amrit Sarkar commented on SOLR-11652:
-------------------------------------

I had a chance to chat with [~erickerickson] and [~varunthacker] about the 
significance of "buffering" in CDC replication.

Motivation for buffering in CDCR, as listed on SOLR-11069 by Renaud:

_The original goal of the buffer on cdcr is to indeed keep indefinitely the 
tlogs until the buffer is deactivated 
(https://lucene.apache.org/solr/guide/7_1/cross-data-center-replication-cdcr.html#the-buffer-element).
 This was useful for example during maintenance operations, to ensure that the 
source cluster will keep all the tlogs until the target cluster is properly 
initialised. In this scenario, one will activate the buffer on the source. The 
source will start to store all the tlogs (and does not purge them). Once the 
target cluster is initialised, and has registered a tlog pointer on the source, 
one can deactivate the buffer on the source and the tlogs will start to be 
purged once they are read by the target cluster._

What I understood looking at the code besides what Renaud explained:

_The buffer is always enabled on non-leader nodes of the source. In the source 
DC, sync between leaders and followers is maintained by the buffer. If the 
leader goes down and another node takes over, it uses the bufferLog to 
determine the current version point._

Essentially, buffering was introduced to remind the source that no updates 
have been sent over, because the target is not ready or CDCR has not been 
started. The LastProcessedVersion for the source is -1 when the buffer is 
enabled, indicating that no updates have been forwarded and that it has to 
keep track of all tlogs. Once the buffer is disabled, it starts to show the 
correct version, i.e. the one which has been replicated to the target.
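
To make the effect on tlog retention concrete, here is a small Python sketch 
of the behaviour described above. It is a toy model, not Solr's code: the 
function names, and the rule that a tlog becomes purgeable once its highest 
version is at or below the last processed version, are assumptions made 
purely for illustration.

```python
from typing import List

# Value the source reports as LastProcessedVersion while the buffer is enabled.
BUFFER_SENTINEL = -1


def last_processed_version(buffer_enabled: bool, replicated_up_to: int) -> int:
    """Version the source reports as already forwarded to the target (toy model)."""
    return BUFFER_SENTINEL if buffer_enabled else replicated_up_to


def purgeable_tlogs(tlog_max_versions: List[int], processed: int) -> List[int]:
    """Return the tlogs (identified by their highest version) that may be dropped.

    Assumption: a tlog is purgeable once every update in it has been replicated.
    """
    if processed == BUFFER_SENTINEL:
        return []  # nothing is known to be replicated, so every tlog is kept
    return [v for v in tlog_max_versions if v <= processed]


tlogs = [100, 200, 300, 400]
print(purgeable_tlogs(tlogs, last_processed_version(True, 250)))   # buffer enabled
print(purgeable_tlogs(tlogs, last_processed_version(False, 250)))  # buffer disabled
```

With the buffer enabled the model never releases a tlog, which is the 
"forever retention" cost discussed in this comment.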

In Solr 6.2, bootstrapping was introduced, which takes care of the above 
use-case very well, i.e. the source is up and running and has already 
received a bunch of updates / documents, and either we have not started CDCR 
or the target has not been available until now. Whenever CDC replication is 
started (action=START invoked), Bootstrap is called implicitly, which copies 
the entire index folder (not tlogs) to the target. This is much faster and 
more efficient than the earlier setup, where all the updates from the 
beginning were sent to the target linearly, in batches of the size defined in 
the cdcr config. This earlier setup was achieved by buffering (the tlogs from 
the beginning).

Today, if we look at the current CDCR documentation page, buffering is 
"disabled" by default on both source and target. We don't see any purpose 
served by CDCR buffering, and it is quite an overhead, considering it can take 
a lot of heap space (tlog pointers) and retains tlogs on disk forever when 
enabled. Also, today, even if we disable the buffer from the API on the 
source, if it was enabled at startup, tlogs are never purged on the leader 
node of the shards of the source; refer to JIRA: SOLR-11652.

We propose to make the default Buffer state "DISABLED" in the code 
(CdcrBufferManager) and deprecate its APIs (ENABLE / DISABLE buffer). It will 
still run implicitly for non-leader nodes on the source, and no user 
intervention is required whatsoever.
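
For reference, the actions mentioned in this comment are plain HTTP calls 
against the collection's /cdcr handler. The sketch below only builds the 
request URLs and sends nothing; the base URL and the collection name 
"source_coll" are hypothetical placeholders, and the helper `cdcr_url` is not 
part of Solr.

```python
from urllib.parse import urlencode


def cdcr_url(base_url: str, collection: str, action: str) -> str:
    """Build the URL for a CDCR API action on a collection (no request is sent)."""
    query = urlencode({"action": action})
    return f"{base_url.rstrip('/')}/{collection}/cdcr?{query}"


# Hypothetical source cluster; adjust the host and collection to your setup.
BASE = "http://localhost:8983/solr"

# START replication, DISABLEBUFFER on the source, then inspect QUEUES.
for action in ("START", "DISABLEBUFFER", "QUEUES"):
    print(cdcr_url(BASE, "source_coll", action))
```

ENABLEBUFFER and DISABLEBUFFER are the two actions this proposal would 
deprecate.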

> Cdcr TLogs doesn't get purged for Source collection Leader when Buffer is 
> disabled from CDCR API
> ------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-11652
>                 URL: https://issues.apache.org/jira/browse/SOLR-11652
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Amrit Sarkar
>
> Cdcr transaction logs don't get purged on leader EVER when Buffer is 
> DISABLED from CDCR API.
> Steps to reproduce:
> 1. Setup source and target collection cluster and START CDCR, BUFFER ENABLED.
> 2. Index a bunch of documents into source; make sure we have generated tlogs 
> in decent numbers (>20)
> 3. Disable BUFFER via API on source and keep on indexing
> 4. Tlogs start to get purged on follower nodes of Source, but the Leader 
> keeps on accumulating them forever.



