[jira] [Commented] (SOLR-6465) CDCR: fall back to whole-index replication when tlogs are insufficient

Renaud Delbru (JIRA) Wed, 20 Apr 2016 03:38:36 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-6465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15249622#comment-15249622
 ]


Renaud Delbru commented on SOLR-6465:
-------------------------------------

It would be great indeed to be able to simplify the code as you proposed if we 
can rely on a bootstrap method. Below are some observations that might be 
useful.

One concern I have is the default size limit of the update logs. By default, 
the update log keeps 10 tlog files or 100 records. This will likely be too 
small to provide enough buffer for cdcr, and there is a risk of a continuous 
cycle of bootstrapping replication. One could increase the values of 
"numRecordsToKeep" and "maxNumLogsToKeep" in solrconfig to accommodate the cdcr 
requirements. But these are additional parameters that the user needs to take 
into consideration, and they make configuration more complex. I am wondering 
if we could find more appropriate default values for cdcr?
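
For reference, a sketch of what raising those limits might look like in 
solrconfig.xml. The values 1000 and 100 are illustrative only, not a 
recommendation, and the surrounding updateHandler element is assumed to be 
the stock one:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
    <!-- illustrative values only; the right size depends on update rate
         and on how far the target clusters may lag behind -->
    <int name="numRecordsToKeep">1000</int>
    <int name="maxNumLogsToKeep">100</int>
  </updateLog>
</updateHandler>
```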

The issue with increasing the limits in the original update log, compared to 
the cdcr update log, is that the original update log will not clean up old 
tlog files that are no longer needed for replication (it keeps all tlogs up to 
that limit). For example, if one increases maxNumLogsToKeep to 100 and 
numRecordsToKeep to 1000, then the node will always have 100 tlog files or 
1000 records in the update logs, even if all of them have been replicated to 
the target clusters. This might cause unexpected issues related to disk space 
or performance.

The CdcrUpdateLog was managing this by allowing a variable size update log that 
removes a tlog when it has been fully replicated. But then this means we go 
back to where we were with all the added management around the cdcr update log, 
i.e., buffer, lastprocessedversion, CdcrLogSynchronizer, ...
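
To make the difference between the two retention policies concrete, here is a 
minimal sketch. It is a toy model and not Solr code: the class 
TlogRetentionSketch, its Tlog type and both method names are hypothetical, and 
it assumes each tlog can be summarized by the highest update version it 
contains, with versions increasing over time.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of tlog retention; hypothetical names, not part of Solr. */
final class TlogRetentionSketch {

    /** A tlog file summarized by the highest update version it contains (assumption). */
    static final class Tlog {
        final String name;
        final long maxVersion;

        Tlog(String name, long maxVersion) {
            this.name = name;
            this.maxVersion = maxVersion;
        }
    }

    /** Fixed-size policy: keep the newest maxLogs files, whatever has been replicated. */
    static List<Tlog> keepNewest(List<Tlog> oldestFirst, int maxLogs) {
        int from = Math.max(0, oldestFirst.size() - maxLogs);
        return new ArrayList<>(oldestFirst.subList(from, oldestFirst.size()));
    }

    /**
     * Variable-size policy: drop a tlog once every target has processed all of it,
     * i.e. once its highest version is at or below the lowest version that all
     * targets have acknowledged.
     */
    static List<Tlog> keepUnreplicated(List<Tlog> oldestFirst, long minReplicatedVersion) {
        List<Tlog> kept = new ArrayList<>();
        for (Tlog t : oldestFirst) {
            if (t.maxVersion > minReplicatedVersion) {
                kept.add(t);
            }
        }
        return kept;
    }
}
```

With the fixed-size policy the node holds maxLogs files even when the targets 
are fully caught up; with the variable-size policy the log shrinks to nothing 
in that case, at the cost of tracking replication progress per target.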

h4. Cdcr Buffer

If we get rid of the cdcr update log logic, then we can also get rid of the 
Cdcr Buffer (buffer state, buffer commands, etc.)

h4. CdcrUpdateLog

I am not sure if we can get rid of the CdcrUpdateLog entirely. It includes 
logic such as the sub-reader and forward seek that is necessary for sending 
batch updates. Maybe this logic can be moved into the UpdateLog?

h4. CdcrLogSynchronizer

I think it is safe to get rid of this. In the case where a leader goes down 
while a cdcr reader is forwarding updates, the new leader will likely be 
missing the tlogs necessary to resume where the cdcr reader stopped. But in 
this case, it can fall back to bootstrapping.
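
The fallback decision on the new leader could be sketched roughly as follows. 
This is only an illustration under stated assumptions: CdcrResumeSketch, 
Action and decide are hypothetical names, and the rule assumes the new leader 
can tell the version of the oldest update still present in its tlogs.

```java
import java.util.OptionalLong;

/** Toy decision rule for resuming forwarding vs. bootstrapping; not Solr code. */
final class CdcrResumeSketch {

    enum Action { RESUME_FROM_TLOG, BOOTSTRAP }

    /**
     * @param lastForwardedVersion  version of the last update the previous leader forwarded
     * @param oldestRetainedVersion version of the oldest update still in the new leader's
     *                              tlogs, or empty if it has no tlogs at all
     */
    static Action decide(long lastForwardedVersion, OptionalLong oldestRetainedVersion) {
        if (!oldestRetainedVersion.isPresent()) {
            return Action.BOOTSTRAP; // nothing to read from
        }
        // If the retained tlogs still reach back to the last forwarded update,
        // everything after it is available and the reader can resume.
        // Otherwise there is a gap and only whole-index replication is safe.
        return oldestRetainedVersion.getAsLong() <= lastForwardedVersion
                ? Action.RESUME_FROM_TLOG
                : Action.BOOTSTRAP;
    }
}
```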

h4. Tlog Replication

If the tlogs are not replicated during a bootstrap, then the tlogs on the 
target will not be in sync with the source. Could this cause any issues on the 
target cluster, e.g., in case of a recovery? 
If the target is itself configured as a source (i.e., a daisy chain), this 
will probably cause issues. Its update logs will likely contain gaps, and it 
will be very difficult for it, acting as a source, to know that there is a 
gap. Therefore, it might forward incomplete updates. But this might be a 
feature we could drop, as suggested in one of your comments on the cwiki.

> CDCR: fall back to whole-index replication when tlogs are insufficient
> ----------------------------------------------------------------------
>
>                 Key: SOLR-6465
>                 URL: https://issues.apache.org/jira/browse/SOLR-6465
>             Project: Solr
>          Issue Type: Sub-task
>            Reporter: Yonik Seeley
>         Attachments: SOLR-6465.patch, SOLR-6465.patch
>
>
> When the peer-shard doesn't have transaction logs to forward all the needed 
> updates to bring a peer up to date, we need to fall back to normal 
> replication.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
