[jira] Commented: (COUCHDB-704) Replication can lose checkpoints

Randall Leeds (JIRA) Sat, 16 Oct 2010 11:12:48 -0700

    [ https://issues.apache.org/jira/browse/COUCHDB-704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12921723#action_12921723 ]


Randall Leeds commented on COUCHDB-704:
---------------------------------------

Filipe,

It's true. This is an edge case, but I have had it happen in production during 
pull replication on a database whose writes had slowed to a *very* slow crawl. 
The checkpoint code updated the source first, and the local document was 
written, but the response arrived so late that the replicator treated it as a 
timeout. When the replicator retried the save, it got a conflict. Replication 
crashed, and the checkpoint was never written to the target.

I can imagine other, rare instances where this could occur. It's an edge case, 
but a potentially nasty one.
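
The sequence above can be sketched with a toy in-memory model. This is only an 
illustration, not the replicator's actual code: the SlowDB class, the integer 
_rev counter, and the checkpoint() helper are all hypothetical stand-ins, and 
the single naive retry is an assumption about how the failure plays out.

```python
class Conflict(Exception):
    """Raised when a save carries a stale revision (analogous to HTTP 409)."""


class SlowDB:
    """Hypothetical stand-in for a database holding one _local checkpoint doc."""

    def __init__(self, lose_first_response=False):
        # Integer revisions are a simplification of real revision strings.
        self.doc = {"_rev": 0, "history": []}
        self.lose_first_response = lose_first_response

    def save(self, doc):
        if doc["_rev"] != self.doc["_rev"]:
            raise Conflict("stale _rev")
        # The write is applied before the response is (possibly) lost.
        self.doc = {"_rev": doc["_rev"] + 1, "history": list(doc["history"])}
        if self.lose_first_response:
            self.lose_first_response = False
            raise TimeoutError("write applied, but the response came too late")
        return self.doc["_rev"]


def checkpoint(source, target, entry):
    """Write the checkpoint to the source first, then to the target."""
    for db in (source, target):
        doc = {"_rev": db.doc["_rev"], "history": [entry] + db.doc["history"]}
        try:
            db.save(doc)
        except TimeoutError:
            # Naive retry with the same _rev: the first write already landed,
            # so this raises Conflict and the target is never reached.
            db.save(doc)


source, target = SlowDB(lose_first_response=True), SlowDB()
try:
    checkpoint(source, target, {"recorded_seq": 10})
    outcome = "ok"
except Conflict:
    outcome = "crashed"
```

In this model the source ends up holding the new checkpoint while the target's 
history stays empty, which is the out-of-sync state the issue describes.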

> Replication can lose checkpoints
> --------------------------------
>
>                 Key: COUCHDB-704
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-704
>             Project: CouchDB
>          Issue Type: Bug
>          Components: Replication
>    Affects Versions: 0.11.2, 1.0.1
>            Reporter: Randall Leeds
>            Priority: Minor
>         Attachments: keep_session_id.patch, save-all-rep-checkpoints.patch, 
> whitespace.patch
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> When saving replication checkpoints in the _local/<repid> document, the new 
> entry is always pushed onto the _original_ "history" list property that 
> existed at the start of the replication. When anything causes a checkpoint to 
> be written to only one of the databases, the heads of the two history lists 
> get out of sync. Subsequent attempts to start this replication must then 
> start from the latest common replication log entry in the _original_ history, 
> as though this replication never occurred.
> A better idea is to push every checkpoint onto the history instead of 
> replacing the head on each save.
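
The difference between the two strategies in the description can be sketched 
as follows. This is a simplified model under stated assumptions: histories are 
plain lists of sequence numbers (newest first) rather than real replication 
log entries, and run() and latest_common() are hypothetical helpers, not 
functions from the CouchDB code base.

```python
def run(checkpoint_seqs, original, accumulate, fail_target_at):
    """Simulate a replication that checkpoints at each seq in checkpoint_seqs.

    accumulate=False: each save pushes onto the ORIGINAL history, so it
                      effectively replaces the head written by the last save.
    accumulate=True:  each save pushes onto the current history.
    The target write is skipped for the checkpoint equal to fail_target_at.
    """
    source, target = list(original), list(original)
    for seq in checkpoint_seqs:
        source = [seq] + (source if accumulate else list(original))
        if seq != fail_target_at:
            target = [seq] + (target if accumulate else list(original))
    return source, target


def latest_common(source, target):
    """Return the newest entry present in both histories, or None."""
    common = set(target)
    return next((seq for seq in source if seq in common), None)


original = [100]  # history that existed before this replication started

# Current behaviour: the checkpoint at 300 reaches only the source.
src, tgt = run([200, 300], original, accumulate=False, fail_target_at=300)
restart_replace_head = latest_common(src, tgt)

# Proposed behaviour: every checkpoint is kept in the history.
src, tgt = run([200, 300], original, accumulate=True, fail_target_at=300)
restart_accumulate = latest_common(src, tgt)
```

In this model the replace-the-head strategy restarts from 100, the entry in 
the original history, while the accumulating strategy can restart from 200, 
the last checkpoint both databases recorded.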

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
