[jira] [Updated] (COUCHDB-1230) Replication slows down over time

Paul Hirst (JIRA) Thu, 21 Jul 2011 03:45:34 -0700

     [ 
https://issues.apache.org/jira/browse/COUCHDB-1230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Paul Hirst updated COUCHDB-1230:
--------------------------------

    Description: 
I have two databases which were replicated in the past. One is running 1.0.2; I 
shall call this the source database. The other is running 1.1.0; I shall call 
this the target database.

The source and target are bidirectionally replicated using a push and pull 
replication from the target (using a couple of documents in the new _replicator 
database).
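
As a rough illustration of the setup described above, the two documents could 
look something like the output of this sketch. The database name, the remote 
URL, the document IDs and the `continuous` flag are assumptions made for 
illustration; they are not the values used in the actual deployment.

```python
def replication_docs(local_db, remote_db_url, continuous):
    """Build a push and a pull document for the _replicator database.

    Both documents are meant to be stored on the same server (the target
    in this report), so "local" means that server's copy of the database.
    """
    # Push: local database -> remote database.
    push = {"_id": "push_" + local_db, "source": local_db,
            "target": remote_db_url, "continuous": continuous}
    # Pull: remote database -> local database.
    pull = {"_id": "pull_" + local_db, "source": remote_db_url,
            "target": local_db, "continuous": continuous}
    return push, pull
```

Storing both documents on one server is what makes the replication 
bidirectional while keeping all replication work on that server.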

The source database is in production and is getting changes applied to it from 
live systems. The target is only participating in replication and isn't being 
used directly by any production systems.

The database has about 50 million documents, many of which have been updated a 
handful of times. The database is about 500G after compaction, but the source 
database is currently at about 900G as it hasn't been compacted for a while.

The databases were replicated in the past; however, this replication was torn 
down when the target was upgraded from 1.0.2 to 1.1.0. When replication was 
re-enabled, the system wasn't able to pick up where it left off and had to 
re-enumerate all the documents again. This process initially started quickly, 
but after a while it ground to a halt, such that the target actually stopped 
making progress against the source database.

I found that restarting replication starts the process running again at a 
decent speed for a while. I did this by deleting and recreating the appropriate 
document in the _replicator database on the target.  
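
The restart described above could be scripted roughly as follows. This is only 
a sketch: the base URL and document ID are placeholders, no authentication is 
handled, and the helper names (`http_json`, `restart_replication`) are made up 
for this example. It assumes the server-added fields on the document all start 
with `_replication_`.

```python
import json
import urllib.request
from urllib.parse import quote


def http_json(method, url, body=None):
    """Send one JSON request and return the decoded JSON response."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    req = urllib.request.Request(url, data=data, method=method,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def restart_replication(base_url, doc_id, http=http_json):
    """Delete a replication document, then recreate it with the same fields."""
    doc_url = f"{base_url}/_replicator/{quote(doc_id, safe='')}"
    doc = http("GET", doc_url)
    # Deleting the document cancels the running replication.
    http("DELETE", f"{doc_url}?rev={quote(doc['_rev'], safe='')}")
    # Drop _rev and the server-added _replication_* fields; keep the rest.
    fresh = {k: v for k, v in doc.items()
             if k != "_rev" and not k.startswith("_replication_")}
    # Recreating the document starts a fresh replication.
    return http("PUT", doc_url, fresh)
```

A call would look like `restart_replication("http://localhost:5984", "pull_mydb")`, 
where both arguments are hypothetical.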

I have graphed the last_seq of the target database against time for about a 
day, noting when replication was manually restarted. I shall try to attach the 
graph if possible. It shows a clear improvement in replication speed after 
restarting replication.
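
For anyone wanting to collect similar numbers, a minimal sketch follows. It 
assumes the figure of interest is the `update_seq` field returned by a GET on 
the database URL, which may not be exactly the value graphed here. The 
function names and the polling parameters are invented for this example.

```python
import json
import time
import urllib.request


def read_update_seq(db_url):
    """Return the update_seq reported by a GET on the database URL."""
    with urllib.request.urlopen(db_url) as resp:
        return json.loads(resp.read().decode("utf-8"))["update_seq"]


def sample(db_url, count, interval_s):
    """Collect (unix_time, update_seq) pairs, one every interval_s seconds."""
    samples = []
    for i in range(count):
        samples.append((time.time(), read_update_seq(db_url)))
        if i < count - 1:
            time.sleep(interval_s)
    return samples


def rates(samples):
    """Sequence numbers gained per second between consecutive samples."""
    return [(s1 - s0) / (t1 - t0)
            for (t0, s0), (t1, s1) in zip(samples, samples[1:])
            if t1 > t0]
```

A steadily falling series from `rates` between restarts would correspond to 
the kind of slowdown described in this report.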

I previously witnessed this behaviour between 1.0.2 databases but didn't grab 
any stats at the time, so I don't think it's a new problem.


  was:
I have two databases which were replicated in the past, one is running 1.0.2. I 
shall call this the source database. The other is running 1.1.0, I shall call 
this the target database.

The source and target are bidirectionally replicated using a push and pull 
replication from the target (using a couple of documents in the new _replicator 
database).

The source database is in production and is getting changes applied to it from 
live systems. The target is only participating in replication and it's being 
used directly by any production systems.

The database has about 50 million documents many of these will have been 
updated a handful of times. The database is about 500G after compaction, but 
the source database is currently at about 900G as it hasn't been compacted for 
a while.

The databases were replicated in the past however this replication was torn 
down when the target was upgraded from 1.0.2 to 1.1.0. When replication was 
reenabled the system wasn't able to pick up were it left off and had to 
reenumerate all the documents again. This process initially started quickly but 
after a while ground to a halt such that the target actually stopped making 
progress against the source database.

I found that restarting replication starts the process running again at a 
decent speed for a while. I did this by deleting and recreating the appropriate 
document in the _replicator database on the target.  

I have graphed the last_seq of the target database against time for about a 
day, noting when replication was manually restarted. I shall try to attach the 
graph if possible. It shows a clear improvement in replication speed after 
restarting replication.

I previously witnessed this behaviour between 1.0.2 databases but didn't grab 
any stats at the time but I don't think it's a new problem.



> Replication slows down over time
> --------------------------------
>
>                 Key: COUCHDB-1230
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1230
>             Project: CouchDB
>          Issue Type: Bug
>          Components: Replication
>    Affects Versions: 1.0.2, 1.1
>         Environment: Ubuntu 10.04, 
>            Reporter: Paul Hirst
>         Attachments: sequence_number.png
>
>
> I have two databases which were replicated in the past. One is running 
> 1.0.2; I shall call this the source database. The other is running 1.1.0; I 
> shall call this the target database.
> The source and target are bidirectionally replicated using a push and pull 
> replication from the target (using a couple of documents in the new 
> _replicator database).
> The source database is in production and is getting changes applied to it 
> from live systems. The target is only participating in replication and isn't 
> being used directly by any production systems.
> The database has about 50 million documents, many of which have been updated 
> a handful of times. The database is about 500G after compaction, but the 
> source database is currently at about 900G as it hasn't been compacted for a 
> while.
> The databases were replicated in the past; however, this replication was torn 
> down when the target was upgraded from 1.0.2 to 1.1.0. When replication was 
> re-enabled, the system wasn't able to pick up where it left off and had to 
> re-enumerate all the documents again. This process initially started quickly, 
> but after a while it ground to a halt, such that the target actually stopped 
> making progress against the source database.
> I found that restarting replication starts the process running again at a 
> decent speed for a while. I did this by deleting and recreating the 
> appropriate document in the _replicator database on the target.
> I have graphed the last_seq of the target database against time for about a 
> day, noting when replication was manually restarted. I shall try to attach 
> the graph if possible. It shows a clear improvement in replication speed 
> after restarting replication.
> I previously witnessed this behaviour between 1.0.2 databases but didn't grab 
> any stats at the time, so I don't think it's a new problem.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
