[jira] [Commented] (COUCHDB-1461) replication timeout and loop

2012-05-25 Thread Benjamin Nortier (JIRA)


[ 
https://issues.apache.org/jira/browse/COUCHDB-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13283296#comment-13283296
 ] 

Benjamin Nortier commented on COUCHDB-1461:
---

We have a similar problem, but not with bi-directional replication. We do a 
one-off replication and then set up continuous replication with the same 
source and target for N similar databases, which leads to timeouts.

This patch solves it for us.
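
The setup described in this comment can be sketched roughly as follows. This is only an illustration under stated assumptions, not the commenter's actual code: the helper name `replication_plan`, the database names, and the choice of a remote source with a local target are all hypothetical. For each database it builds a one-off replication document followed by a continuous one with the same source and target.

```python
def replication_plan(db_names, remote_base):
    """Build replication documents: one-off first, then continuous.

    Hypothetical helper. Which side is remote is an assumption; the
    comment only says both replications share source and target.
    """
    plan = []
    for name in db_names:
        source = "%s/%s" % (remote_base.rstrip("/"), name)
        # One-off replication for this database.
        plan.append({"source": source, "target": name})
        # Continuous replication with the same source and target.
        plan.append({"source": source, "target": name, "continuous": True})
    return plan


if __name__ == "__main__":
    # Database names and URL are placeholders.
    for doc in replication_plan(["db1", "db2"], "http://example.org:5984"):
        print(doc)
```

Each dictionary would be stored as a document in the `_replicator` database (or posted to `/_replicate`); with N databases this starts 2N replications in quick succession.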

 replication timeout and loop
 

 Key: COUCHDB-1461
 URL: https://issues.apache.org/jira/browse/COUCHDB-1461
 Project: CouchDB
  Issue Type: Bug
Affects Versions: 1.2, 1.3
Reporter: Benoit Chesneau
 Attachments: 
 12x-0001-Avoid-possible-timeout-initializing-replications.patch, 
 master-0001-Avoid-possible-timeout-initializing-replications.patch, test.py


 When you start replications in both directions at the same time, a 
 replication will time out and then restart after 5 seconds. Sometimes it 
 does not recover. Adding a sleep between the two replications seems to avoid 
 the problem, but it should not be needed. 
 Attached is a script using couchdbkit to reproduce the problem. SERVER_URI 
 needs to be changed to point to your CouchDB node.
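
For readers without the attachment, the two replication documents involved can be sketched as below. This is not the attached test.py and does not use couchdbkit; it only builds the document bodies. The document IDs, database names, and directions are taken from the log in this report, while the helper name `replication_docs` and the exact SERVER_URI value are assumptions.

```python
import json

# Assumed value; the log in this report shows the node at localhost:15984.
SERVER_URI = "http://localhost:15984"


def replication_docs(server_uri):
    """Return the two opposing _replicator documents (hypothetical helper)."""
    remote = "%s/testdb2/" % server_uri.rstrip("/")
    return {
        # rep1: local testdb1 to remote testdb2
        "rep1": {"source": "testdb1", "target": remote},
        # rep2: remote testdb2 to local testdb1
        "rep2": {"source": remote, "target": "testdb1"},
    }


if __name__ == "__main__":
    for doc_id, body in sorted(replication_docs(SERVER_URI).items()):
        # Each body would be sent as PUT /_replicator/<doc_id>.
        print("PUT /_replicator/%s %s" % (doc_id, json.dumps(body, sort_keys=True)))
```

Writing both documents back to back, with no sleep in between, corresponds to the two `PUT /_replicator/...` requests that appear 9 ms apart in the log.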
 Log:
  09:09:24.016 [info] 127.0.0.1 - - HEAD /testdb1/ 404
 09:09:24.028 [info] 127.0.0.1 - - PUT /testdb1/ 201
 09:09:24.033 [info] 127.0.0.1 - - HEAD /testdb2/ 404
 09:09:24.046 [info] 127.0.0.1 - - PUT /testdb2/ 201
 09:09:24.071 [info] 127.0.0.1 - - GET
 /_replicator/_all_docs?include_docs=true 200
 09:09:28.110 [info] 127.0.0.1 - - PUT /_replicator/rep1 201
 09:09:28.119 [info] 127.0.0.1 - - PUT /_replicator/rep2 201
 09:09:28.121 [info] Attempting to start replication
 `23280770e617f3a82f398b8eca09aaef` (document `rep1`).
 09:09:28.123 [info] Attempting to start replication
 `e42aaea4a0ceb931930834ecf7b79600` (document `rep2`).
 09:09:28.169 [info] 127.0.0.1 - - HEAD /testdb2/ 200
 09:09:28.172 [info] 127.0.0.1 - - GET /testdb2/ 200
 09:09:28.176 [info] 127.0.0.1 - - GET
 /testdb2/_local/e42aaea4a0ceb931930834ecf7b79600 404
 09:09:28.179 [info] 127.0.0.1 - - GET
 /testdb2/_local/f129a5531f82eb089a3e1ca9e80c9ad2 404
 09:09:28.194 [info] Replication `e42aaea4a0ceb931930834ecf7b79600` is using:
4 worker processes
a worker batch size of 500
20 HTTP connections
a connection timeout of 3 milliseconds
10 retries per request
socket options are: [{keepalive,true},{nodelay,false}]
 09:09:28.196 [info] 127.0.0.1 - - GET
 /testdb2/_changes?feed=normal&style=all_docs&since=0&heartbeat=1
 200
 09:09:28.202 [info] Document `rep2` triggered replication
 `e42aaea4a0ceb931930834ecf7b79600`
 09:09:28.203 [info] starting new replication
 `e42aaea4a0ceb931930834ecf7b79600` at <0.262.0>
 (`http://localhost:15984/testdb2/` -> `testdb1`)
 09:09:28.208 [info] 127.0.0.1 - - HEAD /testdb2/ 200
 09:09:28.212 [info] 127.0.0.1 - - GET /testdb2/ 200
 09:09:28.218 [info] 127.0.0.1 - - GET
 /testdb2/_local/23280770e617f3a82f398b8eca09aaef 404
 09:09:28.219 [info] Replication `e42aaea4a0ceb931930834ecf7b79600`
 finished (triggered by document `rep2`)
 09:09:28.223 [info] 127.0.0.1 - - GET
 /testdb2/_local/4b04e1e066f4ad1f988669036080ed9c 404
 09:09:28.225 [info] Replication `23280770e617f3a82f398b8eca09aaef` is using:
4 worker processes
a worker batch size of 500
20 HTTP connections
a connection timeout of 3 milliseconds
10 retries per request
socket options are: [{keepalive,true},{nodelay,false}]
 09:09:58.203 [error] gen_server <0.287.0> terminated with reason: killed
 09:09:58.207 [error] CRASH REPORT Process <0.287.0> with 0 neighbours
 crashed with reason:
 {killed,[{gen_server,terminate,6,[{file,gen_server.erl},{line,737}]},{proc_lib,init_p_do_apply,3,[{file,proc_lib.erl},{line,227}]}]}
 09:09:58.215 [error] Error in replication
 `23280770e617f3a82f398b8eca09aaef` (triggered by document `rep1`):
 timeout
 Restarting replication in 5 seconds.
 09:10:03.223 [info] 127.0.0.1 - - HEAD /testdb2/ 200
 09:10:03.227 [info] 127.0.0.1 - - GET /testdb2/ 200
 09:10:03.231 [info] 127.0.0.1 - - GET
 /testdb2/_local/23280770e617f3a82f398b8eca09aaef 404
 09:10:03.235 [info] 127.0.0.1 - - GET
 /testdb2/_local/4b04e1e066f4ad1f988669036080ed9c 404
 09:10:03.237 [info] Replication `23280770e617f3a82f398b8eca09aaef` is using:
4 worker processes
a worker batch size of 500
20 HTTP connections
a connection timeout of 3 milliseconds
10 retries per request
socket options are: [{keepalive,true},{nodelay,false}]
 09:10:03.244 [info] Document `rep1` triggered replication
 `23280770e617f3a82f398b8eca09aaef`
 09:10:03.245 [info] starting new replication
 `23280770e617f3a82f398b8eca09aaef` at <0.335.0> (`testdb1` ->
 `http://localhost:15984/testdb2/`)
 09:10:03.253 [info] Replication `23280770e617f3a82f398b8eca09aaef`
 finished (triggered by document `rep1`)

[jira] [Commented] (COUCHDB-1461) replication timeout and loop

2012-05-25 Thread Filipe Manana (JIRA)


[ 
https://issues.apache.org/jira/browse/COUCHDB-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13283325#comment-13283325
 ] 

Filipe Manana commented on COUCHDB-1461:


Thanks for testing, Benjamin.
I will merge a small variant of that patch soon.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please

[jira] [Commented] (COUCHDB-1461) replication timeout and loop

2012-05-25 Thread Benjamin Nortier (JIRA)


[ 
https://issues.apache.org/jira/browse/COUCHDB-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13283327#comment-13283327
 ] 

Benjamin Nortier commented on COUCHDB-1461:
---

Lovely


[jira] [Commented] (COUCHDB-1461) replication timeout and loop

2012-05-25 Thread Benoit Chesneau (JIRA)


[ 
https://issues.apache.org/jira/browse/COUCHDB-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13283328#comment-13283328
 ] 

Benoit Chesneau commented on COUCHDB-1461:
--

@filipe I was about to merge this version (since it works in production). What 
will the change be?


[jira] [Commented] (COUCHDB-1461) replication timeout and loop

2012-05-25 Thread Filipe Manana (JIRA)


[ 
https://issues.apache.org/jira/browse/COUCHDB-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13283383#comment-13283383
 ] 

Filipe Manana commented on COUCHDB-1461:


Oops, I only saw your comment now.
It is pushed anyway.

 Fix For: 1.2.1

