Sdas0000 opened a new issue, #5609:
URL: https://github.com/apache/couchdb/issues/5609
Some of the replication jobs did not start after
changes_reader_died,{timeout,ibrowse_stream_cleanup} message
## Description
Jul 24 14:36:52 xxxx-xxxx-ro-us-west1-x couchdb[826059]:
ChangesReader process died with reason:
{changes_reader_died,{timeout,ibrowse_stream_cleanup}}
Replication `0c59025774c3cc12f61c49f2e2c02c5d+continuous`
(`https://xxxx-xxxx-master-x.pr-xxxx-xxxx.str.xxxxxxx.com/xxxx_xxxx-300/` ->
`https://xxxx-xxxxx-ro-us-west1-x.pr-xxxxx-xxxxxx.str.xxxxx.com/xxxx_xxxx-300/`)
failed: {changes_reader_died,{{timeout,ibrowse_stream_cleanup}}
When the ChangesReader process died for xxxx_xxxx-300, the job in _scheduler/jobs
did not crash and restart (the last recorded crash was at 2025-07-23T15:22:33):
{
"database": "_replicator",
"id": "0c59025774c3cc12f61c49f2e2c02c5d+continuous",
"pid": "<0.1761.7093>",
"source":
"https://xxxx-xxxx-master-x.pr-xxxxx-xxxxx.str.xxxx.com/xxxx_xxxx-300/",
"target":
"https://xxxx-xxxxx-ro-us-west1-x.pr-xxxx-xxxxxx.str.xxxxx.com/xxxx_xxxx-300/",
"user": null,
"doc_id": "xxxxx_xxxx_replication_300",
"info": {
"revisions_checked": 3888719,
"missing_revisions_found": 421066,
"docs_read": 421064,
"docs_written": 421064,
"changes_pending": 0,
"doc_write_failures": 0,
"bulk_get_docs": 421064,
"bulk_get_attempts": 421064,
"checkpointed_source_seq":
"22113348-g1AAAAJveJyl0EEOgjAQBdBGSFx4Fgi1ILKSQ3CBdqgWUoopuNYzuPI2eiVPgBMwbpvUzUwyk_x5GU0ICVVQkxz6C6halM0ouwg4KBl1fBiljbIY4rONhrG3OOsNt6DixuDKcK0xYMWJOEzT1KpAkGpz6nC25ls41jvmn-xQMadKlFjF9QfT7xkmISlkvvMPd8CoE2ZCrOSGDW3PBccfM46mlGc08T_w99cW3GvBfT93r2Yc2xcspbn_gfYDW0XUDA",
"source_seq":
"22113348-g1AAAAJveJyl0EEOgjAQBdBGSFx4Fgi1ILKSQ3CBdqgWUoopuNYzuPI2eiVPgBMwbpvUzUwyk_x5GU0ICVVQkxz6C6halM0ouwg4KBl1fBiljbIY4rONhrG3OOsNt6DixuDKcK0xYMWJOEzT1KpAkGpz6nC25ls41jvmn-xQMadKlFjF9QfT7xkmISlkvvMPd8CoE2ZCrOSGDW3PBccfM46mlGc08T_w99cW3GvBfT93r2Yc2xcspbn_gfYDW0XUDA",
"through_seq":
"22113348-g1AAAAJveJyl0EEOgjAQBdBGSFx4Fgi1ILKSQ3CBdqgWUoopuNYzuPI2eiVPgBMwbpvUzUwyk_x5GU0ICVVQkxz6C6halM0ouwg4KBl1fBiljbIY4rONhrG3OOsNt6DixuDKcK0xYMWJOEzT1KpAkGpz6nC25ls41jvmn-xQMadKlFjF9QfT7xkmISlkvvMPd8CoE2ZCrOSGDW3PBccfM46mlGc08T_w99cW3GvBfT93r2Yc2xcspbn_gfYDW0XUDA"
},
"history": [
{
"timestamp": "2025-07-23T15:22:33Z",
"type": "started"
},
{
"timestamp": "2025-07-23T15:22:33Z",
"type": "crashed",
"reason": "{changes_reader_died,{timeout,ibrowse_stream_cleanup}}"
},
## Steps to Reproduce
## Expected Behaviour
The replication job should appear in _active_tasks, perhaps in the 'running' or
'pending' state. The current issue is that once the job has stopped, it never restarts.
It looks as if CouchDB assumes that when a job exists in _scheduler/docs (or
_scheduler/jobs), the corresponding _active_tasks entry exists as well.
We verified that new documents added to the source never appear in the
target database, even after waiting a few days.
According to the CouchDB documentation: "Changed in version 2.1.0: Because of how the
scheduling replicator works, continuous replication jobs could be periodically
stopped and then started later. When they are not running they will not appear
in the _active_tasks endpoint"
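The mismatch described above (a job listed in _scheduler/jobs with a pid, but no matching replication entry in _active_tasks) could be checked for with a small script. The following is only a rough sketch, assuming that replication entries in _active_tasks carry a `replication_id` field equal to the job `id`, and that the first page of _scheduler/jobs is enough (the endpoint accepts `limit` and `skip` if it is not). The helper names `jobs_without_active_task` and `fetch_json` are made up for this illustration.

```python
import json
from urllib.request import Request, urlopen


def jobs_without_active_task(scheduler_jobs, active_tasks):
    """Return ids of scheduler jobs that have a pid but no replication task.

    scheduler_jobs: parsed body of GET /_scheduler/jobs (a dict with "jobs").
    active_tasks:   parsed body of GET /_active_tasks (a list of dicts).
    """
    # Assumption: replication tasks expose the job id as "replication_id".
    running_ids = {
        task.get("replication_id")
        for task in active_tasks
        if task.get("type") == "replication"
    }
    return [
        job["id"]
        for job in scheduler_jobs.get("jobs", [])
        if job.get("pid") is not None and job["id"] not in running_ids
    ]


def fetch_json(base_url, path, auth_header=None):
    """GET base_url + path and decode the JSON body (hypothetical helper)."""
    headers = {"Accept": "application/json"}
    if auth_header:
        headers["Authorization"] = auth_header
    request = Request(base_url.rstrip("/") + path, headers=headers)
    with urlopen(request) as response:
        return json.load(response)


# Example usage against a node (the URL is a placeholder):
#   base = "http://127.0.0.1:5984"
#   suspects = jobs_without_active_task(
#       fetch_json(base, "/_scheduler/jobs"),
#       fetch_json(base, "/_active_tasks"),
#   )
#   print(suspects)
```

A job can legitimately be absent from _active_tasks while the scheduler has it stopped or pending, so the sketch only flags jobs that still report a pid, which is the situation shown in the job output above.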
Note: Sometimes, when
changes_reader_died,{timeout,ibrowse_stream_cleanup} happens for a database,
the _scheduler/jobs entry does crash and restart, and everything returns to normal.
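To tell the two cases apart (a job that crashed and restarted versus one that went quiet), it may help to look at how old the newest `history` entry of a job is. This is a minimal sketch, assuming the `history` entries have the `timestamp` and `type` fields shown in the job output above and that timestamps are UTC in the `YYYY-MM-DDTHH:MM:SSZ` form. The function names are hypothetical, and a long gap is only a hint, since a healthy continuous job can also run for a long time without new history events.

```python
from datetime import datetime, timezone


def latest_history_event(job):
    """Return (timestamp, type) of the newest history entry, or None."""
    events = [
        (
            datetime.strptime(entry["timestamp"], "%Y-%m-%dT%H:%M:%SZ")
            .replace(tzinfo=timezone.utc),
            entry["type"],
        )
        for entry in job.get("history", [])
    ]
    if not events:
        return None
    # Compare on the timestamp only; ordering of the list is not assumed.
    return max(events, key=lambda event: event[0])


def hours_since_last_event(job, now=None):
    """Hours between the newest history entry and `now` (UTC by default)."""
    latest = latest_history_event(job)
    if latest is None:
        return None
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - latest[0]).total_seconds() / 3600.0
```

Combined with log timestamps, a job whose newest history event predates a logged `changes_reader_died` error would be a candidate for the stuck state described here.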
[NOTE]:
To restart replication for the affected databases, we have to bounce
CouchDB on that node. But the same issue then happens for other databases on
other nodes after a few weeks.
Is there any other way to restart replication without bouncing the node?
Note: We tried updating the failed replication's user ID and password to
invalid values, expecting that this would crash the _scheduler/jobs entry for that
replication and that it would restart once we restored the correct user ID and
password. But the replications whose _active_tasks entries were missing did not
crash. This forced us to bounce CouchDB on that node to restart replication.
## Your Environment
CouchDB version:
{"couchdb":"Welcome","version":"3.4.3","git_sha":"f1a47e66","uuid":"67fc0abd32xxx0c38f75cc627b77411d9f","features":["access-ready","partitioned","pluggable-storage-engines","reshard","scheduler"],"vendor":{"name":"The Apache Software Foundation"}}
* Operating system and version:
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.5 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
## Additional Context