Sdas0000 opened a new issue, #5609:
URL: https://github.com/apache/couchdb/issues/5609

   Some of the replication jobs did not restart after a `changes_reader_died,{timeout,ibrowse_stream_cleanup}` message
   
   
   ## Description
   
   Jul 24 14:36:52 xxxx-xxxx-ro-us-west1-x couchdb[826059]:
   
   ChangesReader process died with reason: `{changes_reader_died,{timeout,ibrowse_stream_cleanup}}`
   Replication `0c59025774c3cc12f61c49f2e2c02c5d+continuous` (`https://xxxx-xxxx-master-x.pr-xxxx-xxxx.str.xxxxxxx.com/xxxx_xxxx-300/` -> `https://xxxx-xxxxx-ro-us-west1-x.pr-xxxxx-xxxxxx.str.xxxxx.com/xxxx_xxxx-300/`) failed: `{changes_reader_died,{timeout,ibrowse_stream_cleanup}}`
   
   When the ChangesReader process died on xxxx_xxxx-300, the job in _scheduler/jobs did not crash and restart (the last crash was on 2025-07-23T15:22:33):
   
     {
       "database": "_replicator",
       "id": "0c59025774c3cc12f61c49f2e2c02c5d+continuous",
       "pid": "<0.1761.7093>",
       "source": "https://xxxx-xxxx-master-x.pr-xxxxx-xxxxx.str.xxxx.com/xxxx_xxxx-300/",
       "target": "https://xxxx-xxxxx-ro-us-west1-x.pr-xxxx-xxxxxx.str.xxxxx.com/xxxx_xxxx-300/",
       "user": null,
       "doc_id": "xxxxx_xxxx_replication_300",
       "info": {
         "revisions_checked": 3888719,
         "missing_revisions_found": 421066,
         "docs_read": 421064,
         "docs_written": 421064,
         "changes_pending": 0,
         "doc_write_failures": 0,
         "bulk_get_docs": 421064,
         "bulk_get_attempts": 421064,
         "checkpointed_source_seq": "22113348-g1AAAAJveJyl0EEOgjAQBdBGSFx4Fgi1ILKSQ3CBdqgWUoopuNYzuPI2eiVPgBMwbpvUzUwyk_x5GU0ICVVQkxz6C6halM0ouwg4KBl1fBiljbIY4rONhrG3OOsNt6DixuDKcK0xYMWJOEzT1KpAkGpz6nC25ls41jvmn-xQMadKlFjF9QfT7xkmISlkvvMPd8CoE2ZCrOSGDW3PBccfM46mlGc08T_w99cW3GvBfT93r2Yc2xcspbn_gfYDW0XUDA",
         "source_seq": "22113348-g1AAAAJveJyl0EEOgjAQBdBGSFx4Fgi1ILKSQ3CBdqgWUoopuNYzuPI2eiVPgBMwbpvUzUwyk_x5GU0ICVVQkxz6C6halM0ouwg4KBl1fBiljbIY4rONhrG3OOsNt6DixuDKcK0xYMWJOEzT1KpAkGpz6nC25ls41jvmn-xQMadKlFjF9QfT7xkmISlkvvMPd8CoE2ZCrOSGDW3PBccfM46mlGc08T_w99cW3GvBfT93r2Yc2xcspbn_gfYDW0XUDA",
         "through_seq": "22113348-g1AAAAJveJyl0EEOgjAQBdBGSFx4Fgi1ILKSQ3CBdqgWUoopuNYzuPI2eiVPgBMwbpvUzUwyk_x5GU0ICVVQkxz6C6halM0ouwg4KBl1fBiljbIY4rONhrG3OOsNt6DixuDKcK0xYMWJOEzT1KpAkGpz6nC25ls41jvmn-xQMadKlFjF9QfT7xkmISlkvvMPd8CoE2ZCrOSGDW3PBccfM46mlGc08T_w99cW3GvBfT93r2Yc2xcspbn_gfYDW0XUDA"
       },
       "history": [
         {
           "timestamp": "2025-07-23T15:22:33Z",
           "type": "started"
         },
         {
           "timestamp": "2025-07-23T15:22:33Z",
           "type": "crashed",
           "reason": "{changes_reader_died,{timeout,ibrowse_stream_cleanup}}"
         },
   
   
   
   ## Steps to Reproduce
   
   
   
   ## Expected Behaviour
   
   The replication job should appear in _active_tasks, either in the 'running' or the 'pending' state. Instead, once the job is stopped it never restarts. It looks like CouchDB assumes that if an entry exists in _scheduler/docs (and _scheduler/jobs), a corresponding _active_tasks entry will exist as well.
   
   We verified that new documents added to the source never appear in the target database, even after waiting several days.
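   One way to spot jobs stuck in this state is to compare the job IDs returned by `GET /_scheduler/jobs` with the `replication_id` values in `GET /_active_tasks`. A minimal sketch of that comparison (the helper name is ours; the payload shapes follow the documented API responses):

   ```python
   # Sketch: list replication jobs the scheduler reports but that are
   # missing from _active_tasks (the symptom described in this issue).
   # `scheduler_jobs` is the parsed body of GET /_scheduler/jobs;
   # `active_tasks` is the parsed body of GET /_active_tasks.

   def find_stuck_jobs(scheduler_jobs, active_tasks):
       # Replication entries in _active_tasks carry a "replication_id"
       # that matches the "id" field of a scheduler job.
       active_ids = {
           t.get("replication_id")
           for t in active_tasks
           if t.get("type") == "replication"
       }
       return [j["id"] for j in scheduler_jobs["jobs"] if j["id"] not in active_ids]
   ```

   Running this periodically against each node would flag the affected replications without waiting days for missing documents to be noticed.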
   
   Per the CouchDB documentation: "_Changed in version 2.1.0: Because of how the scheduling replicator works, continuous replication jobs could be periodically stopped and then started later. When they are not running they will not appear in the _active_tasks endpoint._"
   
   Note: Sometimes, when `changes_reader_died,{timeout,ibrowse_stream_cleanup}` happens for a database, the job in _scheduler/jobs does crash and restart, and everything returns to normal.
   
   [NOTE]:
   
   To restart replication for the missing databases, we have to bounce CouchDB on that node. The same issue then recurs on other databases on other nodes after a few weeks.
   
   Is there any other way to restart replication without bouncing the node?
   
   Note: We tried updating the failed replication's user ID and password with invalid entries, expecting that to crash the _scheduler job for that replication so it would restart once the correct credentials were restored. But it did not crash the replication (the ones missing from _active_tasks). This forced us to bounce CouchDB on that node to restart replication.
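   A possible workaround, which we have not been able to verify against this exact failure mode: deleting the replication document and re-creating it should make the scheduler remove the job entirely and add a fresh one, without a node restart. A hedged sketch (the HTTP helpers are injected so the logic stays self-contained; whether a job wedged in this state is actually cleared by this is exactly the open question of this issue):

   ```python
   def recreate_replication_doc(fetch, delete, put, doc_id):
       """Force the scheduler to drop and re-add a replication job by
       deleting and re-creating its _replicator document.

       fetch/delete/put are small HTTP helpers (e.g. wrappers around
       urllib.request with the cluster URL and credentials baked in);
       doc_id is the _replicator document ID, e.g. "xxxxx_xxxx_replication_300".
       """
       doc = fetch(f"/_replicator/{doc_id}")                # current doc, incl. _rev
       delete(f"/_replicator/{doc_id}?rev={doc['_rev']}")   # scheduler removes the job
       fresh = {k: v for k, v in doc.items() if not k.startswith("_")}
       put(f"/_replicator/{doc_id}", fresh)                 # scheduler creates a new job
       return fresh
   ```

   Deleting a _replicator document cancelling its replication, and creating one starting it, is documented CouchDB behaviour; what is unverified is whether the scheduler honours the delete when the job is in this stuck state.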
   
   ## Your Environment
   
   CouchDB version:
   
     {"couchdb":"Welcome","version":"3.4.3","git_sha":"f1a47e66","uuid":"67fc0abd32xxx0c38f75cc627b77411d9f","features":["access-ready","partitioned","pluggable-storage-engines","reshard","scheduler"],"vendor":{"name":"The Apache Software Foundation"}}
   
   * Operating system and version:
   
     NAME="Ubuntu"
     VERSION_ID="22.04"
     VERSION="22.04.5 LTS (Jammy Jellyfish)"
     VERSION_CODENAME=jammy
     ID=ubuntu
   
   
   ## Additional Context
   
   

