nickva commented on issue #4897:
URL: https://github.com/apache/couchdb/issues/4897#issuecomment-3003328785

   @Sdas0000 it seems that some connections are periodically interrupted, but 
the job still ran for about 1h12m (8:12 to 9:24). If you have connection logs 
on the endpoints and/or the load balancer, inspect the termination states. As 
soon as the job crashes, it restarts.
   
   Coincidentally, we just fixed a bug related to `ibrowse_stream_cleanup` in 
the replicator: https://github.com/apache/couchdb/pull/5555. That's not in a 
release yet, but if you have the ability, you could try building that patch on 
top of master, deploying it, and seeing if it reduces the occurrence of that 
error.
   
   > But sometime the replication do not crash for that database( even though 
timeout happens), and the entries in _active_tasks for that replication does 
not exists. so replication for that database is kind of hung.
   
   If the replication job crashes too often, it will be backed off. You can try 
adjusting the `max_history` configuration parameter, which controls how many 
events the replication job keeps: I see it's 7 up in the comments of this 
issue, but you can make it 5 or 6 to shorten the maximum backoff. Other 
parameters to adjust might be `timeout` and `retries_per_request`; you can 
make them smaller.
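
   As a rough sketch of how those settings could be changed at runtime, the 
snippet below builds (but does not send) `PUT` requests against the node 
config endpoint, `/_node/_local/_config/{section}/{key}`. It assumes the keys 
live in the `[replicator]` section; the base URL and the values 5 and 3 are 
placeholders, and admin credentials would still have to be added before 
sending. The timeout setting is left out because its exact key name should be 
checked against the configuration docs for your version.

```python
import json
import urllib.request


def build_config_request(base_url, section, key, value, node="_local"):
    """Build (but do not send) a PUT request that sets one config value."""
    url = f"{base_url}/_node/{node}/_config/{section}/{key}"
    # The config endpoint takes the new value as a JSON string in the body.
    body = json.dumps(str(value)).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Placeholder values for illustration only, not recommendations.
settings = {"max_history": 5, "retries_per_request": 3}

requests_to_send = [
    build_config_request("http://127.0.0.1:5984", "replicator", key, value)
    for key, value in settings.items()
]

for req in requests_to_send:
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

   Each request could then be passed to `urllib.request.urlopen` once an 
authorization header is attached. The same values can also be set in the 
`[replicator]` section of `local.ini`.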


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

