nickva commented on issue #4897: URL: https://github.com/apache/couchdb/issues/4897#issuecomment-3003328785
@Sdas0000 it seems that some connections are periodically interrupted. But the job ran for about 1h12m (8:12 to 9:24). If you have connection logs on the endpoints and/or the load balancer, inspect the termination states there. As soon as the job crashes, it restarts.

Coincidentally, we just fixed a bug related to `ibrowse_stream_cleanup` in the replicator: https://github.com/apache/couchdb/pull/5555. That fix is not in a release yet, but if you are able to, you could try building that patch on top of master, deploying it, and seeing whether it reduces the frequency of that error.

> But sometime the replication do not crash for that database( even though timeout happens), and the entries in _active_tasks for that replication does not exists. so replication for that database is kind of hung.

If the replication job crashes too often, it will be backed off. You can try adjusting the `max_history` configuration parameter, which controls how many events the replication job keeps. I see it's set to 7 earlier in the comments of this issue, but you can lower it to 5 or 6, which will shorten the maximum backoff. Other parameters to adjust might be `timeout` and `retries_per_request`; you can make them smaller.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
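
As a rough illustration of the settings mentioned in the comment above, a `local.ini` fragment might look like the sketch below. It assumes these options live in the `[replicator]` section and that the `timeout` referred to in the comment corresponds to `connection_timeout` (in milliseconds). The values for `connection_timeout` and `retries_per_request` are placeholders chosen only to show the direction of the change, not recommendations from the comment; check them against the documentation for your CouchDB version.

```ini
[replicator]
; Number of events kept per replication job. The comment suggests
; lowering it from 7 to 5 or 6 to shorten the maximum backoff.
max_history = 5

; Assumed to be the "timeout" mentioned in the comment (milliseconds).
; Placeholder value; pick something smaller than your current setting.
connection_timeout = 20000

; Placeholder value; pick something smaller than your current setting.
retries_per_request = 3
```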
