GitHub user regiskuckaertz created a discussion: Confusion between pekko.cluster.failure-detector and pekko.remote.watch-failure-detector
Hi. We recently had an incident in which a long GC pause caused heartbeats to fail. I was surprised, because we had a similar incident a long time ago and set the property `pekko.cluster.failure-detector.acceptable-heartbeat-pause` for that specific reason (default is 3s). However, the value logged in the message `Previous heartbeat was sent [10738] ms ago` did not come close to the acceptable pause we had set.

After re-reading the docs, it seems the pause must be specified at `pekko.remote.watch-failure-detector.acceptable-heartbeat-pause` (default is 10s). Both properties appear in `reference.conf`, under the `cluster` and `remote` projects respectively. I've been looking at the code, and I am no longer sure the former is used anywhere, except in some deprecated `ClusterClient` module under `cluster-tools`.

Can you please help me understand if and when `pekko.cluster.failure-detector` and `pekko.remote.watch-failure-detector` are each used? Would it make sense to drop one or the other, or at least to update the documentation to make it clearer?

GitHub link: https://github.com/apache/pekko/discussions/2657
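
For reference, here is a minimal sketch of how the two settings discussed above might appear side by side in an `application.conf`. The `20s` values are illustrative placeholders, not values taken from the incident. The defaults in the comments are the ones quoted above. Which of the two blocks governs the heartbeat in question is exactly what this discussion asks.

```hocon
pekko {
  cluster.failure-detector {
    # Default quoted above: 3s. The 20s value is an illustrative placeholder.
    acceptable-heartbeat-pause = 20s
  }
  remote.watch-failure-detector {
    # Default quoted above: 10s. The 20s value is an illustrative placeholder.
    acceptable-heartbeat-pause = 20s
  }
}
```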
