GitHub user regiskuckaertz created a discussion: Confusion between 
pekko.cluster.failure-detector and pekko.remote.watch-failure-detector

Hi. We have had an incident recently where a long GC pause caused the heartbeat 
to fail. I was surprised because we had such an incident a long time ago and 
set the property `pekko.cluster.failure-detector.acceptable-heartbeat-pause` 
for that specific reason (default is 3s).

However, the value that was logged in the message `Previous heartbeat was sent 
[10738] ms ago` did not come close to the acceptable pause we had set. After 
re-reading the docs, it seems the pause must be specified at 
`pekko.remote.watch-failure-detector.acceptable-heartbeat-pause` (default is 
10s).
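
To make the two paths concrete, this is roughly what overriding both settings 
looks like in an `application.conf`. This is only a sketch: the `20s` values 
are placeholders rather than our actual settings, and the defaults in the 
comments are the ones quoted above.

```hocon
pekko {
  cluster.failure-detector {
    # The setting we had overridden after the earlier incident (default: 3s).
    acceptable-heartbeat-pause = 20s  # placeholder value
  }
  remote.watch-failure-detector {
    # The setting the docs seem to point to instead (default: 10s).
    acceptable-heartbeat-pause = 20s  # placeholder value
  }
}
```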

Both properties appear in `reference.conf`, under the `cluster` and `remote` 
projects respectively. I've been looking at the code and I am no longer sure 
the former is used anywhere, except in the deprecated `ClusterClient` module 
under `cluster-tools`.

Can you please help me understand if and when 
`pekko.cluster.failure-detector` and `pekko.remote.watch-failure-detector` are 
each used? Would it make sense to drop one or the other, or at least update 
the documentation to make the distinction clearer?

GitHub link: https://github.com/apache/pekko/discussions/2657
