[discovery] different heartbeats for repository vs connectors

Stefan Egli Fri, 07 Feb 2014 02:39:31 -0800

Hi,

During an offline discussion, Felix suggested lowering the topology 
connectors' heartbeat frequency. Currently heartbeats are sent every 15 or 
30 sec, which might seem a lot - especially as they were way too chatty (which 
is fixed now with SLING-3377).


The main reason for having a high heartbeat frequency is quicker failure 
detection - but it's obviously a trade-off as it increases load.

I would like to get some opinions on the following proposal:

  *   introduce two different sets of heartbeats, one for repository and one 
for connectors
  *   the repository ones would remain at the current frequency (suggested 
default: 30sec interval, 60sec timeout). The idea is that we would want to 
detect crashes within a cluster rather quickly, more quickly than in the 
topology in general.
  *   the connectors would get a back-off behavior, where initially the values 
are the same (30sec/60sec) but then they send out less frequent heartbeats over 
time, reaching a max interval (e.g. 5min). This would have to be controlled by 
the receiving side, i.e. both sides of the connector have to agree on the same 
interval and timeout.
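
To make the back-off idea concrete, here is a minimal Java sketch. It is only 
an illustration under assumptions: the class and method names are hypothetical 
(not existing Sling API), doubling per successful heartbeat is an assumed 
back-off curve, and the timeout is assumed to stay at twice the interval, 
mirroring the 30sec/60sec default. The receiving side would compute these 
values and return them with each heartbeat response, so both ends of the 
connector use the same ones.

```java
import java.time.Duration;

/** Hypothetical sketch of connector heartbeat back-off; not part of Sling. */
final class ConnectorHeartbeatBackoff {

    // Starting point taken from the proposal: 30sec interval.
    static final Duration INITIAL_INTERVAL = Duration.ofSeconds(30);
    // Upper bound taken from the proposal's example: 5min.
    static final Duration MAX_INTERVAL = Duration.ofMinutes(5);

    private ConnectorHeartbeatBackoff() {
    }

    /**
     * Interval the receiving side would dictate after the given number of
     * successful heartbeats. Doubling is an assumption; the proposal only
     * says "less frequent over time".
     */
    static Duration intervalAfter(int successfulHeartbeats) {
        if (successfulHeartbeats < 0) {
            throw new IllegalArgumentException("successfulHeartbeats must be >= 0");
        }
        Duration interval = INITIAL_INTERVAL;
        // Stop doubling as soon as the cap is reached to avoid overflow.
        for (int i = 0; i < successfulHeartbeats
                && interval.compareTo(MAX_INTERVAL) < 0; i++) {
            interval = interval.multipliedBy(2);
        }
        // Clamp the last doubling step to the configured maximum.
        return interval.compareTo(MAX_INTERVAL) > 0 ? MAX_INTERVAL : interval;
    }

    /** Timeout kept at twice the interval (assumed from the 30sec/60sec default). */
    static Duration timeoutFor(Duration interval) {
        return interval.multipliedBy(2);
    }
}
```

With these assumptions the interval goes 30sec, 60sec, 120sec, 240sec and then 
stays at 5min, at which point the timeout would be 10min.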

I've opened a Jira to track this, please comment there:

https://issues.apache.org/jira/browse/SLING-3382

Thanks,
Cheers,
Stefan
