On Wednesday 09 April 2008 22:20:16 Lars Marowsky-Bree wrote:
> On 2008-04-09T20:26:02, Bernd Schubert <[EMAIL PROTECTED]> wrote:
>
> > I still think there is another bug in heartbeat, though. There is simply
> > no reason for heartbeat to wait $deadtime on initial startup of the
> > heartbeat services, when it knows all heartbeat nodes are up.
> > If I at least could manually force it to online the nodes, I would have
> > no problem with an initial-deadtime == deadtime.
>
> That _should_ work, indeed. If both sides are up, it should proceed
> immediately. Do you have autojoin enabled? Which version?

This is 2.1.2, but after quickly grepping through the sources, I think this
problem is also in tip. There is presently simply no way to mark a node
online until the initial deadtime is over:

polled_input_dispatch:
    check_for_timeouts();
    check_comm_isup();

/* See if any nodes or links have timed out */
static void
check_for_timeouts(void)
[...]
    if (heartbeat_comm_state != COMM_LINKSUP) {
        /*
         * Compute alternative dead_ticks value for very first
         * dead interval.
         *
         * We do this because for some unknown reason
         * sometimes the network is slow to start working.
         * Experience indicates that 30 seconds is generally
         * enough.  It would be nice to have a better way to
         * detect that the network isn't really working, but
         * I don't know any easy way.
         * Patches are being accepted ;-)
         */
        dead_ticks = msto_longclock(config->initial_deadtime_ms);
[...]
    mark_node_dead(hip);

Then in

static void
check_comm_isup(void)
{
    struct node_info *  hip;
    int                 j;
    int                 heardfromcount = 0;

    if (heartbeat_comm_state == COMM_LINKSUP) {
        return;
    }
    if (config->rtjoinconfig != HB_JOIN_NONE && !init_deadtime_passed){
        return;
    }

So whenever rtjoinconfig is not HB_JOIN_NONE, check_comm_isup() returns early
until init_deadtime_passed is set, regardless of which nodes have already
been heard from.

Thanks,
Bernd

-- 
Bernd Schubert
Q-Leap Networks GmbH
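
P.S. To make the effect of that early return concrete, here is a minimal stand-alone model of the gating logic. It is only a sketch and not the heartbeat source: struct model, its fields, COMM_STARTING, HB_JOIN_OTHER and both function names are placeholders introduced for illustration, and the final "all nodes heard, so mark links up" step is an assumption about what follows the excerpt above. The second function shows one possible relaxation (skip the wait once every configured node has been heard from); it is a suggestion, not a tested patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder stand-ins for heartbeat's global state (hypothetical). */
enum comm_state { COMM_STARTING, COMM_LINKSUP };
enum join_mode  { HB_JOIN_NONE, HB_JOIN_OTHER };

struct model {
    enum comm_state state;          /* models heartbeat_comm_state */
    enum join_mode  rtjoinconfig;   /* models config->rtjoinconfig */
    bool init_deadtime_passed;      /* set once the initial deadtime expires */
    int  nodes_total;               /* number of configured nodes */
    int  nodes_heard;               /* nodes already heard from */
};

/* Mirrors the two early returns quoted above. */
void check_comm_isup_current(struct model *m)
{
    assert(m != NULL);
    if (m->state == COMM_LINKSUP) {
        return;
    }
    if (m->rtjoinconfig != HB_JOIN_NONE && !m->init_deadtime_passed) {
        return;     /* blocks even if every node has been heard from */
    }
    /* Assumed continuation: links are up once all nodes were heard. */
    if (m->nodes_heard >= m->nodes_total) {
        m->state = COMM_LINKSUP;
    }
}

/* Possible relaxation: only wait while some node is still silent. */
void check_comm_isup_proposed(struct model *m)
{
    assert(m != NULL);
    if (m->state == COMM_LINKSUP) {
        return;
    }
    bool all_heard = m->nodes_heard >= m->nodes_total;
    if (m->rtjoinconfig != HB_JOIN_NONE && !m->init_deadtime_passed
        && !all_heard) {
        return;
    }
    if (all_heard) {
        m->state = COMM_LINKSUP;
    }
}
```

With two configured nodes, both already heard from, join mode other than HB_JOIN_NONE and the initial deadtime not yet expired, the first function leaves the state at COMM_STARTING, while the second switches to COMM_LINKSUP right away.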