On Thursday 30 October 2008 12:45:58 Bernd Schubert wrote:
> Hello,
>
> earlier this year I complained on the heartbeat mailing list about huge
> startup times when deadtime is large (due to initdead >= deadtime):
>
> http://www.mail-archive.com/linux-ha%40lists.linux-ha.org/msg07801.html
>
> Finally I found the time to look into this issue in more detail. It is
> rather easy to convince heartbeat to go online, basically just by
> removing this condition from check_comm_isup():
>
> if (config->rtjoinconfig != HB_JOIN_NONE
>     && !init_deadtime_passed){
>         return;
> }
I was wrong here: with a fixed node configuration we already have
HB_JOIN_NONE set, so the only culprit is crm / pacemaker.

> But then the trouble is with crm: it still refuses to select any of the
> nodes as designated coordinator (DC), so nothing will go online after a
> system-wide heartbeat shutdown. The reason is quite simple: crm uses a
> simple timer for the initial selection. As the timeout it uses
> getenv(ENV_PREFIX "initdead"), set by heartbeat. See the setting and
> usage of election_trigger->period_ms in do_startup() and
> config_query_callback().
>
> IMHO using such a simple timer is plain wrong. Actually heartbeat should
> tell crm when all cluster nodes have been found, and then the DC should
> be selected immediately.
> Well, actually we could keep the timer, but we would additionally need
> to be informed by heartbeat when all cluster nodes are already online.
> Then the timer could be stopped and the DC selection could be done
> immediately. Is there already a callback from heartbeat when all nodes
> are online?
>
>
> Thanks,
> Bernd

--
Bernd Schubert
Q-Leap Networks GmbH
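
To illustrate the proposal quoted above (keep the timer, but start the DC
selection early once every node is known to be online), here is a minimal
sketch in C. It is not crmd or heartbeat code: struct election_gate and the
gate_*() functions are hypothetical names made up for this example, and it
assumes a fixed node list whose size is known at startup. Only the idea of a
timeout in milliseconds (like election_trigger->period_ms) is taken from the
mail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the startup gate; not the real crmd structures. */
struct election_gate {
    unsigned int period_ms;   /* timeout, e.g. derived from initdead */
    unsigned int elapsed_ms;  /* time waited so far */
    size_t nodes_expected;    /* size of the fixed node list (assumed known) */
    size_t nodes_online;      /* nodes reported online so far */
    bool fired;               /* true once the selection has been started */
};

static void gate_init(struct election_gate *g, unsigned int period_ms,
                      size_t nodes_expected)
{
    g->period_ms = period_ms;
    g->elapsed_ms = 0;
    g->nodes_expected = nodes_expected;
    g->nodes_online = 0;
    g->fired = false;
}

/* Start the selection at most once: either all nodes are online or the
 * timeout has expired. Returns true only on the call that starts it. */
static bool gate_try_fire(struct election_gate *g)
{
    if (g->fired) {
        return false;
    }
    if (g->nodes_online >= g->nodes_expected
        || g->elapsed_ms >= g->period_ms) {
        g->fired = true;
        return true;
    }
    return false;
}

/* Hypothetical callback: the membership layer reports one more node online. */
static bool gate_node_online(struct election_gate *g)
{
    if (g->nodes_online < g->nodes_expected) {
        g->nodes_online++;
    }
    return gate_try_fire(g);
}

/* Timer path: advance the waited time by ms and re-check the gate. */
static bool gate_tick(struct election_gate *g, unsigned int ms)
{
    g->elapsed_ms += ms;
    return gate_try_fire(g);
}
```

With two expected nodes and a 120 s timeout, the second gate_node_online()
call returns true right away, and later ticks do nothing. If a node never
shows up, gate_tick() still starts the selection once the timeout is reached,
so the current behaviour remains as the fallback.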