On 01/10/2017 11:59 AM, Chris Walker wrote:
>
> On Mon, Jan 9, 2017 at 6:55 PM, Andrew Beekhof <abeek...@redhat.com
> <mailto:abeek...@redhat.com>> wrote:
>
>     On Fri, Dec 16, 2016 at 8:52 AM, Chris Walker
>     <christopher.wal...@gmail.com <mailto:christopher.wal...@gmail.com>>
>     wrote:
>     > Thanks for your response Ken. I'm puzzled ... in my case nodes remain
>     > UNCLEAN (offline) until dc-deadtime expires, even when both nodes are
>     > up and corosync is quorate.
>
>     I'm guessing you're starting both nodes at the same time?
>
> The nodes power on at the same time, but hardware discovery can vary by
> minutes.
>
>     The behaviour you're seeing is arguably a hangover from the multicast
>     days (in which case corosync wouldn't have had a node list).
>
> That makes sense.
>
>     But since that's not the common case anymore, we could probably
>     shortcut the timeout if we know the complete node list and see that
>     they are all online.
>
> That would be ideal. It's easy enough to work around this in systemd,
> but it seems like the HA stack should be the authority on node status.
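The shortcut Andrew proposes can be sketched as a toy model (the function and names below are invented for illustration, not Pacemaker internals): conclude the DC election as soon as every node in the known node list has been seen, and fall back to waiting out dc-deadtime only when some node is still missing.

```python
def election_finish_time(join_times, dc_deadtime):
    """Toy model of the proposed shortcut (not real Pacemaker code).

    join_times: seconds after startup at which each node in the corosync
                nodelist joined, or None if the node never showed up.
    Returns when the DC election could conclude: as soon as all known
    nodes are present, or after the full dc_deadtime otherwise.
    """
    if join_times and all(t is not None for t in join_times):
        # Every node in the known node list is online: end the wait early.
        return max(join_times)
    # Some node is still unseen: wait out dc-deadtime, after which
    # startup-fencing would deal with the missing node.
    return dc_deadtime

# Both nodes up within seconds -> election concludes at 10s, not 300s
print(election_finish_time([5, 10], 300))    # -> 10
# One node missing -> wait the full five minutes, then fence it
print(election_finish_time([5, None], 300))  # -> 300
```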
I've opened a feature request: http://bugs.clusterlabs.org/show_bug.cgi?id=5310

FYI, the priority list is long at this point, so no idea when it might be
addressed.

> Thanks!
> Chris
>
> > I see the following from crmd when I have dc-deadtime=2min
> >
> > Dec 15 21:34:33 max04 crmd[13791]: notice: Quorum acquired
> > Dec 15 21:34:33 max04 crmd[13791]: notice: pcmk_quorum_notification: Node max04[2886730248] - state is now member (was (null))
> > Dec 15 21:34:33 max04 crmd[13791]: notice: pcmk_quorum_notification: Node (null)[2886730249] - state is now member (was (null))
> > Dec 15 21:34:33 max04 crmd[13791]: notice: Notifications disabled
> > Dec 15 21:34:33 max04 crmd[13791]: notice: The local CRM is operational
> > Dec 15 21:34:33 max04 crmd[13791]: notice: State transition S_STARTING -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_started ]
> > ...
> > Dec 15 21:36:33 max05 crmd[10365]: warning: FSA: Input I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING
> > Dec 15 21:36:33 max05 crmd[10365]: notice: State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_TIMER_POPPED origin=election_timeout_popped ]
> > Dec 15 21:36:33 max05 crmd[10365]: warning: FSA: Input I_ELECTION_DC from do_election_check() received in state S_INTEGRATION
> > Dec 15 21:36:33 max05 crmd[10365]: notice: Notifications disabled
> > Dec 15 21:36:33 max04 crmd[13791]: notice: State transition S_PENDING -> S_NOT_DC [ input=I_NOT_DC cause=C_HA_MESSAGE origin=do_cl_join_finalize_respond ]
> >
> > Only after this do the nodes transition to Online.
> > This is using the vanilla RHEL7.2 cluster stack and the following options:
> >
> > property cib-bootstrap-options: \
> >     no-quorum-policy=ignore \
> >     default-action-timeout=120s \
> >     pe-warn-series-max=1500 \
> >     pe-input-series-max=1500 \
> >     pe-error-series-max=1500 \
> >     stonith-action=poweroff \
> >     stonith-timeout=900 \
> >     dc-deadtime=2min \
> >     maintenance-mode=false \
> >     have-watchdog=false \
> >     dc-version=1.1.13-10.el7-44eb2dd \
> >     cluster-infrastructure=corosync
> >
> > Thanks again,
> > Chris
> >
> > On Thu, Dec 15, 2016 at 3:26 PM, Ken Gaillot <kgail...@redhat.com
> > <mailto:kgail...@redhat.com>> wrote:
> >>
> >> On 12/15/2016 02:00 PM, Chris Walker wrote:
> >> > Hello,
> >> >
> >> > I have a quick question about dc-deadtime. I believe that Digimer and
> >> > others on this list might have already addressed this, but I want to
> >> > make sure I'm not missing something.
> >> >
> >> > If my understanding is correct, dc-deadtime sets the amount of time
> >> > that must elapse before a cluster is formed (DC is elected, etc.),
> >> > regardless of which nodes have joined the cluster. In other words,
> >> > even if all nodes that are explicitly enumerated in the nodelist
> >> > section have started Pacemaker, they will still wait dc-deadtime
> >> > before forming a cluster.
> >> >
> >> > In my case, I have a two-node cluster on which I'd like to allow a
> >> > pretty long time (~5 minutes) for both nodes to join before giving up
> >> > on them. However, if they both join quickly, I'd like to proceed to
> >> > form a cluster immediately; I don't want to wait for the full five
> >> > minutes to elapse before forming a cluster. Further, if a node
> >> > doesn't respond within five minutes, I want to fence it and start
> >> > resources on the node that is up.
> >>
> >> Pacemaker+corosync behaves as you describe by default.
> >>
> >> dc-deadtime is how long to wait for an election to finish, but if the
> >> election finishes sooner than that (i.e. a DC is elected), it stops
> >> waiting. It doesn't even wait for all nodes, just a quorum.
> >>
> >> Also, with startup-fencing=true (the default), any unseen nodes will be
> >> fenced, and the remaining nodes will proceed to host resources. Of
> >> course, it needs quorum for this, too.
> >>
> >> With two nodes, quorum is handled specially, but that's a different
> >> topic.
> >>
> >> > With Pacemaker/Heartbeat, the initdead parameter did exactly what I
> >> > want, but I don't see any way to do this with Pacemaker/Corosync.
> >> > From reading other posts, it looks like people use an external agent
> >> > to start HA daemons once nodes are up ... is this a correct
> >> > understanding?
> >> >
> >> > Thanks very much,
> >> > Chris

_______________________________________________
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
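For reference, the "external agent in systemd" workaround mentioned in this thread commonly takes the shape of a drop-in that holds pacemaker.service until the peer answers, then gives up after a bounded wait. This is only a sketch under assumptions: the peer hostname (max05), the ~5-minute budget, and the file path are all invented here, not what Chris actually ran.

```ini
# /etc/systemd/system/pacemaker.service.d/wait-for-peer.conf
# Hypothetical drop-in: hold pacemaker startup for up to ~5 minutes
# (60 attempts x ~5s) while the peer node "max05" is unreachable.
# Exits 0 either way, so startup proceeds and fencing handles a dead peer.
[Service]
ExecStartPre=/bin/sh -c 'for i in $(seq 1 60); do ping -c1 -W1 max05 >/dev/null 2>&1 && exit 0; sleep 4; done; exit 0'
```

After creating the drop-in, `systemctl daemon-reload` picks it up; each node would carry a symmetric copy pointing at the other.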