On Tue, 2024-01-30 at 13:20 +0000, Walker, Chris wrote:
> >>> However, now it seems to wait that amount of time before it elects a
> >>> DC, even when quorum is acquired earlier.  In my log snippet below,
> >>> with dc-deadtime 300s,
> >>
> >> The dc-deadtime is not waiting for quorum, but for another DC to show
> >> up. If all nodes show up, it can proceed, but otherwise it has to wait.
> 
> > I believe all the nodes showed up by 14:17:04, but it still waited
> > until 14:19:26 to elect a DC:
> 
> > Jan 29 14:14:25 gopher12 pacemaker-controld  [123697] (peer_update_callback)    info: Cluster node gopher12 is now member (was in unknown state)
> > Jan 29 14:17:04 gopher12 pacemaker-controld  [123697] (peer_update_callback)    info: Cluster node gopher11 is now member (was in unknown state)
> > Jan 29 14:17:04 gopher12 pacemaker-controld  [123697] (quorum_notification_cb)  notice: Quorum acquired | membership=54 members=2
> > Jan 29 14:19:26 gopher12 pacemaker-controld  [123697] (do_log)  info: Input I_ELECTION_DC received in state S_ELECTION from election_win_cb
> 
> > This is a cluster with 2 nodes, gopher11 and gopher12.
> 
> This is our experience with dc-deadtime too: even if both nodes in
> the cluster show up, dc-deadtime must elapse before the cluster
> starts.  This was discussed on this list a while back (
> https://www.mail-archive.com/users@clusterlabs.org/msg03897.html) and
> an RFE came out of it (
> https://bugs.clusterlabs.org/show_bug.cgi?id=5310). 

Ah, I misremembered, I thought we had done that :(

>  
> I’ve worked around this by having an ExecStartPre directive for
> Corosync that does essentially:
>  
> while ! systemctl -H ${peer} is-active corosync; do sleep 5; done
>  
> With this in place, the nodes wait for each other before starting
> Corosync and Pacemaker.  We can then use the default 20s dc-deadtime
> so that the DC election happens quickly once both nodes are up.
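For anyone wanting to try that workaround, a minimal sketch of how it could be wired in as a systemd drop-in follows. The file path, peer hostname, and timeout are assumptions, not details from Chris's setup, and `systemctl -H` reaches the peer over SSH, so key-based access is required:

```ini
# /etc/systemd/system/corosync.service.d/wait-for-peer.conf  (path assumed)
[Service]
# Bound the total wait so a permanently dead peer cannot block boot
# forever; if it expires, the start fails and normal recovery applies.
TimeoutStartSec=15min
# Poll the peer (hostname "gopher11" is an example) until its corosync
# unit reports active, essentially Chris's loop.
ExecStartPre=/bin/sh -c 'while ! systemctl -H gopher11 is-active --quiet corosync; do sleep 5; done'
```

Run `systemctl daemon-reload` after installing the drop-in; each node would point at its partner, per the description above.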

That makes sense

> Thanks,
> Chris
>  
> From: Users <users-boun...@clusterlabs.org> on behalf of Faaland,
> Olaf P. via Users <users@clusterlabs.org>
> Date: Monday, January 29, 2024 at 7:46 PM
> To: Ken Gaillot <kgail...@redhat.com>, Cluster Labs - All topics
> related to open-source clustering welcomed <users@clusterlabs.org>
> Cc: Faaland, Olaf P. <faala...@llnl.gov>
> Subject: Re: [ClusterLabs] controlling cluster behavior on startup
> 
> >> However, now it seems to wait that amount of time before it elects a
> >> DC, even when quorum is acquired earlier.  In my log snippet below,
> >> with dc-deadtime 300s,
> >
> > The dc-deadtime is not waiting for quorum, but for another DC to show
> > up. If all nodes show up, it can proceed, but otherwise it has to wait.
> 
> I believe all the nodes showed up by 14:17:04, but it still waited
> until 14:19:26 to elect a DC:
> 
> Jan 29 14:14:25 gopher12 pacemaker-controld  [123697] (peer_update_callback)    info: Cluster node gopher12 is now member (was in unknown state)
> Jan 29 14:17:04 gopher12 pacemaker-controld  [123697] (peer_update_callback)    info: Cluster node gopher11 is now member (was in unknown state)
> Jan 29 14:17:04 gopher12 pacemaker-controld  [123697] (quorum_notification_cb)  notice: Quorum acquired | membership=54 members=2
> Jan 29 14:19:26 gopher12 pacemaker-controld  [123697] (do_log)  info: Input I_ELECTION_DC received in state S_ELECTION from election_win_cb
> 
> This is a cluster with 2 nodes, gopher11 and gopher12.
> 
> Am I misreading that?
> 
> thanks,
> Olaf
> 
> ________________________________________
> From: Ken Gaillot <kgail...@redhat.com>
> Sent: Monday, January 29, 2024 3:49 PM
> To: Faaland, Olaf P.; Cluster Labs - All topics related to open-source
> clustering welcomed
> Subject: Re: [ClusterLabs] controlling cluster behavior on startup
> 
> On Mon, 2024-01-29 at 22:48 +0000, Faaland, Olaf P. wrote:
> > Thank you, Ken.
> >
> > I changed my configuration management system to put an initial
> > cib.xml into /var/lib/pacemaker/cib/, which sets all the property
> > values I was setting via pcs commands, including dc-deadtime.  I
> > removed those "pcs property set" commands from the ones that are run
> > at startup time.
> >
> > That worked in the sense that after Pacemaker start, the node waits
> > my newly specified dc-deadtime of 300s before giving up on the
> > partner node and fencing it, if the partner never appears as a
> > member.
> >
> > However, now it seems to wait that amount of time before it elects a
> > DC, even when quorum is acquired earlier.  In my log snippet below,
> > with dc-deadtime 300s,
> 
> The dc-deadtime is not waiting for quorum, but for another DC to show
> up. If all nodes show up, it can proceed, but otherwise it has to
> wait.
> 
> >
> > 14:14:24 Pacemaker starts on gopher12
> > 14:17:04 quorum is acquired
> > 14:19:26 Election Trigger just popped (start time + dc-deadtime
> > seconds)
> > 14:19:26 gopher12 wins the election
> >
> > Is there other configuration that needs to be present in the cib at
> > startup time?
> >
> > thanks,
> > Olaf
> >
> > === log extract using new system of installing partial cib.xml
> > before startup
> > Jan 29 14:14:24 gopher12 pacemakerd          [123690] (main)    notice: Starting Pacemaker 2.1.7-1.t4 | build=2.1.7 features:agent-manpages ascii-docs compat-2.0 corosync-ge-2 default-concurrent-fencing generated-manpages monotonic nagios ncurses remote systemd
> > Jan 29 14:14:25 gopher12 pacemaker-attrd     [123695] (attrd_start_election_if_needed)  info: Starting an election to determine the writer
> > Jan 29 14:14:25 gopher12 pacemaker-attrd     [123695] (election_check)  info: election-attrd won by local node
> > Jan 29 14:14:25 gopher12 pacemaker-controld  [123697] (peer_update_callback)    info: Cluster node gopher12 is now member (was in unknown state)
> > Jan 29 14:17:04 gopher12 pacemaker-controld  [123697] (quorum_notification_cb)  notice: Quorum acquired | membership=54 members=2
> > Jan 29 14:19:26 gopher12 pacemaker-controld  [123697] (crm_timer_popped)        info: Election Trigger just popped | input=I_DC_TIMEOUT time=300000ms
> > Jan 29 14:19:26 gopher12 pacemaker-controld  [123697] (do_log)  warning: Input I_DC_TIMEOUT received in state S_PENDING from crm_timer_popped
> > Jan 29 14:19:26 gopher12 pacemaker-controld  [123697] (do_state_transition)     info: State transition S_PENDING -> S_ELECTION | input=I_DC_TIMEOUT cause=C_TIMER_POPPED origin=crm_timer_popped
> > Jan 29 14:19:26 gopher12 pacemaker-controld  [123697] (election_check)  info: election-DC won by local node
> > Jan 29 14:19:26 gopher12 pacemaker-controld  [123697] (do_log)  info: Input I_ELECTION_DC received in state S_ELECTION from election_win_cb
> > Jan 29 14:19:26 gopher12 pacemaker-controld  [123697] (do_state_transition)     notice: State transition S_ELECTION -> S_INTEGRATION | input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=election_win_cb
> > Jan 29 14:19:26 gopher12 pacemaker-schedulerd[123696] (recurring_op_for_active)         info: Start 10s-interval monitor for gopher11_zpool on gopher11
> > Jan 29 14:19:26 gopher12 pacemaker-schedulerd[123696] (recurring_op_for_active)         info: Start 10s-interval monitor for gopher12_zpool on gopher12
> >
> >
> > === initial cib.xml contents
> > <cib crm_feature_set="3.19.0" validate-with="pacemaker-3.9" epoch="9" num_updates="0" admin_epoch="0" cib-last-written="Mon Jan 29 11:07:06 2024" update-origin="gopher12" update-client="root" update-user="root" have-quorum="0" dc-uuid="2">
> >   <configuration>
> >     <crm_config>
> >       <cluster_property_set id="cib-bootstrap-options">
> >         <nvpair id="cib-bootstrap-options-stonith-action" name="stonith-action" value="off"/>
> >         <nvpair id="cib-bootstrap-options-have-watchdog" name="have-watchdog" value="false"/>
> >         <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="2.1.7-1.t4-2.1.7"/>
> >         <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="corosync"/>
> >         <nvpair id="cib-bootstrap-options-cluster-name" name="cluster-name" value="gopher11"/>
> >         <nvpair id="cib-bootstrap-options-cluster-recheck-inte" name="cluster-recheck-interval" value="60"/>
> >         <nvpair id="cib-bootstrap-options-start-failure-is-fat" name="start-failure-is-fatal" value="false"/>
> >         <nvpair id="cib-bootstrap-options-dc-deadtime" name="dc-deadtime" value="300"/>
> >       </cluster_property_set>
> >     </crm_config>
> >     <nodes>
> >       <node id="1" uname="gopher11"/>
> >       <node id="2" uname="gopher12"/>
> >     </nodes>
> >     <resources/>
> >     <constraints/>
> >   </configuration>
> > </cib>
> >
> > ________________________________________
> > From: Ken Gaillot <kgail...@redhat.com>
> > Sent: Monday, January 29, 2024 10:51 AM
> > To: Cluster Labs - All topics related to open-source clustering
> > welcomed
> > Cc: Faaland, Olaf P.
> > Subject: Re: [ClusterLabs] controlling cluster behavior on startup
> >
> > On Mon, 2024-01-29 at 18:05 +0000, Faaland, Olaf P. via Users wrote:
> > > Hi,
> > >
> > > I have configured clusters of node pairs, so each cluster has 2
> > > nodes.  The cluster members are statically defined in corosync.conf
> > > before corosync or pacemaker is started, and quorum {two_node: 1}
> > > is set.
> > >
> > > When both nodes are powered off and I power them on, they do not
> > > start pacemaker at exactly the same time.  The time difference may
> > > be a few minutes depending on other factors outside the nodes.
> > >
> > > My goals are (I call the first node to start pacemaker "node1"):
> > > 1) I want to control how long pacemaker on node1 waits before
> > > fencing node2 if node2 does not start pacemaker.
> > > 2) If node1 is part-way through that waiting period, and node2
> > > starts pacemaker so they detect each other, I would like them to
> > > proceed immediately to probing resource state and starting
> > > resources which are down, not wait until the end of that "grace
> > > period".
> > >
> > > It looks from the documentation like dc-deadtime is how #1 is
> > > controlled, and #2 is expected normal behavior.  However, I'm
> > > seeing fence actions before dc-deadtime has passed.
> > >
> > > Am I misunderstanding Pacemaker's expected behavior and/or how
> > > dc-deadtime should be used?
> >
> > You have everything right. The problem is that you're starting with
> > an empty configuration every time, so the default dc-deadtime is
> > being used for the first election (before you can set the desired
> > value).
> >
> > I can't think of anything you can do to get around that, since the
> > controller starts the timer as soon as it starts up. Would it be
> > possible to bake an initial configuration into the PXE image?
> >
> > When the timer value changes, we could stop the existing timer and
> > restart it. There's a risk that some external automation could make
> > repeated changes to the timeout, thus never letting it expire, but
> > that
> > seems preferable to your problem. I've created an issue for that:
> >
> >
> > https://projects.clusterlabs.org/T764
> >
> > BTW there's also election-timeout. I'm not sure offhand how that
> > interacts; it might be necessary to raise that one as well.
> >
> > > One possibly unusual aspect of this cluster is that these two
> > > nodes are stateless - they PXE boot from an image on another
> > > server - and I build the cluster configuration at boot time with a
> > > series of pcs commands, because the nodes have no local storage
> > > for this purpose.  The commands are:
> > >
> > > ['pcs', 'cluster', 'start']
> > > ['pcs', 'property', 'set', 'stonith-action=off']
> > > ['pcs', 'property', 'set', 'cluster-recheck-interval=60']
> > > ['pcs', 'property', 'set', 'start-failure-is-fatal=false']
> > > ['pcs', 'property', 'set', 'dc-deadtime=300']
> > > ['pcs', 'stonith', 'create', 'fence_gopher11', 'fence_powerman', 'ip=192.168.64.65', 'pcmk_host_check=static-list', 'pcmk_host_list=gopher11,gopher12']
> > > ['pcs', 'stonith', 'create', 'fence_gopher12', 'fence_powerman', 'ip=192.168.64.65', 'pcmk_host_check=static-list', 'pcmk_host_list=gopher11,gopher12']
> > > ['pcs', 'resource', 'create', 'gopher11_zpool', 'ocf:llnl:zpool', 'import_options="-f -N -d /dev/disk/by-vdev"', 'pool=gopher11', 'op', 'start', 'timeout=805']
> > > ...
> > > ['pcs', 'property', 'set', 'no-quorum-policy=ignore']
> >
> > BTW you don't need to change no-quorum-policy when you're using
> > two_node with Corosync.
> >
> > > I could, instead, generate a CIB so that when Pacemaker is
> > > started, it has a full config.  Is that better?
> > >
> > > thanks,
> > > Olaf
> > >
> > > === corosync.conf:
> > > totem {
> > >     version: 2
> > >     cluster_name: gopher11
> > >     secauth: off
> > >     transport: udpu
> > > }
> > > nodelist {
> > >     node {
> > >         ring0_addr: gopher11
> > >         name: gopher11
> > >         nodeid: 1
> > >     }
> > >     node {
> > >         ring0_addr: gopher12
> > >         name: gopher12
> > >         nodeid: 2
> > >     }
> > > }
> > > quorum {
> > >     provider: corosync_votequorum
> > >     two_node: 1
> > > }
> > >
> > > === Log excerpt
> > >
> > > Here's an excerpt from Pacemaker logs that reflects what I'm
> > > seeing.  These are from gopher12, the node that came up first.
> > > The other node, which is not yet up, is gopher11.
> > >
> > > Jan 25 17:55:38 gopher12 pacemakerd          [116033] (main)    notice: Starting Pacemaker 2.1.7-1.t4 | build=2.1.7 features:agent-manpages ascii-docs compat-2.0 corosync-ge-2 default-concurrent-fencing generated-manpages monotonic nagios ncurses remote systemd
> > > Jan 25 17:55:39 gopher12 pacemaker-controld  [116040] (peer_update_callback)    info: Cluster node gopher12 is now member (was in unknown state)
> > > Jan 25 17:55:43 gopher12 pacemaker-based     [116035] (cib_perform_op)  info: ++ /cib/configuration/crm_config/cluster_property_set[@id='cib-bootstrap-options']:  <nvpair id="cib-bootstrap-options-dc-deadtime" name="dc-deadtime" value="300"/>
> > > Jan 25 17:56:00 gopher12 pacemaker-controld  [116040] (crm_timer_popped)        info: Election Trigger just popped | input=I_DC_TIMEOUT time=300000ms
> > > Jan 25 17:56:01 gopher12 pacemaker-based     [116035] (cib_perform_op)  info: ++ /cib/configuration/crm_config/cluster_property_set[@id='cib-bootstrap-options']:  <nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="ignore"/>
> > > Jan 25 17:56:01 gopher12 pacemaker-controld  [116040] (abort_transition_graph)  info: Transition 0 aborted by cib-bootstrap-options-no-quorum-policy doing create no-quorum-policy=ignore: Configuration change | cib=0.26.0 source=te_update_diff_v2:464 path=/cib/configuration/crm_config/cluster_property_set[@id='cib-bootstrap-options'] complete=true
> > > Jan 25 17:56:01 gopher12 pacemaker-controld  [116040] (controld_execute_fence_action)   notice: Requesting fencing (off) targeting node gopher11 | action=11 timeout=60
> > >
> > >
> > >
> > --
> > Ken Gaillot <kgail...@redhat.com>
> >
> --
> Ken Gaillot <kgail...@redhat.com>
> 
-- 
Ken Gaillot <kgail...@redhat.com>

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
