Re: [DRBD-user] drbdmanage quorum control
Hi,

While I understand the risks associated with forcing a single node online by using the reelect options, I am currently documenting a pre-prod cluster and have to document destructive testing and recovery procedures.

The situation I am trying to validate is one where we have a 3 node cluster which spans datacenters. One datacenter is off the air, which has in turn taken out 2 of the 3 nodes. The remaining node has, for whatever reason, crashed / restarted, and we now need to get this node online (post-cleanup tasks will be captured as part of the docs).

I know that I can simply copy the content of /var/lib/drbd.d/ to /etc/drbd.d/, do a quick rename and then use drbdadm to bring the resources online, but since I am provisioning all my DRBD resources via drbdmanage I would like to be able to force this service online. I have tried the drbdmanage reelect (and force-win) options but am still unable to connect to the drbdmanage process (that said, I am able to see all DRBD resources using drbdadm status):

    [root@node1 ~]# drbdmanage reelect --force-win
    Operation completed successfully unknown

    [root@node1 ~]# drbdadm status
    .drbdctrl role:Primary
      volume:0 disk:UpToDate
      volume:1 disk:UpToDate
      node2.domain.name connection:Connecting
      node3.domain.name connection:Connecting

    resource-sda role:Secondary
      disk:UpToDate
      node2.domain.name connection:Connecting

    [root@lpisscl0001 ~]# drbdmanage ping
    pong

    [root@node1 ~]# drbdmanage v
    ERROR:dbus.proxies:Introspect error on :1.30:/interface: dbus.exceptions.DBusException: org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
    Waiting for server: ...
    Error: Startup not successful (no quorum? not *both* nodes up in a 2 node cluster?)
    No resources defined

So, if I am correct, the (completely unsupported / do so at your own risk) process to force access to the drbdmanaged resources in the event of loss of quorum for the drbdmanaged process would be:

surviving node: node1

    [root@node1 ~]# drbdmanage reelect --force-win

recover node2 / node3 from DR procedure / backups

post reintroduction of additional nodes: restart drbdmanaged process on node1 / reboot node1

Thanks

Jay

On 3 October 2017 at 16:08, Jason Fitzpatrick wrote:
> Thanks I will try that now

--
"The only difference between saints and sinners is that every saint has a past while every sinner has a future."
- Oscar Wilde
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user
Re: [DRBD-user] drbdmanage quorum control
Thanks, I will try that now.

On 3 Oct 2017 12:05, "Yannis Milios" wrote:
> I think you have to use 'drbdmanage reelect' command to reelect a new
> leader first.
>
> man drbdmanage-reelect
>
> Yannis
Re: [DRBD-user] drbdmanage quorum control
Thanks for clarifying this ...

Regards,
Yannis

On Tue, Oct 3, 2017 at 12:30 PM, Roland Kammerer wrote:
> tl;dr: If you want quorum: >=3 nodes. Don't use "reelect" to force wins.
Re: [DRBD-user] drbdmanage quorum control
On Tue, Oct 03, 2017 at 12:05:50PM +0100, Yannis Milios wrote:
> I think you have to use 'drbdmanage reelect' command to reelect a new
> leader first.
>
> man drbdmanage-reelect

In general that is a bad idea, and I regret that I exposed it as a
subcommand and did not hide it behind a
"--no-you-dont-want-that-unless-you-are-rck" where it then still asks you
to prove the Riemann hypothesis before continuing...

> On Mon, Oct 2, 2017 at 2:12 PM, Jason Fitzpatrick wrote:
> > Hi all
> >
> > I am trying to get my head around the quorum-control features within
> > drbdmanage.
> >
> > I have deliberately crashed my cluster, and spun up one node, and as
> > expected I am unable to get drbdmanage to start due to the lack of
> > quorum.
> >
> > I was under the impression that I should have been able to override
> > the quorum state and get the drbdmanaged process online using DBUS /
> > manually calling the service, but am drawing a blank.
> >
> > For the sake of this example it is a 2 node cluster; node1 is online
> > and node2 is still powered off.
> >
> > [root@node1]# drbdmanage quorum-control --override ignore node2
> > Modifying quorum state of node 'node2':
> > Waiting for server: ...
> > Error: Startup not successful (no quorum? not *both* nodes up in a 2
> > node cluster?)
> > Error: Startup not successful (no quorum? not *both* nodes up in a 2
> > node cluster?)
> >
> > Any advice?

Bring back the second node. In two node clusters that is the only clean
way to bring back the cluster. If you want quorum, get >=3 nodes.
Period. In two node clusters both have to be up.

"reelect" is a last resort command for the absolute worst case: bringing
up a 2 node cluster where only one node survived and the other one is
gone beyond repair. "reelect" with a forced win alters internal state to
make that possible. It does not revert that internal state if, for
whatever reason, the second node then shows up again. You would have to
restart the "reelect" node to get it back into a sane internal state.

tl;dr: If you want quorum: >=3 nodes. Don't use "reelect" to force wins.

Regards, rck
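The advice above follows from simple majority arithmetic. As an illustration only, here is a Python sketch that assumes a strict-majority rule (more than half of the configured nodes must be up). The thread does not show how drbdmanage computes quorum internally, so this is a generic model rather than its implementation, and votes_needed, has_quorum and tolerated_failures are hypothetical names.

```python
def votes_needed(cluster_size: int) -> int:
    """Smallest number of nodes that is a strict majority of the cluster."""
    return cluster_size // 2 + 1


def has_quorum(cluster_size: int, nodes_up: int) -> bool:
    """True if the nodes that are up form a strict majority (assumed rule)."""
    return nodes_up >= votes_needed(cluster_size)


def tolerated_failures(cluster_size: int) -> int:
    """How many nodes may be down while the rest still hold a majority."""
    return cluster_size - votes_needed(cluster_size)


if __name__ == "__main__":
    for size in (2, 3):
        print(
            f"{size} nodes: need {votes_needed(size)} up, "
            f"can lose {tolerated_failures(size)}"
        )
```

Under this rule a 2 node cluster needs both nodes up and tolerates no failure, while a 3 node cluster tolerates exactly one. A single survivor of a 3 node cluster, as in the scenario discussed in this thread, is below the majority.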
Re: [DRBD-user] drbdmanage quorum control
I think you have to use 'drbdmanage reelect' command to reelect a new leader first.

man drbdmanage-reelect

Yannis

On Mon, Oct 2, 2017 at 2:12 PM, Jason Fitzpatrick wrote:
> Hi all
>
> I am trying to get my head around the quorum-control features within
> drbdmanage.
>
> I have deliberately crashed my cluster, and spun up one node, and as
> expected I am unable to get drbdmanage to start due to the lack of
> quorum.
>
> I was under the impression that I should have been able to override
> the quorum state and get the drbdmanaged process online using DBUS /
> manually calling the service, but am drawing a blank.
>
> For the sake of this example it is a 2 node cluster; node1 is online
> and node2 is still powered off.
>
> [root@node1]# drbdmanage quorum-control --override ignore node2
> Modifying quorum state of node 'node2':
> Waiting for server: ...
> Error: Startup not successful (no quorum? not *both* nodes up in a 2
> node cluster?)
> Error: Startup not successful (no quorum? not *both* nodes up in a 2
> node cluster?)
>
> Any advice?
>
> Thanks
>
> Jay