Re: [Linux-HA] Server becomes unresponsive after node failure
Dejan Muhamedagic wrote:
> I guess that in some shops you'd need to clone yourself or sth else,
> otherwise you just wouldn't scale with demand.

Yeah, I keep telling my boss that.

>> ... split-brain
> So, did you have stonith in place then? ;-)

I have users instead; they come and tell me the Internet is broken when
that sort of thing happens.

> The cost of decent fencing hardware is nowadays really small.
> And the probability of the power supplies going bad is much
> higher than that of PDU/PSU.

Interestingly enough, out of approx. 300 computer-years here, the kit from
our server vendor had 3 PSU failures recently, on 3 identical machines
bought some 4 years ago. (The other failure was a SATA backplane.) With 4
hardware failures on 60 servers in 5 years, it's hard to justify decent
fencing hardware even if it were free, and net-connected PDUs that can
power a full-height server cabinet are actually far from free.

Dima
--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
Re: [Linux-HA] Server becomes unresponsive after node failure
On Wed, Mar 09, 2011 at 11:51:20AM -0600, Dimitri Maziuk wrote:
> Dejan Muhamedagic wrote:
> > On Tue, Mar 08, 2011 at 02:27:52PM -0600, Dimitri Maziuk wrote:
> >
> >> Well, realistically, if the link is a foot of x/over cable and gremlins
> >> have not been pulling on it, and the NICs aren't falling out of their
> >> slots, and are half-decent quality hardware, and the drivers aren't
> >> alpha prototype code, and so on, the chances of it being the "link down"
> >> case should be fairly low.
> >
> > LOL. BTW, the gremlins I saw doing that were wearing company
> > badges and pulling wrong cables. Realistically, never
> > underestimate the human factor :)
>
> That's why we put locks on our server room doors. So that I am the only
> gremlin there.

Well, that's good for your cluster too. But it places a bit of an extra
burden on you. I guess that in some shops you'd need to clone yourself or
sth else, otherwise you just wouldn't scale with demand.

> (Last time I saw split-brain was when I myself pulled on the x/over
> cable. Really. If you have an RJ45 connector with the little tab broken
> off, throw it out and get a new one now. Trust me, it's much cheaper
> than the alternative.)

So, did you have stonith in place then? ;-)

> Seriously, though, you have to weigh the cost of IPMI daughterboards or
> net-connected power strips vs the likelihood of losing that cross-over
> link vs the likelihood of the power strip itself going titsup and taking
> down your entire cluster. *Then* say "go get a real stonith device".

The cost of decent fencing hardware is nowadays really small. And the
probability of the power supplies going bad is much higher than that of
PDU/PSU. And so on. :)

Cheers,

Dejan

> Dima
> --
> Dimitri Maziuk
> Programmer/sysadmin
> BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
Re: [Linux-HA] Server becomes unresponsive after node failure
Dejan Muhamedagic wrote:
> On Tue, Mar 08, 2011 at 02:27:52PM -0600, Dimitri Maziuk wrote:
>
>> Well, realistically, if the link is a foot of x/over cable and gremlins
>> have not been pulling on it, and the NICs aren't falling out of their
>> slots, and are half-decent quality hardware, and the drivers aren't
>> alpha prototype code, and so on, the chances of it being the "link down"
>> case should be fairly low.
>
> LOL. BTW, the gremlins I saw doing that were wearing company
> badges and pulling wrong cables. Realistically, never
> underestimate the human factor :)

That's why we put locks on our server room doors. So that I am the only
gremlin there.

(Last time I saw split-brain was when I myself pulled on the x/over
cable. Really. If you have an RJ45 connector with the little tab broken
off, throw it out and get a new one now. Trust me, it's much cheaper
than the alternative.)

Seriously, though, you have to weigh the cost of IPMI daughterboards or
net-connected power strips vs the likelihood of losing that cross-over
link vs the likelihood of the power strip itself going titsup and taking
down your entire cluster. *Then* say "go get a real stonith device".

Dima
--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
Re: [Linux-HA] Server becomes unresponsive after node failure
On Tue, Mar 08, 2011 at 02:27:52PM -0600, Dimitri Maziuk wrote:
> Lars Ellenberg wrote:
> >
> > Oh, that's easy. external/ssh pings the victim, and if it does not
> > answer, which will be the case for a down node as well as a down link,
> > stonith is considered to have been successful ;-)
> >
> > In the "node down" case, this will allow the cluster to proceed,
> > and all is well.
> >
> > But in the "link down" case, this will allow the cluster to proceed,
> > even though the victim will continue to run its services, causing
> > cluster split-brain and data corruption.
>
> Well, realistically, if the link is a foot of x/over cable and gremlins
> have not been pulling on it, and the NICs aren't falling out of their
> slots, and are half-decent quality hardware, and the drivers aren't
> alpha prototype code, and so on, the chances of it being the "link down"
> case should be fairly low.

LOL. BTW, the gremlins I saw doing that were wearing company
badges and pulling wrong cables. Realistically, never
underestimate the human factor :)

Dejan

> Dima
> --
> Dimitri Maziuk
> Programmer/sysadmin
> BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
Re: [Linux-HA] Server becomes unresponsive after node failure
Lars Ellenberg wrote:
>
> Oh, that's easy. external/ssh pings the victim, and if it does not
> answer, which will be the case for a down node as well as a down link,
> stonith is considered to have been successful ;-)
>
> In the "node down" case, this will allow the cluster to proceed,
> and all is well.
>
> But in the "link down" case, this will allow the cluster to proceed,
> even though the victim will continue to run its services, causing
> cluster split-brain and data corruption.

Well, realistically, if the link is a foot of x/over cable and gremlins
have not been pulling on it, and the NICs aren't falling out of their
slots, and are half-decent quality hardware, and the drivers aren't
alpha prototype code, and so on, the chances of it being the "link down"
case should be fairly low.

Dima
--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
Re: [Linux-HA] Server becomes unresponsive after node failure
On Tue, Mar 08, 2011 at 05:43:17PM +0100, Dejan Muhamedagic wrote:
> Hi,
>
> On Tue, Mar 08, 2011 at 05:32:44PM +0100, Sascha Hagedorn wrote:
> > Hi Dejan,
> >
> > thank you for your answer. I added an external/ssh stonith resource
> > to test this and it resolved the problem. It wasn't clear to me that
> > the stonith resource does more than shoot the other node.
> > Apparently some cluster parameters are being set too, so the system
> > stays clean. During the test my understanding was that when I cut the
> > power of one node I wouldn't need a stonith device to shoot it.
>
> Hmm, I wonder how external/ssh could've solved this particular
> issue, since if you pull the plug it will never be able to fence
> that node.

Oh, that's easy. external/ssh pings the victim, and if it does not
answer, which will be the case for a down node as well as a down link,
stonith is considered to have been successful ;-)

In the "node down" case, this will allow the cluster to proceed,
and all is well.

But in the "link down" case, this will allow the cluster to proceed,
even though the victim will continue to run its services, causing
cluster split-brain and data corruption.

That's why:

> You really need a usable stonith device. external/ssh
> is for testing only.

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com
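To make the point concrete, the decision described above boils down to roughly
the following logic (a simplified sketch only, not the actual external/ssh
plugin code; the node name, timeouts and reset command are placeholders):

    #!/bin/sh
    # Sketch of the failure mode: the caller cannot tell "node is dead"
    # apart from "only the link to the node is dead", so both cases end
    # up being reported as a successful fence.
    victim=$1   # node the cluster wants fenced

    if ping -c 3 -w 5 "$victim" >/dev/null 2>&1; then
        # Victim still answers: try to reset it over ssh.
        ssh -o ConnectTimeout=5 "root@$victim" reboot || exit 1   # reset failed
        exit 0
    else
        # No answer: either a genuinely dead node (fine) or just a dead
        # link with the victim still running its services (split-brain).
        # Reported as success either way.
        exit 0
    fi

A real fencing device avoids this ambiguity because it confirms out of band,
via a PDU or a management board, that the victim has actually been powered
off or reset.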
Re: [Linux-HA] Server becomes unresponsive after node failure
Hi,

On Tue, Mar 08, 2011 at 05:32:44PM +0100, Sascha Hagedorn wrote:
> Hi Dejan,
>
> thank you for your answer. I added an external/ssh stonith resource
> to test this and it resolved the problem. It wasn't clear to me that
> the stonith resource does more than shoot the other node.
> Apparently some cluster parameters are being set too, so the system
> stays clean. During the test my understanding was that when I cut the
> power of one node I wouldn't need a stonith device to shoot it.

Hmm, I wonder how external/ssh could've solved this particular
issue, since if you pull the plug it will never be able to fence
that node. You really need a usable stonith device. external/ssh
is for testing only.

Thanks,

Dejan

> Thanks again,
> Sascha
>
> -----Original Message-----
> From: linux-ha-boun...@lists.linux-ha.org
> [mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Dejan Muhamedagic
> Sent: Monday, 7 March 2011 16:43
> To: General Linux-HA mailing list
> Subject: Re: [Linux-HA] Server becomes unresponsive after node failure
>
> Hi,
>
> On Mon, Mar 07, 2011 at 10:55:01AM +0100, Sascha Hagedorn wrote:
> > Hello everyone,
> >
> > I am evaluating a two node cluster setup and I am running into some
> > problems. The cluster runs a dual-master DRBD disk with an OCFS2
> > filesystem. Here are the software versions used:
> >
> > - SLES11 + HAE Extension
>
> SLE11 is not supported anymore, you'd need to upgrade to SLE11SP1.
>
> > - DRBD 8.3.7
> > - OCFS2 1.4.2
> > - libdlm 3.00.01
> > - cluster-glue 1.0.5
> > - Pacemaker 1.1.2
> > - OpenAIS 1.1.2
> >
> > The problem occurs when the second node is powered off instantly by
> > pulling the power cable. Shortly after that the load average on the
> > surviving system goes up at a very high rate, with no CPU utilization,
> > until the server becomes unresponsive. Processes I see in the top list
> > very frequently are cib, dlm_controld, corosync and ha_logd. Access to
> > the DRBD partition is not possible, although crm_mon shows it is
> > mounted and all services are running. An "ls" on the DRBD OCFS2
> > partition results in a hanging prompt (so does "df" or any other
> > command accessing the partition).
>
> You created a split-brain condition, but have no stonith
> resources (and stonith is disabled). That won't work.
>
> Thanks,
>
> Dejan
>
> > crm_mon after the power is cut on cluster-node2:
> >
> > Last updated: Mon Mar 7 10:32:10 2011
> > Stack: openais
> > Current DC: cluster-node1 - partition WITHOUT quorum
> > Version: 1.1.2-2e096a41a5f9e184a1c1537c82c6da1093698eb5
> > 2 Nodes configured, 2 expected votes
> > 4 Resources configured.
> >
> > Online: [ cluster-node1 ]
> > OFFLINE: [ cluster-node2 ]
> >
> > Master/Slave Set: ms_drbd
> >     Masters: [ cluster-node1 ]
> >     Stopped: [ p_drbd:1 ]
> > Clone Set: cl_dlm
> >     Started: [ cluster-node1 ]
> >     Stopped: [ p_dlm:1 ]
> > Clone Set: cl_o2cb
> >     Started: [ cluster-node1 ]
> >     Stopped: [ p_o2cb:1 ]
> > Clone Set: cl_fs
> >     Started: [ cluster-node1 ]
> >     Stopped: [ p_fs:1 ]
> >
> > The configuration is as follows:
> >
> > node cluster-node1
> > node cluster-node2
> > primitive p_dlm ocf:pacemaker:controld \
> >         op monitor interval="120s"
> > primitive p_drbd ocf:linbit:drbd \
> >         params drbd_resource="r0" \
> >         operations $id="p_drbd-operations" \
> >         op monitor interval="20" role="Master" timeout="20" \
> >         op monitor interval="30" role="Slave" timeout="20"
> > primitive p_fs ocf:heartbeat:Filesystem \
> >         params device="/dev/drbd0" directory="/data" fstype="ocfs2" \
> >         op monitor interval="120s"
> > primitive p_o2cb ocf:ocfs2:o2cb \
> >         op monitor interval="120s"
> > ms ms_drbd p_drbd \
> >         meta resource-stickines="100" notify="true" master-max="2" interleave="true"
> > clone cl_dlm p_dlm \
> >         meta globally-unique="false" interleave="true"
> > clone cl_fs p_fs \
> >         meta interleave="true"
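For reference, a "usable stonith device" for the two nodes in this thread
could look something like the following, assuming each machine has an IPMI
management board (a sketch only, in crm configure syntax, using the
external/ipmi plugin from cluster-glue; the addresses, user and password are
placeholders to be replaced):

    # One fencing resource per node; each is kept off the node it fences.
    primitive st-node1 stonith:external/ipmi \
            params hostname="cluster-node1" ipaddr="192.168.1.101" \
                   userid="admin" passwd="secret" interface="lan" \
            op monitor interval="3600s"
    primitive st-node2 stonith:external/ipmi \
            params hostname="cluster-node2" ipaddr="192.168.1.102" \
                   userid="admin" passwd="secret" interface="lan" \
            op monitor interval="3600s"
    location l-st-node1 st-node1 -inf: cluster-node1
    location l-st-node2 st-node2 -inf: cluster-node2
    property stonith-enabled="true"

The location constraints keep each fencing resource off the node it is meant
to shoot, which is the usual arrangement for per-node fencing devices.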
Re: [Linux-HA] Server becomes unresponsive after node failure
Hi Dejan,

thank you for your answer. I added an external/ssh stonith resource to
test this and it resolved the problem. It wasn't clear to me that the
stonith resource does more than shoot the other node. Apparently some
cluster parameters are being set too, so the system stays clean. During
the test my understanding was that when I cut the power of one node I
wouldn't need a stonith device to shoot it.

Thanks again,
Sascha

-----Original Message-----
From: linux-ha-boun...@lists.linux-ha.org
[mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Dejan Muhamedagic
Sent: Monday, 7 March 2011 16:43
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] Server becomes unresponsive after node failure

Hi,

On Mon, Mar 07, 2011 at 10:55:01AM +0100, Sascha Hagedorn wrote:
> Hello everyone,
>
> I am evaluating a two node cluster setup and I am running into some
> problems. The cluster runs a dual-master DRBD disk with an OCFS2
> filesystem. Here are the software versions used:
>
> - SLES11 + HAE Extension

SLE11 is not supported anymore, you'd need to upgrade to SLE11SP1.

> - DRBD 8.3.7
> - OCFS2 1.4.2
> - libdlm 3.00.01
> - cluster-glue 1.0.5
> - Pacemaker 1.1.2
> - OpenAIS 1.1.2
>
> The problem occurs when the second node is powered off instantly by
> pulling the power cable. Shortly after that the load average on the
> surviving system goes up at a very high rate, with no CPU utilization,
> until the server becomes unresponsive. Processes I see in the top list
> very frequently are cib, dlm_controld, corosync and ha_logd. Access to
> the DRBD partition is not possible, although crm_mon shows it is
> mounted and all services are running. An "ls" on the DRBD OCFS2
> partition results in a hanging prompt (so does "df" or any other
> command accessing the partition).

You created a split-brain condition, but have no stonith
resources (and stonith is disabled). That won't work.

Thanks,

Dejan

> crm_mon after the power is cut on cluster-node2:
>
> Last updated: Mon Mar 7 10:32:10 2011
> Stack: openais
> Current DC: cluster-node1 - partition WITHOUT quorum
> Version: 1.1.2-2e096a41a5f9e184a1c1537c82c6da1093698eb5
> 2 Nodes configured, 2 expected votes
> 4 Resources configured.
>
> Online: [ cluster-node1 ]
> OFFLINE: [ cluster-node2 ]
>
> Master/Slave Set: ms_drbd
>     Masters: [ cluster-node1 ]
>     Stopped: [ p_drbd:1 ]
> Clone Set: cl_dlm
>     Started: [ cluster-node1 ]
>     Stopped: [ p_dlm:1 ]
> Clone Set: cl_o2cb
>     Started: [ cluster-node1 ]
>     Stopped: [ p_o2cb:1 ]
> Clone Set: cl_fs
>     Started: [ cluster-node1 ]
>     Stopped: [ p_fs:1 ]
>
> The configuration is as follows:
>
> node cluster-node1
> node cluster-node2
> primitive p_dlm ocf:pacemaker:controld \
>         op monitor interval="120s"
> primitive p_drbd ocf:linbit:drbd \
>         params drbd_resource="r0" \
>         operations $id="p_drbd-operations" \
>         op monitor interval="20" role="Master" timeout="20" \
>         op monitor interval="30" role="Slave" timeout="20"
> primitive p_fs ocf:heartbeat:Filesystem \
>         params device="/dev/drbd0" directory="/data" fstype="ocfs2" \
>         op monitor interval="120s"
> primitive p_o2cb ocf:ocfs2:o2cb \
>         op monitor interval="120s"
> ms ms_drbd p_drbd \
>         meta resource-stickines="100" notify="true" master-max="2" interleave="true"
> clone cl_dlm p_dlm \
>         meta globally-unique="false" interleave="true"
> clone cl_fs p_fs \
>         meta interleave="true" ordered="true"
> clone cl_o2cb p_o2cb \
>         meta globally-unique="false" interleave="true"
> colocation co_dlm-drbd inf: cl_dlm ms_drbd:Master
> colocation co_fs-o2cb inf: cl_fs cl_o2cb
> colocation co_o2cb-dlm inf: cl_o2cb cl_dlm
> order o_dlm-o2cb 0: cl_dlm cl_o2cb
> order o_drbd-dlm 0: ms_drbd:promote cl_dlm
> order o_o2cb-fs 0: cl_o2cb cl_fs
> property $id="cib-bootstrap-options" \
>         dc-version="1.1.2-2e096a41a5f9e184a1c1537c82c6da1093698eb5" \
>         cluster-infrastructure="openais" \
>         expected-quorum-votes="2" \
>         stonith-enabled="false" \
>         no-quorum-policy="ignore"
>
> Here is a snippet from /var/log/messages (power cut at 10:32:02):
>
> Mar 7 10:32:03 cluster-node1 kernel: [ 4714.838629] r8169: eth0: link down
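For anyone reading along, the kind of external/ssh test resource Sascha
describes is typically set up along these lines (a sketch only, in crm
configure syntax; external/ssh and its hostlist parameter come from
cluster-glue, the resource names are made up, and as pointed out elsewhere
in the thread it is suitable for testing only):

    # Testing only: external/ssh cannot fence a node that has lost power.
    primitive st-ssh stonith:external/ssh \
            params hostlist="cluster-node1 cluster-node2"
    clone cl-st-ssh st-ssh
    property stonith-enabled="true"

This is enough to make the cluster go through a fencing step in a pull-the-plug
test like the one above, which is why the hang went away, but as the rest of
the thread explains it cannot protect against a real split-brain.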
Re: [Linux-HA] Server becomes unresponsive after node failure
Hi,

On Mon, Mar 07, 2011 at 10:55:01AM +0100, Sascha Hagedorn wrote:
> Hello everyone,
>
> I am evaluating a two node cluster setup and I am running into some
> problems. The cluster runs a dual-master DRBD disk with an OCFS2
> filesystem. Here are the software versions used:
>
> - SLES11 + HAE Extension

SLE11 is not supported anymore, you'd need to upgrade to SLE11SP1.

> - DRBD 8.3.7
> - OCFS2 1.4.2
> - libdlm 3.00.01
> - cluster-glue 1.0.5
> - Pacemaker 1.1.2
> - OpenAIS 1.1.2
>
> The problem occurs when the second node is powered off instantly by
> pulling the power cable. Shortly after that the load average on the
> surviving system goes up at a very high rate, with no CPU utilization,
> until the server becomes unresponsive. Processes I see in the top list
> very frequently are cib, dlm_controld, corosync and ha_logd. Access to
> the DRBD partition is not possible, although crm_mon shows it is
> mounted and all services are running. An "ls" on the DRBD OCFS2
> partition results in a hanging prompt (so does "df" or any other
> command accessing the partition).

You created a split-brain condition, but have no stonith
resources (and stonith is disabled). That won't work.

Thanks,

Dejan

> crm_mon after the power is cut on cluster-node2:
>
> Last updated: Mon Mar 7 10:32:10 2011
> Stack: openais
> Current DC: cluster-node1 - partition WITHOUT quorum
> Version: 1.1.2-2e096a41a5f9e184a1c1537c82c6da1093698eb5
> 2 Nodes configured, 2 expected votes
> 4 Resources configured.
>
> Online: [ cluster-node1 ]
> OFFLINE: [ cluster-node2 ]
>
> Master/Slave Set: ms_drbd
>     Masters: [ cluster-node1 ]
>     Stopped: [ p_drbd:1 ]
> Clone Set: cl_dlm
>     Started: [ cluster-node1 ]
>     Stopped: [ p_dlm:1 ]
> Clone Set: cl_o2cb
>     Started: [ cluster-node1 ]
>     Stopped: [ p_o2cb:1 ]
> Clone Set: cl_fs
>     Started: [ cluster-node1 ]
>     Stopped: [ p_fs:1 ]
>
> The configuration is as follows:
>
> node cluster-node1
> node cluster-node2
> primitive p_dlm ocf:pacemaker:controld \
>         op monitor interval="120s"
> primitive p_drbd ocf:linbit:drbd \
>         params drbd_resource="r0" \
>         operations $id="p_drbd-operations" \
>         op monitor interval="20" role="Master" timeout="20" \
>         op monitor interval="30" role="Slave" timeout="20"
> primitive p_fs ocf:heartbeat:Filesystem \
>         params device="/dev/drbd0" directory="/data" fstype="ocfs2" \
>         op monitor interval="120s"
> primitive p_o2cb ocf:ocfs2:o2cb \
>         op monitor interval="120s"
> ms ms_drbd p_drbd \
>         meta resource-stickines="100" notify="true" master-max="2" interleave="true"
> clone cl_dlm p_dlm \
>         meta globally-unique="false" interleave="true"
> clone cl_fs p_fs \
>         meta interleave="true" ordered="true"
> clone cl_o2cb p_o2cb \
>         meta globally-unique="false" interleave="true"
> colocation co_dlm-drbd inf: cl_dlm ms_drbd:Master
> colocation co_fs-o2cb inf: cl_fs cl_o2cb
> colocation co_o2cb-dlm inf: cl_o2cb cl_dlm
> order o_dlm-o2cb 0: cl_dlm cl_o2cb
> order o_drbd-dlm 0: ms_drbd:promote cl_dlm
> order o_o2cb-fs 0: cl_o2cb cl_fs
> property $id="cib-bootstrap-options" \
>         dc-version="1.1.2-2e096a41a5f9e184a1c1537c82c6da1093698eb5" \
>         cluster-infrastructure="openais" \
>         expected-quorum-votes="2" \
>         stonith-enabled="false" \
>         no-quorum-policy="ignore"
>
> Here is a snippet from /var/log/messages (power cut at 10:32:02):
>
> Mar 7 10:32:03 cluster-node1 kernel: [ 4714.838629] r8169: eth0: link down
> Mar 7 10:32:06 cluster-node1 corosync[4300]: [TOTEM ] A processor failed, forming new configuration.
> Mar 7 10:32:06 cluster-node1 kernel: [ 4717.748011] block drbd0: PingAck did not arrive in time.
> Mar 7 10:32:06 cluster-node1 kernel: [ 4717.748020] block drbd0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
> Mar 7 10:32:06 cluster-node1 kernel: [ 4717.748031] block drbd0: asender terminated
> Mar 7 10:32:06 cluster-node1 kernel: [ 4717.748035] block drbd0: short read expecting header on sock: r=-512
> Mar 7 10:32:06 cluster-node1 kernel: [ 4717.748037] block drbd0: Terminating asender thread
> Mar 7 10:32:06 cluster-node1 kernel: [ 4717.748068] block drbd0: Creating new current UUID
> Mar 7 10:32:06 cluster-node1 kernel: [ 4717.763424] block drbd0: Connection closed
> Mar 7 10:32:06 cluster-node1 kernel: [ 4717.763429] block drbd0: conn( NetworkFailure -> Unconnected )
> Mar 7 10:32:06 cluster-node1 kernel: [ 4717.763434] block drbd0: receiver terminated
> Mar 7 10:32:06 cluster-node1 kernel: [ 4717.763436] block drbd0: Restarting receiver thread
> Mar 7 10:32:06 cluster-node1 kernel: [ 4717.763439] block drbd0: receiver (re)started
> Mar 7 10:32:06 cluster-node1 kernel: [ 4
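The two property lines at the end of the posted configuration are the ones
Dejan's comment points at: stonith-enabled="false" turns fencing off
entirely, while no-quorum-policy="ignore" (unavoidable in a two-node
cluster, which can never retain quorum after losing a node) leaves fencing
as the only protection against exactly this kind of hang or split-brain.
The minimal direction of the fix is therefore something like the following,
together with an actual fencing resource such as the IPMI example earlier
in the thread (a sketch only, in crm configure syntax):

    # Re-enable fencing; keep ignoring quorum loss, which is safe only
    # when backed by a working stonith resource.
    property stonith-enabled="true" \
             no-quorum-policy="ignore"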