[Pacemaker] Trouble getting node to re-join two node cluster (OCFS2/DRBD Primary/Primary)

2011-09-15 Thread Mike Reid
Hello all,

We have a two-node cluster still in development that has been running fine
for weeks (little to no traffic). I made some updates to our CIB recently,
and everything seemed just fine.

Yesterday I attempted to untar ~1.5GB onto the OCFS2/DRBD volume, and by the
time it completed one of the nodes had become completely disconnected; I
haven't been able to reconnect it since.

DRBD is working fine, everything is UpToDate and I can get both nodes in
Primary/Primary, but when it comes down to starting OCFS2 and mounting the
volume, I'm left with:

> resFS:0_start_0 (node=node1, call=21, rc=1, status=complete): unknown error
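
Since the Filesystem agent reports nothing more specific than "unknown error",
the useful detail usually has to come from that node's logs and from a
cleanup/retry. A few illustrative commands (the resource name is taken from
the attached CIB; the log file depends on your syslog configuration):

  crm resource cleanup resFS                   # clear the failed start so Pacemaker retries it
  crm_mon -1rf                                 # one-shot status including failcounts
  grep -iE 'ocfs2|Filesystem' /var/log/syslog  # agent and kernel messages around the failed mount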

I am using "pcmk" as the cluster_stack, and letting Pacemaker control
everything...

The last time this happened, the only way I was able to resolve it was to
reformat the device (via mkfs.ocfs2 -F). I don't think I should have to do
that: the underlying blocks seem fine, and one of the nodes is running without
any problems. The (currently) unmounted node is staying in sync as far as DRBD
is concerned.

Here's some detail that I hope will help; please let me know if there's
anything else I can provide to help determine the best way to get this node
back "online":


Ubuntu 10.10 / Kernel 2.6.35

Pacemaker 1.0.9.1
Corosync 1.2.1
Cluster Agents 1.0.3 (Heartbeat)
Cluster Glue 1.0.6
OpenAIS 1.1.2

DRBD 8.3.10
OCFS2 1.5.0

cat /sys/fs/ocfs2/cluster_stack = pcmk

node1: mounted.ocfs2 -d

Device      FS     UUID                                  Label
/dev/sda3   ocfs2  fe4273e1-f866-4541-bbcf-66c5dfd496d6

node2: mounted.ocfs2 -d

Device      FS     UUID                                  Label
/dev/sda3   ocfs2  d6f7cc6d-21d1-46d3-9792-bc650736a5ef
/dev/drbd0  ocfs2  d6f7cc6d-21d1-46d3-9792-bc650736a5ef
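
Note that node1 reports a different UUID for /dev/sda3 than node2 does, so it
may be worth confirming what UUID each node actually sees on the DRBD device
itself. An illustrative check (run on both nodes; verify the tunefs.ocfs2
query format against your version's man page):

  blkid /dev/drbd0                     # filesystem type and UUID as seen through DRBD
  tunefs.ocfs2 -Q "%U\n" /dev/drbd0    # OCFS2's own record of the volume UUID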

* NOTES:
- Both nodes are identical; in fact, one node is a direct mirror (HDD clone)
- I have attached the CIB (crm configure edit contents) and mount trace





Attachments: crm_configure.txt, mount_trace.txt


Re: [Pacemaker] How to prevent locked I/O using Pacemaker with Primary/Primary DRBD/OCFS2 (Ubuntu 10.10)

2011-04-07 Thread Mike Reid
Lars,

Interesting, I will definitely continue in that direction then. Perhaps I
misunderstood the requirements of STONITH. I understood it to be a form of
"remote reboot/shut down" of sorts, and since the box was already shut down, I
assumed at this stage in my testing that the problem could not be related to
STONITH. Perhaps Pacemaker is just awaiting that confirmation, as you suggest,
so thank you; I will see if that is indeed the case. I've seen quite a few
STONITH plugins available. Is any one of them better suited for a simple
two-node cluster (OCFS2)?
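
For a two-node cluster like this, a power-based fencing agent such as
external/ipmi is a common choice when the servers have IPMI/BMC interfaces.
The sketch below is illustrative only: the BMC addresses and credentials are
made up, and the node names are taken from the configuration later in this
thread.

  # Hypothetical "crm configure" snippet: one external/ipmi STONITH resource
  # per node, each forbidden from running on the node it is meant to fence.
  primitive stonith-ubu10a stonith:external/ipmi \
          params hostname="ubu10a" ipaddr="192.168.0.68" userid="admin" \
                 passwd="secret" interface="lan" \
          op monitor interval="60s"
  primitive stonith-ubu10b stonith:external/ipmi \
          params hostname="ubu10b" ipaddr="192.168.0.69" userid="admin" \
                 passwd="secret" interface="lan" \
          op monitor interval="60s"
  location loc-stonith-ubu10a stonith-ubu10a -inf: ubu10a
  location loc-stonith-ubu10b stonith-ubu10b -inf: ubu10b
  property stonith-enabled="true"

With stonith-enabled="true" and a working fencing device, the DLM/OCFS2 stack
should receive the fencing confirmation it is otherwise left waiting for.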


> Message: 1
> Date: Thu, 7 Apr 2011 02:50:09 +0200
> From: Lars Ellenberg 
> To: pacemaker@oss.clusterlabs.org
> Subject: Re: [Pacemaker] How to prevent locked I/O using Pacemaker
> with Primary/Primary DRBD/OCFS2 (Ubuntu 10.10)
> Message-ID: <20110407005009.GF3726@barkeeper1-xen.linbit>
> Content-Type: text/plain; charset=iso-8859-1
> 
> On Wed, Apr 06, 2011 at 10:26:24AM -0600, Reid, Mike wrote:
>> Lars,
>> 
>> Thank you for your comments. I did confirm I was running 8.3.8.1, and I have
>> even upgraded to 8.3.10 but am still experiencing the same I/O lock issue. I
>> definitely agree with you, DRBD is behaving exactly as instructed, being
>> properly fenced, etc.
>> 
>> I am quite new to DRBD (and OCFS2), learning a lot as I go. To your
>> question regarding copy/paste, yes, the configuration used was
>> assembled from a series of different tutorials, plus personal trial
>> and error related to this project. I have tried many variations of the
>> DRBD config (including resource-and-stonith)
> 
>> but have not actually set up a functioning STONITH yet,
> 
> And that's why your ocfs2 does not unblock.
> It waits for confirmation of a STONITH operation.
> 
>> hence the
>> "resource-only". The  Linbit
>> docs have been an amazing resource.
>> 
>> Yes, I realize that a Secondary node is not indicative of its
>> data/sync state. The options I am testing here were referenced from
>> these pages:
>> 
>> 
>> 
>> http://www.drbd.org/users-guide/s-ocfs2-create-resource.html
>> http://www.drbd.org/users-guide/s-configure-split-brain-behavior.html#s-automatic-split-brain-recovery-configuration
>> 
>> 
>> 
>> When you say "You do configure automatic data loss here", are you
>> suggesting that I am instructing the DRBD survivor to perform a full
>> re-sync to its peer?
> 
> Nothing to do with full sync. Should usually be a bitmap based resync.
> 
> But it may be a sync in an "unexpected" direction.
> 
>> If so, that would make sense, since I believe
>> this behavior was something I experienced prior to getting fencing
>> fully established. In my hard-boot testing, I did once notice the
>> "victim" was completely resyncing, which sounds related to
>> "after-sb-1pri discard-secondary".
>> 
>> DRBD aside, have you used OCFS2? I'm failing to understand why, if DRBD is
>> fencing its peer, OCFS2 remains in a locked state, unable to run
>> standalone. To me, this issue does not seem related to DRBD or Pacemaker, but
>> rather to a lower-level requirement of OCFS2 (DLM?), etc.
>> 
>> To date, the ONLY way I can restore I/O to the remaining node is to bring the
>> other node back online, which unfortunately won't work in our production
>> environment. On a separate mailing list, someone suggested that "qdisk" might
>> be required to make this work, and while I have tried "qdisk", my high-level
>> research leads me to believe it is a legacy approach, not an option with
>> Pacemaker. Is that correct?
> 
> 
> -- 
> : Lars Ellenberg
> : LINBIT | Your Way to High Availability
> : DRBD/HA support and consulting http://www.linbit.com
> 
> DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.


[Pacemaker] How to prevent locked I/O using Pacemaker with Primary/Primary DRBD/OCFS2 (Ubuntu 10.10)

2011-04-04 Thread Mike Reid
All,

I am running a two-node web cluster on OCFS2 (v1.5.0) via DRBD
Primary/Primary (v8.3.8) and Pacemaker. Everything seems to be working
great, except during testing of hard-boot scenarios.

Whenever I hard-boot one of the nodes, the other node is successfully fenced
and marked "Outdated".

However, this locks up I/O on the still-active node and prevents any
operations within the cluster :( I have even forced DRBD into StandAlone
mode while in this state, but that does not resolve the I/O lock either.
Does anyone know if this is possible using OCFS2 (maintaining an active
cluster in Primary/Unknown once the other node has failed, whether the
failure is forced, controlled, etc.)?

I have been focusing on the DRBD config, but I am starting to wonder if
perhaps it's something in my Pacemaker or OCFS2 setup that is forcing this
I/O lock during a failure. Any thoughts?
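
When the lock-up happens, it can help to confirm which layer is actually
blocking. A few illustrative checks (tool availability varies with the
dlm/ocfs2-tools packages installed):

  cat /proc/drbd                  # DRBD connection, role and disk state on the survivor
  dmesg | grep -iE 'ocfs2|dlm'    # kernel messages from OCFS2 and the DLM
  dlm_tool ls                     # DLM lockspaces and whether recovery is pending
  crm_mon -1rf                    # Pacemaker's view, including failcounts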

-
crm_mon (crm_mon 1.0.9 for OpenAIS and Heartbeat):

> 
> Last updated: Mon Apr  4 12:57:47 2011
> Stack: openais
> Current DC: ubu10a - partition with quorum
> Version: 1.0.9-unknown
> 2 Nodes configured, 2 expected votes
> 4 Resources configured.
> 
> 
> Online: [ ubu10a ubu10b ]
> 
>  Master/Slave Set: msDRBD
>  Masters: [ ubu10a ubu10b ]
>  Clone Set: cloneDLM
>  Started: [ ubu10a ubu10b ]
>  Clone Set: cloneO2CB
>  Started: [ ubu10a ubu10b ]
>  Clone Set: cloneFS
>  Started: [ ubu10a ubu10b ]

-
DRBD (v8.3.8):
> 
> version: 8.3.8 (api:88/proto:86-94)
> 0:repdata  Connected  Primary/Primary  UpToDate/UpToDate  C  /data  ocfs2

-
DRBD Conf:
> 
> global {
>   usage-count no;
> }
> common {
>   syncer { rate 10M; }
> }
> resource repdata {
>   protocol C;
> 
>   meta-disk internal;
>   device /dev/drbd0;
>   disk /dev/sda3;
> 
>   handlers {
> pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
> pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
> local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
> split-brain "/usr/lib/drbd/notify-split-brain.sh root";
> fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
> after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
>   }
>   startup {
> degr-wfc-timeout 120;   # 120 = 2 minutes.
> wfc-timeout 30;
> become-primary-on both;
>   }
>   disk {
> fencing resource-only;
>   }
>   syncer {
> rate 10M;
> al-extents 257;
>   }
>   net {
> cram-hmac-alg "sha1";
> shared-secret "XXX";
> allow-two-primaries;
> after-sb-0pri discard-zero-changes;
> after-sb-1pri discard-secondary;
> after-sb-2pri disconnect;
>   }
>   on ubu10a {
> address 192.168.0.66:7788;
>   }
>   on ubu10b {
> address 192.168.0.67:7788;
>   }
> }
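
Given the advice earlier in the thread that OCFS2 stays blocked until a
STONITH confirmation arrives, a change often recommended for dual-primary
setups with a cluster filesystem is resource-and-stonith fencing, so that
DRBD suspends I/O until the peer has actually been fenced. A minimal sketch
of the relevant sections, assuming node-level fencing is configured in
Pacemaker:

  disk {
    fencing resource-and-stonith;
  }
  handlers {
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }

Note that this only helps once STONITH is actually enabled and working in
Pacemaker; with stonith-enabled="false" (as in the CIB below), the
confirmation OCFS2 waits for never arrives.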


-
CIB.xml
> 
> node ubu10a \
> attributes standby="off"
> node ubu10b \
> attributes standby="off"
> primitive resDLM ocf:pacemaker:controld \
> op monitor interval="120s"
> primitive resDRBD ocf:linbit:drbd \
> params drbd_resource="repdata" \
> operations $id="resDRBD-operations" \
> op monitor interval="20s" role="Master" timeout="120s" \
> op monitor interval="30s" role="Slave" timeout="120s"
> primitive resFS ocf:heartbeat:Filesystem \
> params device="/dev/drbd/by-res/repdata" directory="/data" fstype="ocfs2" \
> op monitor interval="120s"
> primitive resO2CB ocf:pacemaker:o2cb \
> op monitor interval="120s"
> ms msDRBD resDRBD \
> meta resource-stickiness="100" notify="true" master-max="2" interleave="true"
> clone cloneDLM resDLM \
> meta globally-unique="false" interleave="true"
> clone cloneFS resFS \
> meta interleave="true" ordered="true"
> clone cloneO2CB resO2CB \
> meta globally-unique="false" interleave="true"
> colocation colDLMDRBD inf: cloneDLM msDRBD:Master
> colocation colFSO2CB inf: cloneFS cloneO2CB
> colocation colO2CBDLM inf: cloneO2CB cloneDLM
> order ordDLMO2CB 0: cloneDLM cloneO2CB
> order ordDRBDDLM 0: msDRBD:promote cloneDLM
> order ordO2CBFS 0: cloneO2CB cloneFS
> property $id="cib-bootstrap-options" \
> dc-version="1.0.9-unknown" \
> cluster-infrastructure="openais" \
> stonith-enabled="false" \
> no-quorum-policy="ignore" \
> expected-quorum-votes="2"
> 
> 

-







