Re: [Linux-HA] Looking for a suitable Stonith Solution

2011-03-03 Thread Andrew Beekhof
On Wed, Mar 2, 2011 at 9:05 AM, Stallmann, Andreas astallm...@conet.de wrote:
 Hi Andrew,

 If suicide is not a supported fencing option, why is it still included with
 stonith?
 Left over from heartbeat v1 days I guess.
 Could also be a testing-only device like ssh.

 www.clusterlabs.org tells me you're the Pacemaker project leader.

Yes, but the stonith devices come from cluster-glue.
So I guess Dejan or Florian are nominally in charge of those, but
they've not been changed in forever.

 Would you, by chance, know who maintains or maintained the 
 suicide-stonith-plugin? It may be testing-only, yes. But at least, ssh is
 working as intended.

 It's badly documented, and I didn't find a single (official) document
 on how to implement a (stable!) suicide-stonith,
 Because you can't.  Suicide is not, will not, can not be reliable.
 Yes, you're right. But under certain circumstances (1. nodes are still alive,
 2. both redundant communication channels [networks] are down, 3. policy
 requires that no node without quorum stays up) it might be a good addition
 to a regular stonith (because if [2] happens, pacemaker/stonith will
 probably not be able to control a network power switch etc.). Could we agree
 on that?

Sure. But even if you have a functioning suicide plugin, Pacemaker
cannot ever make decisions that assume it worked.
Because for all it knows the other side might consider itself to be
perfectly healthy.

 If not: What's your recommended setup for (or rather against) such situations?
 Think of split sites here!

You still need reliable fencing; if you can't provide that, there needs
to be a human in the loop.

 The whole point of stonith is to create a known node state (off) in 
 situations where you cannot be sure if your peer is alive, dead or some
 state in-between.
 Yes, so don't file suicide under stonith! We implemented a different
 approach in a two-node cluster: We wrote a script that checks (by means of
 cron) the connectivity (by means of ping) to the peer (if connected,
 everything is fine) and then (if the peer is not reachable) to some quorum nodes.
 If either the peer or a majority of the quorum nodes are alive, nothing
 happens. If quorum is lost, the node shuts itself down.
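
[A minimal sketch of what such a cron-driven check could look like; the original
script was not posted, so the hostnames, the quorum-node list and the shutdown
command below are made-up placeholders:]

~~~
#!/bin/bash
# Hypothetical quorum check, run from cron (e.g. once a minute).
# If the peer answers, do nothing. Otherwise ping a set of quorum
# nodes; if no majority answers, assume this node is the isolated
# side and power it off.

PEER="peer-node"                          # placeholder peer hostname
QUORUM_NODES="quorum1 quorum2 quorum3"    # placeholder tie-breaker hosts

# Peer reachable: everything is fine, exit quietly.
ping -c 2 -W 2 "$PEER" >/dev/null 2>&1 && exit 0

total=0
alive=0
for node in $QUORUM_NODES; do
    total=$((total + 1))
    ping -c 2 -W 2 "$node" >/dev/null 2>&1 && alive=$((alive + 1))
done

# A majority of the quorum nodes still answers: stay up.
[ $((alive * 2)) -gt "$total" ] && exit 0

logger -t quorum-check "peer and quorum majority unreachable, shutting down"
/sbin/poweroff
~~~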

Wonderful, but the healthy side still can't do anything, because it
can't know that the bad side is down.
So what have you gained over no-quorum-policy=stop (which is the default)?


 We did that because drbd tended to misbehave in situations where all
 network connectivity was lost. We'd rather have a clean shutdown on both
 sides than a corrupt filesystem. I always considered this solution
 inelegant, mainly because it wasn't controllable via crm. Thus I hoped I
 could forget this solution when using pacemaker. It seems I cannot.

 If there's any interest from the community in our suicide-by-cron solution,
 tell me if and how to contribute.

 It requires a sick node to suddenly start functioning correctly - so 
 attempting to self-terminate makes some sense, relying on it to succeed does 
 not seem prudent.

 Yes! But it's not always the node that's sick. Sometimes (even with the
 best and most redundant network), the connectivity between the nodes is the
 problem, not a marauding pacemaker or openais! Again: Please tell me, what's
 your solution in that case?

Again, tell me how the other side is supposed to know and what you gain?


 On the other hand, it doesn't make much sense to name a no-quorum-policy
 "suicide" if it's anything but a suicide (if at all, one could name it
 "assisted suicide").

 This question is still unanswered. Does no-quorum-policy=suicide really
 have a meaning?

yes, for N > 2, it is a faster version of stop

 Or is it as well a leftover from the times of heartbeat?

no

 Is it still functional?

yes
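
[For reference, the policy being discussed here is an ordinary cluster property; a
hypothetical crm shell one-liner for setting it (valid values in Pacemaker 1.1
include ignore, stop, freeze and suicide):]

~~~
# stop (the default) stops all resources in the partition that lost quorum;
# suicide additionally has the nodes in that partition fence themselves.
crm configure property no-quorum-policy="suicide"
~~~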


Re: [Linux-HA] Looking for a suitable Stonith Solution

2011-02-28 Thread Andrew Beekhof
On Fri, Feb 25, 2011 at 12:51 PM, Stallmann, Andreas
astallm...@conet.de wrote:
 Hi!

 I'm combining both your answers into one mail; I hope that's all right with you.

 For now, I need an interim solution, which is, as of now, stonith via 
 suicide.
 Doesn't work as suicide is not considered reliable - by definition the 
 remaining nodes have no way to verify that the fencing operation was 
 successful.
 Suspect it will still fail though, suicide isn't a supported fencing option -
 since obviously the other nodes can't confirm it happened.

 Ok then, I know I'm a little bit provocative right now:

 If suicide is not a supported fencing option, why is it still included with
 stonith?

Left over from heartbeat v1 days I guess.
Could also be a testing-only device like ssh.

 It's badly documented, and I didn't find a single (official) document on 
 how to implement a (stable!) suicide-stonith,

Because you can't.  Suicide is not, will not, can not be reliable.
The whole point of stonith is to create a known node state (off) in
situations where you cannot be sure if your peer is alive, dead or
some state in-between.

Suicide does not achieve this in any way, shape or form.
It requires a sick node to suddenly start functioning correctly - so
attempting to self-terminate makes some sense, relying on it to
succeed does not seem prudent.

 but it's there, and thus it should be usable. If it isn't, the maintainer 
 should please (please!) remove it or supply something that works. I do
 know that's quite demanding, because the maintainer will probably do the
 development in his (or her) free time. Still...

 I do agree, as well, that suicide is a very special way of keeping a cluster
 consistent, very different from the other stonith methods. I wouldn't expect
 it under stonith; I'd rather think...

 Yes no-quorum-policy=suicide means that all nodes in the partition will end 
 up being shot, but you still require a real stonith device so that 
 _someone_else_ can perform it.
 ...that if you set no-quorum-policy=suicide, the suicide script is executed 
 by the node itself. It should be an *extra* feature *besides* stonith. The 
 procedure should be something like:

 1) node1: All right, I have no quorum anymore. Let's wait for a while...
 2)... a while passes
 3) node1: OK, I'm still without quorum, no contact to my peers whatsoever.
 I'd rather shut myself down before I cause a mess.

 If, during (2), the other nodes find a way to shut down the node externally
 (be it through ssh, a power switch, a virtualisation host...), that's even
 better, because then the cluster knows that it's still consistent. I'm
 with you here.

 If a split brain happens in a split-site scenario, a suicide might be the
 only way to keep up consistency, because no one will be able to reach any
 device on the other site... Please correct me if I'm wrong. What do you do in 
 such a case? What's your exemplary implementation of Linux-HA then?

 On the other hand, it doesn't make much sense to name a no-quorum-policy
 "suicide" if it's anything but a suicide (if at all, one could name it
 "assisted suicide").

 Please correct me: Do I have an utterly wrong understanding of the whole
 process (that could very well be the case), is the implementation not
 entirely thought through, or is the naming of certain components not as good
 as it could be?

 I might point you to 
 http://osdir.com/ml/linux.highavailability.devel/2007-11/msg00026.html, 
 because the same thing was discussed back then, and I very much do think
 that Lars was right in what he wrote. Has anything changed in the concept
 of suicide/quorum-loss/stonith since then? That's not a provocative question, 
 well, maybe it is, but it's not meant to be.

 In addition: Something that's missing from the manuals is a case study (or
 something of the sort) on how to implement a split-site scenario. How should the
 cluster be built then? If you have two sites? If you have one? How should the
 storage replication be set up? Is synchronous replication like in drbd really
 a good idea then, performance-wise? I think I'll finally have to buy a book.
 :-) Any recommendations (either English or German preferred)?

 Well, thanks a lot again. My brain didn't explode (that's something good, I
 feel), but I'm not entirely happy, though.

 Cheers and have a nice weekend,

 Andreas


 

Re: [Linux-HA] Looking for a suitable Stonith Solution

2011-02-25 Thread Stallmann, Andreas
Hi!

I'm combining both your answers into one mail; I hope that's all right with you.

 For now, I need an interim solution, which is, as of now, stonith via 
 suicide.
 Doesn't work as suicide is not considered reliable - by definition the 
 remaining nodes have no way to verify that the fencing operation was 
 successful.
 Suspect it will still fail though, suicide isn't a supported fencing option -
 since obviously the other nodes can't confirm it happened.

Ok then, I know I'm a little bit provocative right now:

If suicide is not a supported fencing option, why is it still included with
stonith? It's badly documented, and I didn't find a single (official) document
on how to implement a (stable!) suicide-stonith, but it's there, and thus it
should be usable. If it isn't, the maintainer should please (please!) remove it
or supply something that works. I do know that's quite demanding, because
the maintainer will probably do the development in his (or her) free time.
Still...

I do agree, as well, that suicide is a very special way of keeping a cluster
consistent, very different from the other stonith methods. I wouldn't expect it
under stonith; I'd rather think...

 Yes no-quorum-policy=suicide means that all nodes in the partition will end 
 up being shot, but you still require a real stonith device so that 
 _someone_else_ can perform it.
...that if you set no-quorum-policy=suicide, the suicide script is executed 
by the node itself. It should be an *extra* feature *besides* stonith. The 
procedure should be something like:

1) node1: All right, I have no quorum anymore. Let's wait for a while...
2)... a while passes
3) node1: OK, I'm still without quorum, no contact to my peers whatsoever. I'd
rather shut myself down before I cause a mess.

If, during (2), the other nodes find a way to shut down the node externally (be it
through ssh, a power switch, a virtualisation host...), that's even better,
because then the cluster knows that it's still consistent. I'm with you
here.

If a split brain happens in a split-site scenario, a suicide might be the
only way to keep up consistency, because no one will be able to reach any 
device on the other site... Please correct me if I'm wrong. What do you do in 
such a case? What's your exemplary implementation of Linux-HA then?

On the other hand, it doesn't make much sense to name a no-quorum-policy
"suicide" if it's anything but a suicide (if at all, one could name it
"assisted suicide").

Please correct me: Do I have an utterly wrong understanding of the whole process
(that could very well be the case), is the implementation not entirely thought
through, or is the naming of certain components not as good as it could be?

I might point you to 
http://osdir.com/ml/linux.highavailability.devel/2007-11/msg00026.html, because 
the same thing was discussed back then, and I very much do think that Lars was
right in what he wrote. Has anything changed in the concept of
suicide/quorum-loss/stonith since then? That's not a provocative question, 
well, maybe it is, but it's not meant to be.

In addition: Something that's missing from the manuals is a case study (or
something of the sort) on how to implement a split-site scenario. How should the
cluster be built then? If you have two sites? If you have one? How should the
storage replication be set up? Is synchronous replication like in drbd really a
good idea then, performance-wise? I think I'll finally have to buy a book. :-)
Any recommendations (either English or German preferred)?

Well, thanks a lot again. My brain didn't explode (that's something good, I
feel), but I'm not entirely happy, though.

Cheers and have a nice weekend,

Andreas




Re: [Linux-HA] Looking for a suitable Stonith Solution

2011-02-24 Thread Stallmann, Andreas
Hi!

TNX for your answer. We will switch to sbd after the shared storage has been 
set up.

For now, I need an interim solution, which is, as of now, stonith via suicide.

My configuration doesn't work, though.

I tried:

~~Output from crm configure show~~
primitive suicide_res stonith:suicide
...
clone fenc_clon suicide_res
...
property $id=cib-bootstrap-options \
dc-version=1.1.2-8b9ec9ccc5060457ac761dce1de719af86895b10 \
cluster-infrastructure=openais \
expected-quorum-votes=3 \
stonith-enabled=true \
no-quorum-policy=suicide \
stonith-action=poweroff



If I disconnect one node from the network, crm_mon shows:


Current DC: mgmt03 - partition WITHOUT quorum
...
Node mgmt01: UNCLEAN (offline)
Node mgmt02: UNCLEAN (offline)
Online: [ mgmt03 ]

Clone Set: fenc_clon
Started: [ ipfuie-mgmt03 ]
Stopped: [ suicide_res:0 suicide_res:1 ]
~~~

No action is taken, neither reboot nor poweroff.

1. What did I do wrong here?
2. OK, let's be more precise: I have the feeling that the suicide resource
should be in a default state of stopped (on all nodes) and should only be
started on the node which has to fence itself. Am I right? And, if yes, how is
that accomplished?
3. How does the no-quorum-policy relate to the stonith resources? I didn't
find any documentation on whether the two have any connection at all.
4. Am I correct that the no-quorum-policy is what a node (or a cluster
partition) should do to itself when it loses quorum (for example, shut down
itself), and stonith is what the nodes with quorum try to do to the nodes
without?
5. Shouldn't no-quorum-policy=suicide then be obsolete in the case of suicide as
the stonith method?

TNX for your help (again),

Andreas





Re: [Linux-HA] Looking for a suitable Stonith Solution

2011-02-24 Thread Andrew Beekhof
On Thu, Feb 24, 2011 at 2:49 PM, Stallmann, Andreas astallm...@conet.de wrote:
 Hi!

 TNX for your answer. We will switch to sbd after the shared storage has been 
 set up.

 For now, I need an interim solution, which is, as of now, stonith via suicide.

Doesn't work as suicide is not considered reliable - by definition the
remaining nodes have no way to verify that the fencing operation was
successful.

Yes no-quorum-policy=suicide means that all nodes in the partition
will end up being shot, but you still require a real stonith device so
that _someone_else_ can perform it.
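
[For illustration only: a sketch of that pairing in crm syntax, using external/sbd
as the "real" fencing device (sbd is what they plan to switch to anyway). The
device path is a made-up placeholder, and the parameter name should be checked
against the installed plugin, e.g. with "stonith -t external/sbd -n":]

~~~
# Sketch: a real fencing device configured alongside no-quorum-policy=suicide.
crm configure primitive stonith_sbd stonith:external/sbd \
        params sbd_device="/dev/disk/by-id/shared-sbd-part1"
crm configure property stonith-enabled="true" no-quorum-policy="suicide"
~~~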


 My configuration doesn't work, though.

 I tried:

 ~~Output from crm configure show~~
 primitive suicide_res stonith:suicide
 ...
 clone fenc_clon suicide_res
 ...
 property $id=cib-bootstrap-options \
        dc-version=1.1.2-8b9ec9ccc5060457ac761dce1de719af86895b10 \
        cluster-infrastructure=openais \
        expected-quorum-votes=3 \
        stonith-enabled=true \
        no-quorum-policy=suicide \
        stonith-action=poweroff
 


 If I disconnect one node from the network, crm_mon shows:

 
 Current DC: mgmt03 - partition WITHOUT quorum
 ...
 Node mgmt01: UNCLEAN (offline)
 Node mgmt02: UNCLEAN (offline)
 Online: [ mgmt03 ]

 Clone Set: fenc_clon
        Started: [ ipfuie-mgmt03 ]
        Stopped: [ suicide_res:0 suicide_res:1 ]
 ~~~

 No action is taken, neither reboot nor poweroff.

 1. What did I do wrong here?
 2. OK, let's be more precise: I have the feeling that the suicide
 resource should be in a default state of stopped (on all nodes) and should
 only be started on the node which has to fence itself. Am I right? And, if
 yes, how is that accomplished?
 3. How does the no-quorum-policy relate to the stonith resources? I didn't
 find any documentation on whether the two have any connection at all.
 4. Am I correct that the no-quorum-policy is what a node (or a cluster
 partition) should do to itself when it loses quorum (for example, shut down
 itself), and stonith is what the nodes with quorum try to do to the nodes
 without?
 5. Shouldn't no-quorum-policy=suicide then be obsolete in the case of suicide as
 the stonith method?

 TNX for your help (again),

 Andreas



 


[Linux-HA] Looking for a suitable Stonith Solution

2011-02-23 Thread Stallmann, Andreas
Hello!

I'm currently looking for a suitable stonith solution for our environment:

1. We have three cluster nodes running OpenSuSE 10.3 with corosync and 
pacemaker.
2. The nodes reside on two VMware ESXi servers (v. 4.1.0) in two locations,
where one VMware server hosts two nodes of our cluster and the other hosts one.
3. We will finally have a shared storage, which might or might not be under our 
governance (this depends on the respective customer).

I examined some stonith methods, with the following results (and I'd like to
have your opinion on my conclusions):

- (2) rules out any stonith method that relies on a power switch or a UPS, as
the nodes are in no way physically connected to a power circuit.
- (3) rules out sbd, as this method requires access to a physical device that
offers the shared storage. Am I right? The manual explicitly says that sbd may
not even be used on a DRBD partition. Question: Is there a way to insert the
sbd header on a mounted drive instead of a physical partition? Are there any
other methods of resource fencing besides sbd?
- (2) is not compatible with the vmware stonith method, as it requires the
vmware host to be reachable via one single IP address. This isn't the case in
our scenario; the ESXi servers are not clustered. Question: Has anyone of you
modified the vmware stonith script to fit a setup similar to ours?
- the ssh method is said to be not the wisest idea, as it requires the host to
respond to ssh requests. If the SSH daemon hangs or the cluster runs into a
split-brain, this might not be the case anymore. Any other opinions?

This, finally, leaves only one method: suicide. Again, the SuSE HAE manual
claims that this method is not suitable for production environments because
"This requires action by the node's operating system and can fail under certain
circumstances. Therefore avoid using this device whenever possible."

Well, as far as I'm concerned, letting a node shut itself down does not seem
that bad an idea. How's your experience with this method?

Final Question: Have I missed any suitable method? Are there any other concepts 
of fencing / stonith which I should give a closer look?

Thanks in advance for all your answers (and I can well live with an RTFM, as
long as you point me to the fine manual),

Andreas







Re: [Linux-HA] Looking for a suitable Stonith Solution

2011-02-23 Thread Dejan Muhamedagic
Hi,

On Wed, Feb 23, 2011 at 09:55:00AM +, Stallmann, Andreas wrote:
 Hello!
 
 I'm currently looking for a suitable stonith solution for our environment:
 
 1. We have three cluster nodes running OpenSuSE 10.3 with corosync and 
 pacemaker.
 2. The nodes reside on two VMware ESXi servers (v. 4.1.0) in two locations,
 where one VMware server hosts two nodes of our cluster and the other hosts one.
 3. We will finally have a shared storage, which might or might not be under 
 our governance (this depends on the respective customer).
 
 I examined some stonith methods, with the following results (and I'd like to
 have your opinion on my conclusions):

 - (2) rules out any stonith method that relies on a power switch or a UPS,
 as the nodes are in no way physically connected to a power circuit.
 - (3) rules out sbd, as this method requires access to a physical device
 that offers the shared storage. Am I right? The manual explicitly says that
 sbd may not even be used on a DRBD partition. Question: Is there a way to
 insert the sbd header on a mounted drive instead of a physical partition? Are
 there any other methods of resource fencing besides sbd?

The only requirement for sbd is to have a dedicated disk on
shared storage. That disk (or partition, if you will) doesn't
need to be big (1MB is enough). I don't see how (3) then is an
obstacle.
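
[Illustrative only: per the SBD_Fencing wiki page referenced elsewhere in this
thread, preparing such a dedicated little disk boils down to something like the
following; the device path is a made-up placeholder:]

~~~
# Write the sbd header once, from one node (this overwrites the partition):
sbd -d /dev/disk/by-id/shared-sbd-part1 create

# Check from every node that the same header is visible:
sbd -d /dev/disk/by-id/shared-sbd-part1 dump
~~~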

 - (2) is not compatible with the vmware stonith method, as it requires the
 vmware host to be reachable via one single IP address. This isn't the case in
 our scenario; the ESXi servers are not clustered. Question: Has anyone of you
 modified the vmware stonith script to fit a setup similar to ours?
 - the ssh method is said to be not the wisest idea, as it requires the host
 to respond to ssh requests. If the SSH daemon hangs or the cluster runs into
 a split-brain, this might not be the case anymore. Any other opinions?
 
 This, finally, leaves only one method: suicide. Again, the SuSE
 HAE manual claims that this method is not suitable for production
 environments because "This requires action by the node's operating system and
 can fail under certain circumstances. Therefore avoid using this device
 whenever possible."

 Well, as far as I'm concerned, letting a node shut itself down does not seem
 that bad an idea. How's your experience with this method?

No way of telling if the suicide succeeded or not.

 Final Question: Have I missed any suitable method? Are there any other 
 concepts of fencing / stonith which I should give a closer look?

There's also external/libvirt which hasn't been in any release
yet, but seems to be of very good quality. You can get it here:

http://hg.linux-ha.org/glue/file/tip/lib/plugins/stonith/external/libvirt

and just put it into /usr/lib64/stonith/plugins/external
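
[A hedged example of wiring the plugin up once it is copied into place. The values
are invented, hostlist/hypervisor_uri are believed to be the plugin's parameters
(verify with "stonith -t external/libvirt -n"), and note that a libvirt-based
plugin targets KVM/Xen hosts rather than the VMware ESXi setup described in this
thread:]

~~~
# Sketch: configure the downloaded plugin as a fencing resource.
crm configure primitive fence_virt stonith:external/libvirt \
        params hostlist="node1,node2,node3" \
        hypervisor_uri="qemu+ssh://vmhost.example.com/system"
~~~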

 Thanks in advance for all your answers (and I can well live with an RTFM, as
 long as you point me to the fine manual),

There's a document on fencing at http://clusterlabs.org

Thanks,

Dejan

 
 Andreas
 
 
 
 
 


Re: [Linux-HA] Looking for a suitable Stonith Solution

2011-02-23 Thread Stallmann, Andreas
Hi!

 - (3) rules out sbd, as this method requires access to a physical device
 that offers the shared storage. Am I right? The manual explicitly says that
 sbd may not even be used on a DRBD partition. Question: Is there a way to
 insert the sbd header on a mounted drive instead of a physical partition?
 Are there any other methods of resource fencing besides sbd?

The only requirement for sbd is to have a dedicated disk on shared storage. 
That disk (or partition, if you will) doesn't
 need to be big (1MB is enough). I don't see how (3) then is an obstacle.

Let me see if I got the manual (see: http://www.linux-ha.org/wiki/SBD_Fencing)
right:

a) Our customer might only grant us one storage access via NFS. Can one create
an sbd on an NFS share?
b) If we set up a shared storage ourselves, we want it to be redundant itself,
thus setting it up with drbd is very likely. The manual says: "The SBD device
must not make use of host-based RAID." and "The SBD device must not reside on
a drbd instance."

Did I get this right: The sbd partition is not allowed to reside on either a
RAID or a DRBD? Well? Doesn't that mess with the concept of redundancy? Let's
say we have a three-node shared storage, using DRBD to keep the partitions
redundant between the shared-storage nodes, exporting the storage to the other
nodes via NFS: Where and how shall the sbd device be created? Only on one of
the storage nodes? Or on each of the storage nodes? Or somehow on a clustered
partition (that would mean drbd again, wouldn't it?).

To me only the latter makes sense, because, as the manual says:

This can be a logical unit, partition, or a logical volume; but it must be 
accessible from all nodes.

A... my brain is starting to explode... ;-)

Please, I feel that I'm getting something entirely wrong here. May, for example, the
sbd be created on a partition or logical volume that I created on a drbd device
(or RAID), and the "no drbd" rule (or "no RAID") only means that the sbd may not
be created on the drbd (or RAID) directly?

  No way of telling if the suicide succeeded or not.

Yes, but on the other hand, suicide is quite independent of the network, while
for all the power-off methods (including vmware) I have to have at least access
to the power device (or VM host), which might not be the case if all
communication between two locations is demolished (a classical split brain).

 There's also external/libvirt which hasn't been in any release yet, but seems 
 to be of very good quality. You can get it here:

Thanks, I'll check it out!

 There's a document on fencing at http://clusterlabs.org
Which has been written by you, right? Don't get me wrong, it is excellent, and 
I already read it (it's included nearly word for word in the SLES HAE manual):
http://www.clusterlabs.org/doc/crm_fencing.html

Further help is still very welcome.

TNX in advance,

Andreas




Re: [Linux-HA] Looking for a suitable Stonith Solution

2011-02-23 Thread Lars Ellenberg
On Wed, Feb 23, 2011 at 12:19:20PM +, Stallmann, Andreas wrote:
 Hi!
 
  - (3) rules out sbd, as this method requires access to a physical device
  that offers the shared storage. Am I right? The manual explicitly says
  that sbd may not even be used on a DRBD partition. Question: Is there a
  way to insert the sbd header on a mounted drive instead of a physical
  partition? Are there any other methods of resource fencing besides sbd?
 
 The only requirement for sbd is to have a dedicated disk on shared storage. 
 That disk (or partition, if you will) doesn't
  need to be big (1MB is enough). I don't see how (3) then is an obstacle.
 
 Let me see if I got the manual (see:
 http://www.linux-ha.org/wiki/SBD_Fencing) right:

 a) Our customer might only grant us one storage access via NFS. Can one
 create an sbd on an NFS share?

Please no-one try a loop-mounted image file on NFS ;-)
Even though in theory it may work, if you mount -o sync ...
*Ouch*

 b) If we set up a shared storage ourselves, we want it to be redundant
 itself, thus setting it up with drbd is very likely. The manual says:
 "The SBD device must not make use of host-based RAID." and "The SBD
 device must not reside on a drbd instance."

 Did I get this right: The sbd partition is not allowed to reside on
 either a RAID or a DRBD? Well? Doesn't that mess with the concept of
 redundancy? Let's say we have a three-node shared storage, using DRBD
 to keep the partitions redundant between the shared-storage nodes,
 exporting the storage to the other nodes via NFS: Where and how shall
 the sbd device be created? Only on one of the storage nodes? Or on
 each of the storage nodes? Or somehow on a clustered partition (that
 would mean drbd again, wouldn't it?).
 
 To me only the latter makes sense, because, as the manual says:
 
 This can be a logical unit, partition, or a logical volume; but it
 must be accessible from all nodes.
 
 A... my brain is starting to explode... ;-)

Does this help?
http://www.linux-ha.org/w/index.php?title=SBD_Fencing&diff=481&oldid=97

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.


Re: [Linux-HA] Looking for a suitable Stonith Solution

2011-02-23 Thread Stallmann, Andreas
Hi there!

...
 Please no-one try a loop-mounted image file on NFS ;-) Even though in theory 
 it may work, if you mount -o sync ...
 *Ouch*
...
 Does this help?
 http://www.linux-ha.org/w/index.php?title=SBD_Fencing&diff=481&oldid=97

Yes, this helps... somehow. Well, I should use iSCSI to share my storage,
right? And use an iSCSI LUN to write the sbd on, right? Does the sbd device
have to be accessible to all the nodes at the same time? I mean: Do they all
have to mount the sbd device at the same time? And if the sbd resides on an
iSCSI LUN which itself resides on a DRBD, and the communication crashes, won't
that again destroy the effect of the poison pill?

How about the storage itself? We definitely have to use a different fencing
approach there, right? (Or we have to give shared storage to our shared
storage, which then again needs shared storage... ad infinitum.)

Thanks for your help so far.

Andreas






___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems