[ClusterLabs] Fencing with a 3-node (1 for quorum only) cluster

2016-08-04 Thread Dan Swartzendruber
I'm setting up an HA NFS server to serve up storage to a couple of 
vsphere hosts.  I have a virtual IP, and it depends on a ZFS resource 
agent which imports or exports a pool.  So far, with stonith disabled, 
it all works perfectly.  I was dubious about a 2-node solution, so I 
created a 3rd node which runs as a virtual machine on one of the hosts.  
All it is for is quorum.  So, looking at fencing next.  The primary 
server is a PowerEdge R905, which has a DRAC for fencing.  The backup 
storage node is a Supermicro X9-SCL-F (with IPMI).  So I would be using 
the DRAC agent for the former and the ipmilan agent for the latter?  I was 
reading about location constraints, where you tell each instance of the 
fencing agent not to run on the node that would be getting fenced.  So, 
my first thought was to configure the drac agent and tell it not to 
fence node 1, and configure the ipmilan agent and tell it not to fence 
node 2.  The thing is, there is no agent available for the quorum node.  
Would it make more sense instead to tell the drac agent to only run on 
node 2, and the ipmilan agent to only run on node 1?  Thanks!


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Fencing with a 3-node (1 for quorum only) cluster

2016-08-04 Thread Digimer
On 04/08/16 06:56 PM, Dan Swartzendruber wrote:
> I'm setting up an HA NFS server to serve up storage to a couple of
> vsphere hosts.  I have a virtual IP, and it depends on a ZFS resource
> agent which imports or exports a pool.  So far, with stonith disabled,
> it all works perfectly.  I was dubious about a 2-node solution, so I
> created a 3rd node which runs as a virtual machine on one of the hosts. 
> All it is for is quorum.  So, looking at fencing next.  The primary
> server is a poweredge R905, which has DRAC for fencing.  The backup
> storage node is a Supermicro X9-SCL-F (with IPMI).  So I would be using
> the DRAC agent for the former and the ipmilan for the latter?  I was
> reading about location constraints, where you tell each instance of the
> fencing agent not to run on the node that would be getting fenced.  So,
> my first thought was to configure the drac agent and tell it not to
> fence node 1, and configure the ipmilan agent and tell it not to fence
> node 2.  The thing is, there is no agent available for the quorum node. 
> Would it make more sense instead to tell the drac agent to only run on
> node 2, and the ipmilan agent to only run on node 1?  Thanks!

This is a common mistake.

Fencing and quorum solve different problems and are not interchangeable.

In short;

Fencing is a tool when things go wrong.

Quorum is a tool when things are working.

The only impact that having quorum has with regard to fencing is that it
avoids a scenario where both nodes try to fence each other and the faster
one wins (which is itself OK). Even then, you can add 'delay=15' to the
node you want to win and it will win in such a case. In the old days, it
would also prevent a fence loop if you started the cluster on boot and
comms were down. Now though, you set 'wait_for_all' and you won't get a
fence loop, so that solves that.

Said another way; Quorum is optional, fencing is not (people often get
that backwards).

As for DRAC vs IPMI, no, they are not two things. In fact, I am pretty
certain that fence_drac is a symlink to fence_ipmilan. All DRAC (same
with iRMC, iLO, RSA, etc.) is "IPMI + features". Fundamentally, the fence
action (rebooting the node) works via the basic IPMI standard using the
DRAC's BMC.

To do proper redundant fencing, which is a great idea, you want
something like switched PDUs. This is how we do it (with two-node
clusters): IPMI first, and if that fails, a pair of PDUs (one for each
PSU, each PDU going to an independent UPS) as backup.
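Roughly, the two knobs above look like this (a sketch only; node names,
addresses, and credentials are placeholders, and parameter spellings
should be checked against your corosync and fence-agents versions):

```shell
# corosync.conf quorum section for a two-node cluster.
# wait_for_all makes a booting node wait until it has seen its
# peer at least once, which is what prevents a fence-on-boot loop.
#
#   quorum {
#       provider: corosync_votequorum
#       two_node: 1
#       wait_for_all: 1
#   }
#
# The delay goes on the fence device that TARGETS the node you
# want to survive: anything trying to shoot node1 waits 15s, so
# in a split node1 gets its shot in first.
pcs stonith create fence-node1 fence_ipmilan \
    ip=10.0.0.1 username=admin password=secret \
    pcmk_host_list=node1 delay=15
```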

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?



Re: [ClusterLabs] Fencing with a 3-node (1 for quorum only) cluster

2016-08-04 Thread Dan Swartzendruber

On 2016-08-04 19:03, Digimer wrote:

(snip)

Thanks for the quick response.  I didn't mean to give the impression 
that I didn't know the difference between quorum and fencing.  The only 
reason I (currently) have the quorum node is to prevent a deathmatch 
(which I had read about elsewhere).  If it is as simple as adding a 
delay as you describe, I'm inclined to go that route.  At least on 
CentOS 7, fence_ipmilan and fence_drac are not the same; they are 
both Python scripts, and totally different ones.






Re: [ClusterLabs] Fencing with a 3-node (1 for quorum only) cluster

2016-08-04 Thread Digimer
On 04/08/16 07:21 PM, Dan Swartzendruber wrote:
> On 2016-08-04 19:03, Digimer wrote:
>> (snip)
> 
> Thanks for the quick response.  I didn't mean to give the impression
> that I didn't know the different between quorum and fencing.  The only
> reason I (currently) have the quorum node was to prevent a deathmatch
> (which I had read about elsewhere.)  If it is as simple as adding a
> delay as you describe, I'm inclined to go that route.  At least on
> CentOS7, fence_ipmilan and fence_drac are not the same.  e.g. they are
> both python scripts that are totally different.

The delay is perfectly fine. We've shipped dozens of two-node systems
over the last five or so years, and none have had trouble. Where node
failures have occurred, fencing operated properly and services were
recovered. So in my opinion, in the interest of minimizing complexity,
I recommend the two-node approach.

As for the two agents not being symlinked, OK. It still doesn't change
the core point, though, that both fence_ipmilan and fence_drac would be
acting on the same target.

Note; If you lose power to the mainboard (which we've seen, failed
mainboard voltage regulator did this once), you lose the IPMI (DRAC)
BMC. This scenario will leave your cluster blocked without an external
secondary fence method, like switched PDUs.
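As a sketch of that IPMI-then-PDU layering (device names, outlet
numbers, and addresses here are made up; check your agents' parameter
names before using):

```shell
# Level 1: IPMI/DRAC; level 2: switched PDU as backup.
# Pacemaker falls through to level 2 only if level 1 fails.
pcs stonith create fence-n1-ipmi fence_ipmilan \
    ip=10.0.0.1 username=admin password=secret pcmk_host_list=node1
pcs stonith create fence-n1-pdu fence_apc_snmp \
    ip=10.0.0.50 port=3 pcmk_host_list=node1

pcs stonith level add 1 node1 fence-n1-ipmi
pcs stonith level add 2 node1 fence-n1-pdu
```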

cheers

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?



Re: [ClusterLabs] Fencing with a 3-node (1 for quorum only) cluster

2016-08-04 Thread Dan Swartzendruber

On 2016-08-04 19:33, Digimer wrote:

(snip)


Thanks!





Re: [ClusterLabs] Fencing with a 3-node (1 for quorum only) cluster

2016-08-04 Thread Andrei Borzenkov
05.08.2016 02:33, Digimer wrote:
> On 04/08/16 07:21 PM, Dan Swartzendruber wrote:
>> On 2016-08-04 19:03, Digimer wrote:
>>> On 04/08/16 06:56 PM, Dan Swartzendruber wrote:
 I'm setting up an HA NFS server to serve up storage to a couple of
 vsphere hosts.  I have a virtual IP, and it depends on a ZFS resource
 agent which imports or exports a pool.

...

> 
> Note; If you lose power to the mainboard (which we've seen, failed
> mainboard voltage regulator did this once), you lose the IPMI (DRAC)
> BMC. This scenario will leave your cluster blocked without an external
> secondary fence method, like switched PDUs.
> 

As in this case there is shared storage (at least, so I understood),
using persistent SCSI reservations or SBD as a secondary channel could
be considered.
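For reference, an SBD setup of the kind suggested here needs little
more than a small shared LUN and a watchdog (a rough sketch; the device
path is a placeholder and the config file location varies by distro):

```shell
# Initialize the shared slot device once, from any node:
sbd -d /dev/disk/by-id/shared-lun create

# /etc/sysconfig/sbd (or /etc/default/sbd):
#   SBD_DEVICE=/dev/disk/by-id/shared-lun
#   SBD_WATCHDOG_DEV=/dev/watchdog

# Then register it as a fence device:
pcs stonith create fence-sbd fence_sbd \
    devices=/dev/disk/by-id/shared-lun
```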




Re: [ClusterLabs] Fencing with a 3-node (1 for quorum only) cluster

2016-08-04 Thread Digimer
On 04/08/16 11:44 PM, Andrei Borzenkov wrote:
> 05.08.2016 02:33, Digimer wrote:
>> On 04/08/16 07:21 PM, Dan Swartzendruber wrote:
>>> On 2016-08-04 19:03, Digimer wrote:
 On 04/08/16 06:56 PM, Dan Swartzendruber wrote:
> I'm setting up an HA NFS server to serve up storage to a couple of
> vsphere hosts.  I have a virtual IP, and it depends on a ZFS resource
> agent which imports or exports a pool.
> 
> ...
> 
>>
>> Note; If you lose power to the mainboard (which we've seen, failed
>> mainboard voltage regulator did this once), you lose the IPMI (DRAC)
>> BMC. This scenario will leave your cluster blocked without an external
>> secondary fence method, like switched PDUs.
>>
> 
> As in this case there is shared storage (at least, so I understood),
> using persistent SCSI reservations or SBD as secondary channel can be
> considered.

Yup. That would be fabric fencing though, or are you talking about using
it under watchdog timers? If fabric, then my worry is always a panic'ed
admin clearing it without properly verifying the state of the lost node.
With watchdog, it's fine, just slow.

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?



Re: [ClusterLabs] Fencing with a 3-node (1 for quorum only) cluster

2016-08-05 Thread Andrei Borzenkov
On Fri, Aug 5, 2016 at 7:08 AM, Digimer  wrote:
> On 04/08/16 11:44 PM, Andrei Borzenkov wrote:
>> 05.08.2016 02:33, Digimer wrote:
>>> On 04/08/16 07:21 PM, Dan Swartzendruber wrote:
 On 2016-08-04 19:03, Digimer wrote:
> On 04/08/16 06:56 PM, Dan Swartzendruber wrote:
>> I'm setting up an HA NFS server to serve up storage to a couple of
>> vsphere hosts.  I have a virtual IP, and it depends on a ZFS resource
>> agent which imports or exports a pool.
>>
>> ...
>>
>>>
>>> Note; If you lose power to the mainboard (which we've seen, failed
>>> mainboard voltage regulator did this once), you lose the IPMI (DRAC)
>>> BMC. This scenario will leave your cluster blocked without an external
>>> secondary fence method, like switched PDUs.
>>>
>>
>> As in this case there is shared storage (at least, so I understood),
>> using persistent SCSI reservations or SBD as secondary channel can be
>> considered.
>
> Yup. That would be fabric fencing though, or are you talking about using
> it under watchdog timers?

Fabric is the third possibility :) No, I rather mean something like
fence_scsi.

Although the practical problem with both fabric and SCSI fencing is that
it only prevents concurrent access to shared storage; it does not
guarantee that other resources are also cleaned up, so you may end up
with a duplicated IP or similar.

> If fabric, then my worry is always a panic'ed
> admin clearing it without properly verifying the state of the lost node.
> With watchdog, it's fine, just slow.
>

As it is a last resort, better slow than never.



Re: [ClusterLabs] Fencing with a 3-node (1 for quorum only) cluster

2016-08-05 Thread Dan Swartzendruber


A lot of good suggestions here.  Unfortunately, my budget is tapped out 
for the near future at least (this is a home lab/SOHO setup).  I'm 
inclined to go with Digimer's two-node approach, with IPMI fencing.  I 
understand mobos can die and such.  In such a long-shot case, manual 
intervention is fine.  So, when I get a chance, I need to remove the 
quorum node from the cluster and switch it to two_node mode.  Thanks for 
the info!
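For anyone following along, the switch is roughly this (a sketch;
'quorum-node' is a placeholder, and whether you edit corosync.conf
directly or via pcs depends on your stack):

```shell
# Drop the quorum-only node from the cluster:
pcs cluster node remove quorum-node

# Then, in corosync.conf on the two remaining nodes:
#   quorum {
#       provider: corosync_votequorum
#       two_node: 1        # implies wait_for_all: 1 by default
#   }
# and restart corosync on both.
```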




Re: [ClusterLabs] Fencing with a 3-node (1 for quorum only) cluster

2016-08-06 Thread Dan Swartzendruber


Okay, I almost have this all working.  fence_ipmilan for the Supermicro 
host; had to specify lanplus for it to work.  fence_drac5 for the R905; 
that was failing to complete due to a timeout.  Found a couple of helpful 
posts that recommended increasing the retry count to 3 and the timeout to 
60.  That worked also.  The only problem now is that it takes well over 
a minute to complete the fencing operation.  In that interim, the fenced 
host shows as UNCLEAN (offline), and because the fencing operation 
hasn't completed, the other node has to wait to import the pool and 
share out the filesystem.  This causes the vsphere hosts to declare the 
NFS datastore down.  I haven't gotten exact timing, but I think the 
fencing operation took a little over a minute.  I'm wondering if I could 
change the timeout to a smaller value but increase the retries?  Like 
back to the default 20-second timeout, but change retries from 1 to 5?
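For reference, those knobs correspond to common fence-agent parameters,
roughly like this (a sketch; 'fence-r905' is a placeholder device name,
and exact parameter names should be checked with `fence_drac5 -h`):

```shell
# power_timeout: seconds to wait for the power state to change
# retry_on:      number of attempts before the agent gives up
pcs stonith update fence-r905 power_timeout=20 retry_on=5
```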




Re: [ClusterLabs] Fencing with a 3-node (1 for quorum only) cluster

2016-08-06 Thread Digimer
On 06/08/16 07:33 PM, Dan Swartzendruber wrote:
> 
> Okay, I almost have this all working.  fence_ipmilan for the supermicro
> host.  Had to specify lanplus for it to work.  fence_drac5 for the R905.
>  That was failing to complete due to timeout.  Found a couple of helpful
> posts that recommended increase the retry count to 3 and the timeout to
> 60.  That worked also.  The only problem now, is that it takes well over
> a minute to complete the fencing operation.  In that interim, the fenced
> host shows as UNCLEAN (offline), and because the fencing operation
> hasn't completed, the other node has to wait to import the pool and
> share out the filesystem.  This causes the vsphere hosts to declare the
> NFS datastore down.  I hadn't gotten exact timing, but I think the
> fencing operation took a little over a minute.  I'm wondering if I could
> change the timeout to a smaller value, but increase the retries?  Like
> back to the default 20 second timeout, but change retries from 1 to 5?

Did you try the fence_ipmilan against the DRAC? It *should* work. Would
be interesting to see if it had the same issue. Can you check the DRAC's
host's power state using ipmitool directly without delay?
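Something like this, for example (address and credentials are
placeholders):

```shell
# Raw IPMI power-state query; if this is slow, the agent will be too:
ipmitool -I lanplus -H 10.0.0.1 -U admin -P secret chassis power status

# The same check through the fence agent's status action:
fence_ipmilan --ip=10.0.0.1 --username=admin --password=secret \
    --lanplus --action=status
```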

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?



Re: [ClusterLabs] Fencing with a 3-node (1 for quorum only) cluster

2016-08-06 Thread Dan Swartzendruber

On 2016-08-06 19:46, Digimer wrote:

(snip)

Did you try the fence_ipmilan against the DRAC? It *should* work. Would
be interesting to see if it had the same issue. Can you check the DRAC's
host's power state using ipmitool directly without delay?


Yes, I did try fence_ipmilan, but it got the timeout waiting for power 
off (or whatever).  I have to admit, I switched to fence_drac and had 
the same issue, but after increasing the timeout and retries I got it to 
work, so it is possible that fence_ipmilan is okay.  They both seemed 
to take more than 60 seconds to complete the operation.  I have to say 
that when I do a power cycle through the DRAC web interface, it takes 
a while, so that might be normal.  I think I will try again with 20 
seconds and 5 retries and see how that goes...




Re: [ClusterLabs] Fencing with a 3-node (1 for quorum only) cluster

2016-08-06 Thread Digimer
On 06/08/16 08:22 PM, Dan Swartzendruber wrote:
> On 2016-08-06 19:46, Digimer wrote:
>> On 06/08/16 07:33 PM, Dan Swartzendruber wrote:
>>> (snip)
>>
>> Did you try the fence_ipmilan against the DRAC? It *should* work. Would
>> be interesting to see if it had the same issue. Can you check the DRAC's
>> host's power state using ipmitool directly without delay?
> 
> Yes, I did try fence_ipmilan, but it got the timeout waiting for power
> off (or whatever).  I have to admit, I switched to fence_drac and had
> the same issue, but after increasing the timeout and retries, got it to
> work, so it is possible, that fence_ipmilan is okay.  They both seemed
> to take more than 60 seconds to complete the operation.  I have to say
> that when I do a power cycle through the drac web interface, it takes
> awhile, so that might be normal.  I think I will try again with 20
> seconds and 5 retries and see how that goes...

What about using ipmitool directly? I can't imagine that such a long
time is normal. Maybe there is a firmware update for the DRAC and/or
BIOS? (I know with Fujitsu, they recommend updating the IPMI BMC and
BIOS together).

Over a minute to fence is, strictly speaking, OK. However, that's a
significant delay in time to recover.

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?



Re: [ClusterLabs] Fencing with a 3-node (1 for quorum only) cluster

2016-08-06 Thread Dan Swartzendruber

On 2016-08-06 21:59, Digimer wrote:

(snip)

What about using ipmitool directly? I can't imagine that such a long
time is normal. Maybe there is a firmware update for the DRAC and/or
BIOS? (I know with Fujitsu, they recommend updating the IPMI BMC and
BIOS together).


Unfortunately, the R905 is EoL, so any updates are not likely.


Over a minute to fence is, strictly speaking, OK. However, that's a
significant delay in time to recover.


The thing that concerns me, though, is the delay in I/O for the vsphere 
clients.  I know 2 or more retries of 60 seconds caused issues.  I'm 
going to try again with 5 20-second retries and see how that works.  If 
this doesn't cooperate, I may need to look into a PDU or something...




Re: [ClusterLabs] Fencing with a 3-node (1 for quorum only) cluster

2016-08-06 Thread Dan Swartzendruber

On 2016-08-06 21:59, Digimer wrote:

(snip)

What about using ipmitool directly? I can't imagine that such a long
time is normal. Maybe there is a firmware update for the DRAC and/or
BIOS? (I know with Fujitsu, they recommend updating the IPMI BMC and
BIOS together).

Over a minute to fence is, strictly speaking, OK. However, that's a
significant delay in time to recover.


Okay, I tested with a 20-second timeout and 5 retries, using fence_drac5 
at the command line.  Ran 'date' on both sides to see how long it took: 
just under a minute.  It's too late now to mess around any more 
tonight.  I do need to verify that that works okay for vsphere.  I will 
post back my results.  Thanks!




Re: [ClusterLabs] Fencing with a 3-node (1 for quorum only) cluster

2016-08-15 Thread Jan Pokorný
> On 04/08/16 07:21 PM, Dan Swartzendruber wrote:
>> On 2016-08-04 19:03, Digimer wrote:
>>> As for DRAC vs IPMI, no, they are not two things. In fact, I am pretty
>>> certain that fence_drac is a symlink to fence_ipmilan. All DRAC is (same
>>> with iRMC, iLO, RSA, etc) is "IPMI + features". Fundamentally, the fence
>>> action; rebooting the node, works via the basic IPMI standard using the
>>> DRAC's BMC.
>>> 
>>> [...]
>> 
>> At least on CentOS 7, fence_ipmilan and fence_drac are not the same;
>> they are both Python scripts, but entirely different ones.
> 
> [...] 
> 
> As for the two agents not being symlinked, OK. It still doesn't change
> the core point, though, that both fence_ipmilan and fence_drac would be
> acting on the same target.

Just thought I'd add some clarifications:

- in fact fence-agents upstream seems to have thrown the idea of
  proper symlinks away before functionality to that effect was added,
  eventually using file copies instead of symlinks, with the rationale
  "this approach is not recommended so they regular files"

  [Marx&Oyvind, I cannot really imagine what issues this was meant to
  solve nor why it would be not recommended (in Pacemaker, stat calls
  are used that work with symlink targets, not the immediate link
  files, ditto other standard file handling functions), but it seems
  pretty non-systemic compared to, e.g., fence_xvm -> fence_virt:
  
https://github.com/ClusterLabs/fence-virt/blob/f1f1a2437c5b0811269b5859a5ef646f44105a88/client/Makefile.in#L39
  and this also makes resulting packages inflated with redundant
  scripts + man pages needlessly;  I'd make a PR for that but it
  seems premature until the recursive make/install issue with
  "symlinked" agents has a definitive conclusion (PR 81+82), but
  basically, you just want 'ln -s SRC DST' instead of 'cp SRC DST']

- fence_ipmilan and fence_drac are indeed not even virtually
  symlinked; quick and dirty way to receive this information, see
  https://bugzilla.redhat.com/show_bug.cgi?id=1210679#c12
  (you may need ' | tr -s " "' just after 'ls -l' command)
  from where you can see that it is fence_idrac which is a virtual
  symlink (same implementation) as fence_ipmilan, while fence_drac
  is an agent on its own
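To check for yourself which installed agents are byte-for-byte copies ("virtual symlinks") and which are separate implementations, grouping by content hash works; a sketch, demonstrated here on dummy files in a temp directory rather than the real /usr/sbin agents:

```shell
# Stand-in files: two identical "agents" and one distinct one.
d=$(mktemp -d)
printf 'implementation A\n' > "$d/fence_ipmilan"
printf 'implementation A\n' > "$d/fence_idrac"   # byte-for-byte copy
printf 'implementation B\n' > "$d/fence_drac"    # separate agent

# Group by MD5 (the first 32 characters of each md5sum line); only
# duplicated hashes are printed, i.e. the virtually-symlinked agents.
md5sum "$d"/fence_* | sort | uniq -w32 --all-repeated=separate
```

Run against /usr/sbin/fence_* on a real install, this prints fence_idrac and fence_ipmilan grouped together, with fence_drac standing alone.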


Hope this helps.

-- 
Jan (Poki)




Re: [ClusterLabs] Fencing with a 3-node (1 for quorum only) cluster

2016-08-15 Thread Jan Pokorný
On 15/08/16 14:48 +0200, Jan Pokorný wrote:
>> On 04/08/16 07:21 PM, Dan Swartzendruber wrote:
>>> On 2016-08-04 19:03, Digimer wrote:
 As for DRAC vs IPMI, no, they are not two things. In fact, I am pretty
 certain that fence_drac is a symlink to fence_ipmilan. All DRAC is (same
 with iRMC, iLO, RSA, etc) is "IPMI + features". Fundamentally, the fence
 action; rebooting the node, works via the basic IPMI standard using the
 DRAC's BMC.
 
 [...]
>>> 
>>> At least on CentOS 7, fence_ipmilan and fence_drac are not the same;
>>> they are both Python scripts, but entirely different ones.
>> 
>> [...] 
>> 
>> As for the two agents not being symlinked, OK. It still doesn't change
>> the core point, though, that both fence_ipmilan and fence_drac would be
>> acting on the same target.
> 
> Just thought I'd add some clarifications:
> 
> - in fact fence-agents upstream seems to have thrown the idea of
>   proper symlinks away before functionality to that effect was added,
>   eventually using file copies instead of symlinks, with the rationale
>   "this approach is not recommended so they regular files"

Reference needed (accidentally omitted):
https://github.com/ClusterLabs/fence-agents/commit/87266bc

>   [Marx&Oyvind, I cannot really imagine what issues this was meant to
>   solve nor why it would be not recommended (in Pacemaker, stat calls
>   are used that work with symlink targets, not the immediate link
>   files, ditto other standard file handling functions), but it seems
>   pretty non-systemic compared to, e.g., fence_xvm -> fence_virt:
>   
> https://github.com/ClusterLabs/fence-virt/blob/f1f1a2437c5b0811269b5859a5ef646f44105a88/client/Makefile.in#L39
>   and this also makes resulting packages inflated with redundant
>   scripts + man pages needlessly;  I'd make a PR for that but it
>   seems premature until the recursive make/install issue with
>   "symlinked" agents has a definitive conclusion (PR 81+82), but
>   basically, you just want 'ln -s SRC DST' instead of 'cp SRC DST']
> 
> - fence_ipmilan and fence_drac are indeed not even virtually
>   symlinked; quick and dirty way to receive this information, see
>   https://bugzilla.redhat.com/show_bug.cgi?id=1210679#c12
>   (you may need ' | tr -s " "' just after 'ls -l' command)
>   from where you can see that it is fence_idrac which is a virtual
>   symlink (same implementation) as fence_ipmilan, while fence_drac
>   is an agent on its own
> 
> 
> Hope this helps.

-- 
Jan (Poki)

