[ClusterLabs] Fencing with a 3-node (1 for quorum only) cluster
I'm setting up an HA NFS server to serve storage to a couple of vSphere hosts. I have a virtual IP, which depends on a ZFS resource agent that imports or exports a pool. So far, with stonith disabled, it all works perfectly. I was dubious about a two-node solution, so I created a third node which runs as a virtual machine on one of the hosts; all it is for is quorum.

So, looking at fencing next. The primary server is a PowerEdge R905, which has a DRAC for fencing. The backup storage node is a Supermicro X9-SCL-F (with IPMI). So I would be using the DRAC agent for the former and the ipmilan agent for the latter? I was reading about location constraints, where you tell each instance of the fencing agent not to run on the node that would be getting fenced. So, my first thought was to configure the drac agent and tell it not to fence node 1, and configure the ipmilan agent and tell it not to fence node 2. The thing is, there is no agent available for the quorum node. Would it make more sense instead to tell the drac agent to only run on node 2, and the ipmilan agent to only run on node 1? Thanks!

___ Users mailing list: Users@clusterlabs.org http://clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
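For concreteness, the per-node layout described here might be sketched like this with pcs on CentOS 7. This is only a sketch: every device name, address, and credential below is a placeholder, and running it requires a live cluster.

```shell
# Hypothetical sketch of the proposed layout; all names, addresses,
# and credentials are placeholders.

# DRAC-based device for fencing node1 (the R905)
pcs stonith create fence-node1 fence_drac5 \
    ipaddr=192.168.1.10 login=root passwd=secret \
    pcmk_host_list=node1

# IPMI-based device for fencing node2 (the Supermicro)
pcs stonith create fence-node2 fence_ipmilan \
    ipaddr=192.168.1.11 login=ADMIN passwd=secret lanplus=1 \
    pcmk_host_list=node2

# Keep each device off the node it would be fencing
pcs constraint location fence-node1 avoids node1
pcs constraint location fence-node2 avoids node2
```

Either placement (avoid the target node, or pin to the peer) works in a two-node-plus-quorum layout; the essential part is that a device never has to run on the node it is meant to kill.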
Re: [ClusterLabs] Fencing with a 3-node (1 for quorum only) cluster
On 04/08/16 06:56 PM, Dan Swartzendruber wrote:
> I'm setting up an HA NFS server to serve up storage to a couple of
> vsphere hosts. [...] Would it make more sense instead to tell the drac
> agent to only run on node 2, and the ipmilan agent to only run on
> node 1? Thanks!

This is a common mistake.

Fencing and quorum solve different problems and are not interchangeable.

In short: fencing is a tool for when things go wrong; quorum is a tool for when things are working.

The only impact that having quorum has with regard to fencing is that it avoids a scenario where both nodes try to fence each other and the faster one wins (which is itself OK). Even then, you can add 'delay=15' to the node you want to win, and it will win in such a case. In the old days, quorum would also prevent a fence loop if you started the cluster on boot and comms were down. Now, though, you set 'wait_for_all' and you won't get a fence loop, so that solves that.

Said another way: quorum is optional, fencing is not (people often get that backwards).

As for DRAC vs IPMI, no, they are not two different things.
In fact, I am pretty certain that fence_drac is a symlink to fence_ipmilan. All a DRAC is (same with iRMC, iLO, RSA, etc.) is "IPMI + features". Fundamentally, the fence action (rebooting the node) works via the basic IPMI standard using the DRAC's BMC.

To do proper redundant fencing, which is a great idea, you want something like switched PDUs. This is how we do it (with two-node clusters): IPMI first, and if that fails, a pair of PDUs (one for each PSU, each PDU going to an independent UPS) as backup.

-- Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?
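For reference, the 'wait_for_all' setting mentioned above lives in the quorum section of corosync.conf; this is a sketch for a two-node layout, and the exact file layout varies by distribution and corosync version:

```
quorum {
    provider: corosync_votequorum
    two_node: 1
    # Implied by two_node, shown explicitly: a node booting alone
    # waits until it has seen its peer before gaining quorum, which
    # prevents the boot-time fence loop described above.
    wait_for_all: 1
}
```

The 'delay=15' goes on the fence device pointing at the node that should win a split; as a Pacemaker stonith resource attribute the naming is up to you.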
Re: [ClusterLabs] Fencing with a 3-node (1 for quorum only) cluster
On 2016-08-04 19:03, Digimer wrote:
> [...] To do proper redundant fencing, which is a great idea, you want
> something like switched PDUs. This is how we do it (with two node
> clusters). IPMI first, and if that fails, a pair of PDUs (one for each
> PSU, each PDU going to independent UPSes) as backup.

Thanks for the quick response. I didn't mean to give the impression that I didn't know the difference between quorum and fencing. The only reason I (currently) have the quorum node is to prevent a deathmatch (which I had read about elsewhere). If it is as simple as adding a delay as you describe, I'm inclined to go that route. At least on CentOS 7, fence_ipmilan and fence_drac are not the same; they are two totally different Python scripts.
Re: [ClusterLabs] Fencing with a 3-node (1 for quorum only) cluster
On 04/08/16 07:21 PM, Dan Swartzendruber wrote:
> [...] At least on CentOS7, fence_ipmilan and fence_drac are not the
> same. e.g. they are both python scripts that are totally different.

The delay is perfectly fine. We've shipped dozens of two-node systems over the last five or so years, and none have had trouble. Where node failures have occurred, fencing operated properly and services were recovered. So in my opinion, in the interest of minimizing complexity, I recommend the two-node approach.

As for the two agents not being symlinked, OK. It still doesn't change the core point, though, that both fence_ipmilan and fence_drac would be acting on the same target.

Note: if you lose power to the mainboard (which we've seen; a failed mainboard voltage regulator did this once), you lose the IPMI (DRAC) BMC. This scenario will leave your cluster blocked without an external secondary fence method, like switched PDUs.

cheers
Re: [ClusterLabs] Fencing with a 3-node (1 for quorum only) cluster
On 2016-08-04 19:33, Digimer wrote:
> [...]

Thanks!
Re: [ClusterLabs] Fencing with a 3-node (1 for quorum only) cluster
On 05.08.2016 02:33, Digimer wrote:
> [...]
>
> Note; If you lose power to the mainboard (which we've seen, failed
> mainboard voltage regulator did this once), you lose the IPMI (DRAC)
> BMC. This scenario will leave your cluster blocked without an external
> secondary fence method, like switched PDUs.

As in this case there is shared storage (at least, so I understood), using persistent SCSI reservations or SBD as a secondary channel can be considered.
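The SCSI-reservation idea could be layered behind IPMI as a second fencing level. A hedged sketch with pcs follows; the disk ID, node names, and device names are placeholders, and fence_scsi requires a shared device that supports persistent reservations:

```shell
# Hypothetical sketch: fence_scsi as a fallback behind IPMI fencing.
# The disk ID, node names, and device names are placeholders.
pcs stonith create fence-scsi fence_scsi \
    devices=/dev/disk/by-id/wwn-0x5000c500deadbeef \
    pcmk_host_list="node1 node2" \
    meta provides=unfencing

# Try the IPMI/DRAC device first; fall back to SCSI reservations.
pcs stonith level add 1 node1 fence-node1
pcs stonith level add 2 node1 fence-scsi
```

The 'provides=unfencing' meta attribute matters for fence_scsi: a fenced node must be explicitly unfenced (its reservation key re-registered) before it can use the storage again.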
Re: [ClusterLabs] Fencing with a 3-node (1 for quorum only) cluster
On 04/08/16 11:44 PM, Andrei Borzenkov wrote:
> [...] As in this case there is shared storage (at least, so I
> understood), using persistent SCSI reservations or SBD as secondary
> channel can be considered.

Yup. That would be fabric fencing, though. Or are you talking about using it under watchdog timers? If fabric, then my worry is always a panicked admin clearing it without properly verifying the state of the lost node. With watchdog, it's fine, just slow.

-- Digimer
Re: [ClusterLabs] Fencing with a 3-node (1 for quorum only) cluster
On Fri, Aug 5, 2016 at 7:08 AM, Digimer wrote:
> [...] Yup. That would be fabric fencing though, or are you talking
> about using it under watchdog timers?

Fabric is the third possibility :) No, I rather mean something like fence_scsi. Although the practical problem of both fabric and SCSI fencing is that they only prevent concurrent access to shared storage; they do not guarantee that other resources are also cleaned up, so you may end up with a duplicated IP or similar.

> If fabric, then my worry is always a panic'ed admin clearing it
> without properly verifying the state of the lost node. With watchdog,
> it's fine, just slow.

As it is a last resort, better slow than never.
Re: [ClusterLabs] Fencing with a 3-node (1 for quorum only) cluster
A lot of good suggestions here. Unfortunately, my budget is tapped out for the near future at least (this is a home lab/SOHO setup). I'm inclined to go with Digimer's two-node approach, with IPMI fencing. I understand mobos can die and such; for such a long shot, manual intervention is fine. So, when I get a chance, I need to remove the quorum node from the cluster and switch it to two_node mode. Thanks for the info!
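The switch to two_node mode might look roughly like this on CentOS 7 with pcs. The node name is a placeholder, and depending on the pcs version the two_node flag may need to be set in corosync.conf by hand rather than being applied automatically:

```shell
# Hypothetical sketch: drop the quorum-only VM and fall back to
# two_node mode. 'quorum-vm' is a placeholder node name.
pcs cluster node remove quorum-vm

# Verify /etc/corosync/corosync.conf now contains:
#   quorum {
#       provider: corosync_votequorum
#       two_node: 1
#   }
corosync-quorumtool -s    # expected votes should now be 2
```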
Re: [ClusterLabs] Fencing with a 3-node (1 for quorum only) cluster
Okay, I almost have this all working. fence_ipmilan for the Supermicro host; had to specify lanplus for it to work. fence_drac5 for the R905; that was failing to complete due to timeout. Found a couple of helpful posts that recommended increasing the retry count to 3 and the timeout to 60. That worked also. The only problem now is that it takes well over a minute to complete the fencing operation. In that interim, the fenced host shows as UNCLEAN (offline), and because the fencing operation hasn't completed, the other node has to wait to import the pool and share out the filesystem. This causes the vSphere hosts to declare the NFS datastore down. I haven't gotten exact timing, but I think the fencing operation took a little over a minute. I'm wondering if I could change the timeout to a smaller value but increase the retries? Like back to the default 20-second timeout, but change retries from 1 to 5?
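The shorter-timeout-with-more-retries idea can be tried directly against the agent before touching the cluster config. A hedged sketch: the address, credentials, and resource name are placeholders, and the option names assumed here are the standard fence-agents ones (power_timeout/retry_on):

```shell
# Time a full reboot cycle with a short power timeout and more retries.
time fence_drac5 --ip=drac.example.lan --username=root --password=secret \
    --action=reboot --power-timeout=20 --retry-on=5

# If that behaves, mirror it on the stonith resource (hypothetical name):
pcs stonith update fence-r905 power_timeout=20 retry_on=5
```

Timing the agent by hand shows how much of the delay is the hardware's power cycle versus the agent waiting out its timeout.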
Re: [ClusterLabs] Fencing with a 3-node (1 for quorum only) cluster
On 06/08/16 07:33 PM, Dan Swartzendruber wrote:
> [...] I'm wondering if I could change the timeout to a smaller value,
> but increase the retries? Like back to the default 20 second timeout,
> but change retries from 1 to 5?

Did you try fence_ipmilan against the DRAC? It *should* work. It would be interesting to see if it had the same issue. Can you check the DRAC host's power state using ipmitool directly, without delay?
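Checking the BMC directly, as suggested, might look like this (address and credentials are placeholders); timing the call shows whether the DRAC itself is slow to respond, independent of any fence agent:

```shell
# Query and time the power state straight from the DRAC's BMC.
time ipmitool -I lanplus -H drac.example.lan -U root -P secret power status

# For comparison, a raw chassis power cycle (use with care on a live node):
# ipmitool -I lanplus -H drac.example.lan -U root -P secret power cycle
```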
Re: [ClusterLabs] Fencing with a 3-node (1 for quorum only) cluster
On 2016-08-06 19:46, Digimer wrote:
> [...] Did you try the fence_ipmilan against the DRAC? It *should*
> work. Would be interesting to see if it had the same issue. Can you
> check the DRAC's host's power state using ipmitool directly without
> delay?

Yes, I did try fence_ipmilan, but it got the timeout waiting for power off (or whatever). I have to admit I switched to fence_drac and had the same issue, but after increasing the timeout and retries I got it to work, so it is possible that fence_ipmilan is okay. They both seemed to take more than 60 seconds to complete the operation. I have to say that when I do a power cycle through the DRAC web interface, it takes a while, so that might be normal. I think I will try again with 20 seconds and 5 retries and see how that goes...
Re: [ClusterLabs] Fencing with a 3-node (1 for quorum only) cluster
On 06/08/16 08:22 PM, Dan Swartzendruber wrote:
> [...] I think I will try again with 20 seconds and 5 retries and see
> how that goes...

What about using ipmitool directly? I can't imagine that such a long time is normal. Maybe there is a firmware update for the DRAC and/or BIOS? (I know with Fujitsu, they recommend updating the IPMI BMC and BIOS together.)

Over a minute to fence is, strictly speaking, OK. However, that's a significant delay in time to recover.
Re: [ClusterLabs] Fencing with a 3-node (1 for quorum only) cluster
On 2016-08-06 21:59, Digimer wrote:
> (snip)
> What about using ipmitool directly? I can't imagine that such a long
> time is normal. Maybe there is a firmware update for the DRAC and/or
> BIOS? (I know with Fujitsu, they recommend updating the IPMI BMC and
> BIOS together).

Unfortunately, the R905 is EOL, so any updates are not likely.

> Over a minute to fence is, strictly speaking, OK. However, that's a
> significant delay in time to recover.

The thing that concerns me, though, is the delay in I/O for the vSphere clients. I know 2 or more retries of 60 seconds caused issues. I'm going to try again with 5 20-second retries and see how that works. If this doesn't cooperate, I may need to look into a PDU or something...
Re: [ClusterLabs] Fencing with a 3-node (1 for quorum only) cluster
On 2016-08-06 21:59, Digimer wrote:
> (snip)
> Over a minute to fence is, strictly speaking, OK. However, that's a
> significant delay in time to recover.

Okay, I tested with a 20-second timeout and 5 retries, using fence_drac5 at the command line. I ran 'date' on both sides to see how long it took: just under a minute. It's too late now to mess around any more for tonight. I do need to verify that that works okay for vSphere. I will post back my results. Thanks!
Re: [ClusterLabs] Fencing with a 3-node (1 for quorum only) cluster
> On 04/08/16 07:21 PM, Dan Swartzendruber wrote:
>> On 2016-08-04 19:03, Digimer wrote:
>>> As for DRAC vs IPMI, no, they are not two things. In fact, I am pretty
>>> certain that fence_drac is a symlink to fence_ipmilan. All DRAC is (same
>>> with iRMC, iLO, RSA, etc) is "IPMI + features". Fundamentally, the fence
>>> action, rebooting the node, works via the basic IPMI standard using the
>>> DRAC's BMC.
>>>
>>> [...]
>>
>> At least on CentOS7, fence_ipmilan and fence_drac are not the same.
>> e.g. they are both python scripts that are totally different.
>
> [...]
>
> As for the two agents not being symlinked, OK. It still doesn't change
> the core point though that both fence_ipmilan and fence_drac would be
> acting on the same target.

Just thought I'd add some clarifications:

- in fact, fence-agents upstream seems to have thrown the idea of
  proper symlinks away before functionality to that effect was added,
  eventually using file copies instead of symlinks, with the rationale
  "this approach is not recommended so they regular files"

  [Marx&Oyvind, I cannot really imagine what issues this was meant to
  solve, nor why it would be not recommended (in Pacemaker, stat calls
  are used that work with symlink targets, not the immediate link
  files, ditto other standard file-handling functions), and it seems
  pretty non-systemic compared to, e.g., fence_xvm -> fence_virt:
  https://github.com/ClusterLabs/fence-virt/blob/f1f1a2437c5b0811269b5859a5ef646f44105a88/client/Makefile.in#L39
  It also needlessly inflates the resulting packages with redundant
  scripts and man pages. I'd make a PR for that, but it seems premature
  until the recursive make/install issue with "symlinked" agents has
  a definitive conclusion (PR 81+82). Basically, you just want
  'ln -s SRC DST' instead of 'cp SRC DST']

- fence_ipmilan and fence_drac are indeed not even virtually
  symlinked; for a quick and dirty way to verify this, see
  https://bugzilla.redhat.com/show_bug.cgi?id=1210679#c12
  (you may need ' | tr -s " "' just after the 'ls -l' command),
  from which you can see that it is fence_idrac which is a virtual
  symlink of (same implementation as) fence_ipmilan, while fence_drac
  is an agent of its own.

Hope this helps.

-- 
Jan (Poki)
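[Editorial note: the symlink-versus-copy packaging distinction discussed above can be demonstrated from the shell. The following is a self-contained sketch using throwaway stand-in files with hypothetical names, not the real agents from the fence-agents package:]

```shell
# Stand-in files demonstrating symlinked vs. copied "agents".
tmp=$(mktemp -d)
printf '# shared implementation\n' > "$tmp/fence_ipmilan"

# Packaging via copy ('cp SRC DST'): an independent regular file.
cp "$tmp/fence_ipmilan" "$tmp/fence_idrac"

# Packaging via symlink ('ln -s SRC DST'): a single implementation.
ln -s fence_ipmilan "$tmp/fence_drac_link"

# readlink succeeds only on the symlink; the copy is a regular file.
readlink "$tmp/fence_drac_link"                      # prints: fence_ipmilan
readlink "$tmp/fence_idrac" || echo "regular file"   # prints: regular file

rm -rf "$tmp"
```

Either packaging produces the same behavior at run time; the symlink merely records on disk that the two names share one implementation, which is exactly the information lost when 'cp' is used instead.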
Re: [ClusterLabs] Fencing with a 3-node (1 for quorum only) cluster
On 15/08/16 14:48 +0200, Jan Pokorný wrote:
>> On 04/08/16 07:21 PM, Dan Swartzendruber wrote:
>>> On 2016-08-04 19:03, Digimer wrote:
>>> [...]
>
> Just thought I'd add some clarifications:
>
> - in fact, fence-agents upstream seems to have thrown the idea of
>   proper symlinks away before functionality to that effect was added,
>   eventually using file copies instead of symlinks, with the rationale
>   "this approach is not recommended so they regular files"

Reference needed (accidentally omitted):
https://github.com/ClusterLabs/fence-agents/commit/87266bc

>   [Marx&Oyvind, I cannot really imagine what issues this was meant to
>   solve, nor why it would be not recommended (in Pacemaker, stat calls
>   are used that work with symlink targets, not the immediate link
>   files, ditto other standard file-handling functions), and it seems
>   pretty non-systemic compared to, e.g., fence_xvm -> fence_virt:
>   https://github.com/ClusterLabs/fence-virt/blob/f1f1a2437c5b0811269b5859a5ef646f44105a88/client/Makefile.in#L39
>   It also needlessly inflates the resulting packages with redundant
>   scripts and man pages. I'd make a PR for that, but it seems
>   premature until the recursive make/install issue with "symlinked"
>   agents has a definitive conclusion (PR 81+82). Basically, you just
>   want 'ln -s SRC DST' instead of 'cp SRC DST']
>
> - fence_ipmilan and fence_drac are indeed not even virtually
>   symlinked; for a quick and dirty way to verify this, see
>   https://bugzilla.redhat.com/show_bug.cgi?id=1210679#c12
>   (you may need ' | tr -s " "' just after the 'ls -l' command),
>   from which you can see that it is fence_idrac which is a virtual
>   symlink of (same implementation as) fence_ipmilan, while fence_drac
>   is an agent of its own.
>
> Hope this helps.

-- 
Jan (Poki)
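[Editorial note: the "virtual symlink" relationship described above (two separate regular files carrying the same implementation) can also be spotted by comparing file contents directly, without the 'ls -l | tr' pipeline from the bugzilla comment. A minimal sketch over throwaway stand-ins with hypothetical names:]

```shell
# Stand-ins: two identical regular files plus one distinct one.
tmp=$(mktemp -d)
printf '# shared IPMI-based implementation\n' > "$tmp/fence_ipmilan"
cp "$tmp/fence_ipmilan" "$tmp/fence_idrac"           # "virtual symlink"
printf '# independent implementation\n' > "$tmp/fence_drac"  # own agent

# cmp -s exits 0 only when the contents are byte-identical.
cmp -s "$tmp/fence_ipmilan" "$tmp/fence_idrac" \
    && echo "fence_idrac: same implementation as fence_ipmilan"
cmp -s "$tmp/fence_ipmilan" "$tmp/fence_drac" \
    || echo "fence_drac: an agent of its own"

rm -rf "$tmp"
```

Checking byte equality works even though 'ls -l' shows all three as ordinary files of plausible sizes, which is why the size-based pipeline in the bugzilla comment needs the extra whitespace squeezing to line the columns up.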