Re: [ClusterLabs] pcs stonith fence - Error: unable to fence

2020-01-20 Thread Ken Gaillot
On Sat, 2020-01-18 at 22:20 +, Strahil Nikolov wrote:
> Sorry for the spam.
> I figured out that I forgot to specify the domain for the 'drbd1' and
> thus it has reacted like that.
> The strange thing is that pcs allows me to fence a node that is not
> in the cluster :)
> 
> Do you think that this behaviour is a bug?
> If yes, I can open an issue upstream.
> 
> 
> Best Regards,
> Strahil Nikolov

Leaving pcs out of the picture for a moment, from Pacemaker's view the
stonith_admin command just passes along what the user requested, and
the fencing daemon determines whether the request is valid and fails
it appropriately. So technically it's not a bug.

However, I see two possible areas of improvement:

- The status display should show not just that the request failed, but
why. There is a project already planned to show why fencing was
initiated, so this would be a good addition to that. It's just a matter
of having developer time to do it.

- Since pcs is at a higher level than stonith_admin, it could
require "--force" if a given node isn't in the cluster configuration.
Feel free to file an upstream request for that.
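
That second check could work roughly like this. A hypothetical sketch in
Python — the function name, node list, and "--force" semantics are all
illustrative assumptions, not actual pcs code:

```python
# Hypothetical sketch of the extra check pcs could perform before
# handing a fence request to stonith_admin: reject targets that are
# not configured cluster nodes unless the caller forces it.
# (Names and flag semantics are illustrative, not real pcs code.)
def check_fence_target(target, cluster_nodes, force=False):
    if target in cluster_nodes or force:
        return True
    raise ValueError(
        "'%s' is not a cluster node; use --force to fence it anyway" % target)

nodes = {"drbd1.localdomain", "drbd2.localdomain", "drbd3.localdomain"}
print(check_fence_target("drbd1.localdomain", nodes))  # prints: True
try:
    check_fence_target("drbd1", nodes)  # bare name: rejected up front
except ValueError as err:
    print(err)
```

With a check like this, the bare name 'drbd1' would be refused before any
fence action is attempted, instead of failing later in the fencing daemon.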


> On Sunday, 19 January 2020 at 00:01:11 GMT+2, Strahil Nikolov <
> hunter86...@yahoo.com> wrote:
> 
> Hi All,
> 
> 
> I am building a test cluster with the fence_rhevm stonith agent on
> RHEL 7.7 and oVirt 4.3.
> When I fenced drbd3 from drbd1 using 'pcs stonith fence drbd3', the
> fence action was successful.
> 
> So then I decided to test fencing in the opposite direction, and it
> partially failed.
> 
> 
> 1. In oVirt the machine was powered off and then powered on properly,
> so the communication with the engine is OK.
> 2. The command on drbd3 to fence drbd1 got stuck and was then reported
> as a failure, even though the VM was reset.
> 
> 
> 
> Now 'pcs status' is reporting the following:
> Failed Fencing Actions:
> * reboot of drbd1 failed: delegate=drbd3.localdomain,
> client=stonith_admin.1706, origin=drbd3.localdomain,
>last-failed='Sat Jan 18 23:18:24 2020'
> 
> 
> 
> 
> My stonith is configured as follows:
> Stonith Devices: 
> Resource: ovirt_FENCE (class=stonith type=fence_rhevm) 
>  Attributes: ipaddr=engine.localdomain login=fencerdrbd@internal
> passwd=I_have_replaced_that 
> pcmk_host_map=drbd1.localdomain:drbd1;drbd2.localdomain:drbd2;drbd3.localdomain:drbd3
> power_wait=3 ssl=1 ssl_secure=1 
>  Operations: monitor interval=60s (ovirt_FENCE-monitor-interval-60s) 
> Fencing Levels:
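
As a side note, the pcmk_host_map above is what makes the exact node name
matter: only the names on the left side of each node:plug pair are known
fence targets. A small sketch of that lookup, assuming the usual
semicolon-separated node:plug convention (simplified, not the actual
Pacemaker parser):

```python
# Parse a pcmk_host_map string (semicolon-separated node:plug pairs)
# into a dict. The lookup at the end shows why fencing the bare name
# 'drbd1' fails while 'drbd1.localdomain' works.
def parse_host_map(host_map):
    mapping = {}
    for entry in host_map.split(";"):
        if entry:
            node, _, plug = entry.partition(":")
            mapping[node] = plug or node
    return mapping

host_map = ("drbd1.localdomain:drbd1;"
            "drbd2.localdomain:drbd2;"
            "drbd3.localdomain:drbd3")
mapping = parse_host_map(host_map)
print(mapping["drbd1.localdomain"])  # prints: drbd1
print("drbd1" in mapping)            # prints: False
```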
> 
> 
> 
> Do I need to add some other settings to the fence_rhevm stonith
> agent?
> 
> 
> Manually running the status command from drbd2/drbd3 is OK:
> 
> 
> [root@drbd3 ~]# fence_rhevm -o status --ssl --ssl-secure -a
> engine.localdomain --username='fencerdrbd@internal'  
> --password=I_have_replaced_that -n drbd1 
> Status: ON
> 
> I'm attaching the logs from drbd2 (the DC) and drbd3.
> 
> 
> Thanks in advance for your suggestions.
> 
> 
> Best Regards,
> Strahil Nikolov
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
