Re: [ClusterLabs] why is node fenced ?

2020-08-14 Thread Lentes, Bernd
- On Aug 9, 2020, at 10:17 PM, Bernd Lentes bernd.len...@helmholtz-muenchen.de wrote:


>> So this appears to be the problem. From these logs I would guess the
>> successful stop on ha-idg-1 did not get written to the CIB for some
>> reason. I'd look at the pe input from this transition on ha-idg-2 to
>> confirm that.
>> 
>> Without the DC knowing about the stop, it tries to schedule a new one,
>> but the node is shutting down so it can't do it, which means it has to
>> be fenced.

I checked all the relevant pe files from this time period.
This is what I found (only the important entries are shown):

ha-idg-1:~/why-fenced/ha-idg-1/pengine # crm_simulate -S -x pe-input-3116 -G transition-3116.xml -D transition-3116.dot
Current cluster status:
 ...
 vm_nextcloud   (ocf::heartbeat:VirtualDomain): Started ha-idg-1
Transition Summary:
 ...
 * Migrate    vm_nextcloud   ( ha-idg-1 -> ha-idg-2 )
Executing cluster transition:
 * Resource action: vm_nextcloud migrate_from on ha-idg-2 <=== migrate vm_nextcloud
 * Resource action: vm_nextcloud stop on ha-idg-1
 * Pseudo action:   vm_nextcloud_start_0
Revised cluster status:
Node ha-idg-1 (1084777482): standby
Online: [ ha-idg-2 ]
vm_nextcloud   (ocf::heartbeat:VirtualDomain): Started ha-idg-2


ha-idg-1:~/why-fenced/ha-idg-1/pengine # crm_simulate -S -x pe-error-48 -G transition-4514.xml -D transition-4514.dot
Current cluster status:
Node ha-idg-1 (1084777482): standby
Online: [ ha-idg-2 ]
...
 vm_nextcloud   (ocf::heartbeat:VirtualDomain): FAILED[ ha-idg-2 ha-idg-1 ] <== migration failed
Transition Summary:
 ...
 * Recover    vm_nextcloud   ( ha-idg-2 )
Executing cluster transition:
 * Resource action: vm_nextcloud stop on ha-idg-2
 * Resource action: vm_nextcloud stop on ha-idg-1
 * Resource action: vm_nextcloud start on ha-idg-2
 * Resource action: vm_nextcloud monitor=3 on ha-idg-2
Revised cluster status:
 vm_nextcloud   (ocf::heartbeat:VirtualDomain): Started ha-idg-2

ha-idg-1:~/why-fenced/ha-idg-1/pengine # crm_simulate -S -x pe-input-3117 -G transition-3117.xml -D transition-3117.dot
Current cluster status:
Node ha-idg-1 (1084777482): standby
Online: [ ha-idg-2 ]
 vm_nextcloud   (ocf::heartbeat:VirtualDomain): FAILED ha-idg-2 <== start on ha-idg-2 failed
Transition Summary:
 * Stop   vm_nextcloud ( ha-idg-2 )   due to node availability <== stop vm_nextcloud (what does "due to node availability" mean?)
Executing cluster transition:
 * Resource action: vm_nextcloud stop on ha-idg-2
Revised cluster status:
 vm_nextcloud   (ocf::heartbeat:VirtualDomain): Stopped

ha-idg-1:~/why-fenced/ha-idg-1/pengine # crm_simulate -S -x pe-input-3118 -G transition-4516.xml -D transition-4516.dot
Current cluster status:
Node ha-idg-1 (1084777482): standby
Online: [ ha-idg-2 ]
 vm_nextcloud   (ocf::heartbeat:VirtualDomain): Stopped <== vm_nextcloud is stopped
Transition Summary:
 * Shutdown ha-idg-1
Executing cluster transition:
 * Resource action: vm_nextcloud stop on ha-idg-1 <== why stop? It is already stopped
Revised cluster status:
 vm_nextcloud   (ocf::heartbeat:VirtualDomain): Stopped

ha-idg-1:~/why-fenced/ha-idg-2/pengine # crm_simulate -S -x pe-input-3545 -G transition-0.xml -D transition-0.dot
Current cluster status:
Node ha-idg-1 (1084777482): pending
Online: [ ha-idg-2 ]
 vm_nextcloud   (ocf::heartbeat:VirtualDomain): Stopped <== vm_nextcloud is stopped
Transition Summary:

Executing cluster transition:
Using the original execution date of: 2020-07-20 15:05:33Z
Revised cluster status:
vm_nextcloud   (ocf::heartbeat:VirtualDomain): Stopped

ha-idg-1:~/why-fenced/ha-idg-2/pengine # crm_simulate -S -x pe-warn-749 -G transition-1.xml -D transition-1.dot
Current cluster status:
Node ha-idg-1 (1084777482): OFFLINE (standby)
Online: [ ha-idg-2 ]
 vm_nextcloud   (ocf::heartbeat:VirtualDomain): Stopped <=== vm_nextcloud is stopped
Transition Summary:
 * Fence (Off) ha-idg-1 'resource actions are unrunnable'
Executing cluster transition:
 * Fencing ha-idg-1 (Off)
 * Pseudo action:   vm_nextcloud_stop_0 <=== why stop? It is already stopped
Revised cluster status:
Node ha-idg-1 (1084777482): OFFLINE (standby)
Online: [ ha-idg-2 ]
 vm_nextcloud   (ocf::heartbeat:VirtualDomain): Stopped

I don't understand why the cluster tries to stop a resource which is already 
stopped.
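
Checking many pe files by eye for this pattern is tedious, so here is a minimal Python sketch that flags it automatically. It assumes the crm_simulate text output looks like the excerpts above, with whitespace intact. The helper name redundant_stops and both regular expressions are my own assumptions and not part of Pacemaker. The sample text is the pe-input-3118 replay from above.

```python
import re

# Assumed shape of a status line, e.g.:
#  vm_nextcloud   (ocf::heartbeat:VirtualDomain): Stopped
STATUS_RE = re.compile(r"^\s*(\S+)\s+\(ocf::[^)]*\):\s+(\S+)")
# Assumed shape of an action line, e.g.:
#  * Resource action: vm_nextcloud stop on ha-idg-1
ACTION_RE = re.compile(r"^\s*\*\s+Resource action:\s+(\S+)\s+(\S+)\s+on\s+(\S+)")


def redundant_stops(output):
    """Return (resource, node) pairs for stop actions on resources that the
    'Current cluster status' section already lists as Stopped."""
    section = None
    stopped = set()
    findings = []
    for line in output.splitlines():
        stripped = line.strip()
        if stripped.startswith("Current cluster status"):
            section = "current"
        elif stripped.startswith("Transition Summary"):
            section = "summary"
        elif stripped.startswith("Executing cluster transition"):
            section = "actions"
        elif stripped.startswith("Revised cluster status"):
            section = "revised"
        elif section == "current":
            m = STATUS_RE.match(line)
            if m and m.group(2) == "Stopped":
                stopped.add(m.group(1))
        elif section == "actions":
            m = ACTION_RE.match(line)
            if m and m.group(2) == "stop" and m.group(1) in stopped:
                findings.append((m.group(1), m.group(3)))
    return findings


# Excerpt of the pe-input-3118 replay quoted in this mail.
sample = """\
Current cluster status:
Node ha-idg-1 (1084777482): standby
Online: [ ha-idg-2 ]
 vm_nextcloud   (ocf::heartbeat:VirtualDomain): Stopped
Transition Summary:
 * Shutdown ha-idg-1
Executing cluster transition:
 * Resource action: vm_nextcloud stop on ha-idg-1
Revised cluster status:
 vm_nextcloud   (ocf::heartbeat:VirtualDomain): Stopped
"""

print(redundant_stops(sample))
```

The script only looks at real resource actions. Pseudo actions such as vm_nextcloud_stop_0 in the pe-warn-749 replay are deliberately not matched.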

Bernd
Helmholtz Zentrum München

Helmholtz Zentrum Muenchen
Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
Ingolstaedter Landstr. 1
85764 Neuherberg
www.helmholtz-muenchen.de
Aufsichtsratsvorsitzende: MinDir.in Prof. Dr. Veronika von Messling
Geschaeftsfuehrung: Prof. Dr. med. Dr. h.c. Matthias Tschoep, Kerstin Guenther
Registergericht: Amtsgericht Muenchen HRB 6466
USt-IdNr: DE 129521671


___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Antw: [EXT] Stonith failing

2020-08-14 Thread Gabriele Bulfon
Thanks to all your suggestions, I now have stonith configured over IPMI on both systems.
 
Two questions:
- How can I simulate a stonith situation to check that everything is OK?
- Given that each node is configured to stonith the other, how can I be sure 
that, once the two nodes can communicate, they will not try to stonith each 
other?
 
:)
Thanks!
Gabriele
 
 
Sonicle S.r.l.: http://www.sonicle.com
Music: http://www.gabrielebulfon.com
Quantum Mechanics: http://www.cdbaby.com/cd/gabrielebulfon
From: Gabriele Bulfon
To: Cluster Labs - All topics related to open-source clustering welcomed
Date: 29 July 2020 14:22:42 CEST
Subject: Re: [ClusterLabs] Antw: [EXT] Stonith failing
 
It is a ZFS based illumos system.
I don't think SBD is an option.
Is there a reliable ZFS based stonith?
 
Gabriele
 
 
From: Andrei Borzenkov
To: Cluster Labs - All topics related to open-source clustering welcomed
Date: 29 July 2020 9:46:09 CEST
Subject: Re: [ClusterLabs] Antw: [EXT] Stonith failing
 
On Wed, Jul 29, 2020 at 9:01 AM Gabriele Bulfon gbul...@sonicle.com wrote:
> That one was taken from a specific implementation on Solaris 11.
> The situation is a dual-node server with a shared storage controller: both
> nodes see the same disks concurrently.
> Here we must be sure that the two nodes are not going to import/mount the
> same zpool at the same time, or we will encounter data corruption:
>
> ssh-based "stonith" cannot guarantee it.
>
> Node 1 will be preferred for pool 1, node 2 for pool 2; only if one of the
> nodes goes down or is taken offline should the resources first be freed by
> the leaving node and then taken by the other node.
>
> Would you suggest one of the available stonith methods in this case?
 
 
IPMI, managed PDU, SBD ...
In practice, the only stonith method that works in case of complete node outage 
including any power supply is SBD.


Re: [ClusterLabs] why is node fenced ?

2020-08-14 Thread Lentes, Bernd


- On Aug 10, 2020, at 11:59 PM, kgaillot kgail...@redhat.com wrote:
> The most recent transition is aborted, but since all its actions are
> complete, the only effect is to trigger a new transition.
> 
> We should probably rephrase the log message. In fact, the whole
> "transition" terminology is kind of obscure. It's hard to come up with
> something better though.
> 
Hi Ken,

I don't get it. How can something that is already completed be aborted?

Bernd
Helmholtz Zentrum München
