Re: [ClusterLabs] epic fail

2017-07-24 Thread Dimitri Maziuk
On 07/24/2017 11:34 AM, Ken Gaillot wrote:
> On Mon, 2017-07-24 at 18:09 +0200, Valentin Vidic wrote:
>> On Mon, Jul 24, 2017 at 11:01:26AM -0500, Dimitri Maziuk wrote:
>>> Lsof/fuser show the PID of the process holding FS open as "kernel".
>>
>> That could be the NFS server running in the kernel.

Re: [ClusterLabs] epic fail

2017-07-24 Thread Ken Gaillot
On Mon, 2017-07-24 at 18:09 +0200, Valentin Vidic wrote:
> On Mon, Jul 24, 2017 at 11:01:26AM -0500, Dimitri Maziuk wrote:
> > Lsof/fuser show the PID of the process holding FS open as "kernel".
>
> That could be the NFS server running in the kernel.

Dimitri,

Is the NFS server also managed by

Re: [ClusterLabs] epic fail

2017-07-24 Thread Valentin Vidic
On Mon, Jul 24, 2017 at 10:38:40AM -0500, Ken Gaillot wrote:
> Standby is not necessary, it's just a cautious step that allows the
> admin to verify that all resources moved off correctly. The restart that
> yum does should be sufficient for pacemaker to move everything.
>
> A restart shouldn't

Re: [ClusterLabs] epic fail

2017-07-24 Thread Dimitri Maziuk
> Jul 22 14:03:46 zebrafish nfsserver(server_nfs)[6614]: INFO: Stopping NFS server ...
> Jul 22 14:03:46 zebrafish systemd: Stopping NFS server and services...
> Jul 22 14:03:46 zebrafish systemd: Stopped NFS server and services.
> Jul 22 14:03:46 zebrafish systemd: Stopping NFS Mount Daemon...

Re: [ClusterLabs] epic fail

2017-07-24 Thread Kristián Feldsam
Is the NFS server/share also managed by pacemaker, and is the order set right?

Best regards,
Kristián Feldsam
Tel.: +420 773 303 353, +421 944 137 535
E-mail: supp...@feldhost.cz
www.feldhost.cz - FeldHost™: professional hosting and server services at reasonable prices.
FELDSAM s.r.o. V rohu 434/3 Praha

Re: [ClusterLabs] epic fail

2017-07-24 Thread Valentin Vidic
On Mon, Jul 24, 2017 at 11:01:26AM -0500, Dimitri Maziuk wrote:
> Lsof/fuser show the PID of the process holding FS open as "kernel".

That could be the NFS server running in the kernel.

--
Valentin
___
Users mailing list: Users@clusterlabs.org

Re: [ClusterLabs] epic fail

2017-07-24 Thread Dimitri Maziuk
On 07/24/2017 10:38 AM, Ken Gaillot wrote:
> A restart shouldn't lead to fencing in any case where something's not
> going seriously wrong. I'm not familiar with the "kernel is using it"
> message, I haven't run into that before.

I posted it at least once before.

>
> Jul 22 14:03:48 zebrafish

Re: [ClusterLabs] epic fail

2017-07-24 Thread Ken Gaillot
On Mon, 2017-07-24 at 17:13 +0200, Kristián Feldsam wrote:
> Hmm, so when you know, that it happens also when putting node standby,
> then why you run yum update on live cluster, it must be clear that
> node will be fenced.

Standby is not necessary, it's just a cautious step that allows the admin

Re: [ClusterLabs] epic fail

2017-07-24 Thread Kristián Feldsam
Hmm, so when you know that it also happens when putting the node in standby, then why do you run yum update on a live cluster? It must be clear that the node will be fenced. Would you post your pacemaker config and some logs?

Best regards,
Kristián Feldsam

Re: [ClusterLabs] epic fail

2017-07-24 Thread Dimitri Maziuk
On 07/24/2017 09:40 AM, Jan Pokorný wrote:
> Would there be an interest, though? And would that be meaningful?

IMO the only reason to put a node in standby is if you want to reboot the active node with no service interruption. For anything else, including a reboot with service interruption

Re: [ClusterLabs] epic fail

2017-07-24 Thread Jan Pokorný
On 23/07/17 14:40 +0200, Valentin Vidic wrote:
> On Sun, Jul 23, 2017 at 07:27:03AM -0500, Dmitri Maziuk wrote:
>> So yesterday I ran yum update that pulled in the new pacemaker and tried to
>> restart it. The node went into its usual "can't unmount drbd because kernel
>> is using it" and got

Re: [ClusterLabs] epic fail

2017-07-23 Thread Digimer
On 2017-07-23 08:27 AM, Dmitri Maziuk wrote:
> So yesterday I ran yum update that pulled in the new pacemaker and tried
> to restart it. The node went into its usual "can't unmount drbd because
> kernel is using it" and got stonith'ed in the middle of yum transaction.
> The end result: DRBD reports

Re: [ClusterLabs] epic fail

2017-07-23 Thread Kristián Feldsam
You cannot update a running cluster! First you need to put the node in standby, check that all resources have stopped, and then do what you need. Unfortunately, this was your fail :(

Best regards,
Kristián Feldsam

Re: [ClusterLabs] epic fail

2017-07-23 Thread Valentin Vidic
On Sun, Jul 23, 2017 at 07:27:03AM -0500, Dmitri Maziuk wrote:
> So yesterday I ran yum update that pulled in the new pacemaker and tried to
> restart it. The node went into its usual "can't unmount drbd because kernel
> is using it" and got stonith'ed in the middle of yum transaction. The end

[ClusterLabs] epic fail

2017-07-23 Thread Dmitri Maziuk
So yesterday I ran yum update that pulled in the new pacemaker and tried to restart it. The node went into its usual "can't unmount drbd because kernel is using it" and got stonith'ed in the middle of yum transaction. The end result: DRBD reports split brain, HA daemons don't start on boot, RPM