Re: [ClusterLabs] volume group won't start in a nested DRBD setup

2019-10-29 Thread Jean-Francois Malouin


* Jean-Francois Malouin  [20191029 09:49]:
> * Roger Zhou  [20191029 06:18]:
> > 
> > On 10/29/19 12:30 PM, Andrei Borzenkov wrote:
> > >> Oct 28 14:42:56 node2 LVM(p_lvm_vg0)[8775]: INFO: Activating volume 
> > >> group vg0
> > >> Oct 28 14:42:56 node2 LVM(p_lvm_vg0)[8775]: INFO:  Reading all physical 
> > >> volumes. This may take a while... Found volume group "vmspace" using 
> > >> metadata type lvm2 Found volume group "freespace" using metadata type
> > >>   lvm2 Found volume group "vg0" using metadata type lvm2
> > >> Oct 28 14:42:56 node2 LVM(p_lvm_vg0)[8775]: INFO:  0 logical volume(s) 
> > >> in volume group "vg0" now active
> > > Resource agent really does just "vgchange vg0". Does it work when you
> > > run it manually?
> > > 
> > 
> > Agree with Andrei.
> 
> Yes, it does. 
> > 
> > > 
> > >> Oct 28 14:42:56 node2 LVM(p_lvm_vg0)[8775]: ERROR: LVM Volume vg0 is not 
> > >> available (stopped)
> > >> Oct 28 14:42:56 node2 LVM(p_lvm_vg0)[8775]: ERROR: LVM: vg0 did not 
> > >> activate correctly
> > >> Oct 28 14:42:56 node2 pacemaker-execd[27054]:  notice: 
> > >> p_lvm_vg0_start_0:8775:stderr [   Configuration node global/use_lvmetad 
> > >> not found ]
> > 
> > This error indicates the root cause is related to lvmetad. Please check 
> > lvmetad, e.g.:
> > 
> > systemctl status lvm2-lvmetad
> > grep use_lvmetad /etc/lvm/lvm.conf
> > 
> > Check your lvm2 version and google its workaround/fix accordingly.
> 
> That was my hunch too. This is on Debian/buster and the lvm.conf is the
> original one from the initial install, except for the devices/filter
> configuration option, where I added:
> 
>  filter = [ "a|/dev/drbd.*|", "a|/dev/md.*|", "a|/dev/md/.*|", "r|.*|" ]
>  write_cache_state = 0
> 
> and the configuration option devices/global_filter:
> 
> global_filter = [ "a|/dev/drbd.*|", "a|/dev/md.*|", "a|/dev/md/.*|", "r|.*|" ]
> 
> There was no 'use_lvmetad' option present initially but I have added it to
> the global section:
> use_lvmetad = 0
> 
> And lvm2-lvmetad is not a unit listed under systemd...
> 
> systemctl status lvm2-lvmetad
> Unit lvm2-lvmetad.service could not be found.

I had a second go at this earlier this morning and it seems that I might NOT
have enabled 'filter' and 'global_filter' at the same time, or not together
with 'use_lvmetad = 0'. Right now the volume group vg0 is starting correctly
and I can move it around with no problem.
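
For the record, the relevant lvm.conf pieces, reconstructed from the settings
quoted above (a sketch of the intended configuration, not a verbatim copy of
the file), look roughly like this:

    devices {
        filter = [ "a|/dev/drbd.*|", "a|/dev/md.*|", "a|/dev/md/.*|", "r|.*|" ]
        global_filter = [ "a|/dev/drbd.*|", "a|/dev/md.*|", "a|/dev/md/.*|", "r|.*|" ]
        write_cache_state = 0
    }
    global {
        use_lvmetad = 0
    }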

I'm reviewing all this as I'm a bit baffled...

thanks for all your input,
jf

> 
> thanks,
> jf
> 
> > 
> > Cheers,
> > Roger
> > 
> > >> Oct 28 14:42:56 node2 pacemaker-execd[27054]:  notice: 
> > >> p_lvm_vg0_start_0:8775:stderr [ ocf-exit-reason:LVM: vg0 did not 
> > >> activate correctly ]
> > >> Oct 28 14:42:56 node2 pacemaker-controld[27057]:  notice: Result of 
> > >> start operation for p_lvm_vg0 on node2: 7 (not running)
> > >> Oct 28 14:42:56 node2 pacemaker-controld[27057]:  notice: 
> > >> node2-p_lvm_vg0_start_0:77 [   Configuration node global/use_lvmetad not 
> > >> found\nocf-exit-reason:LVM: vg0 did not activate correctly\n ]
> > >> Oct 28 14:42:56 node2 pacemaker-controld[27057]:  warning: Action 42 
> > >> (p_lvm_vg0_start_0) on node2 failed (target: 0 vs. rc: 7): Error
> > >> Oct 28 14:42:56 node2 pacemaker-controld[27057]:  notice: Transition 602 
> > >> aborted by operation p_lvm_vg0_start_0 'modify' on node2: Event failed
> > 


Re: [ClusterLabs] volume group won't start in a nested DRBD setup

2019-10-29 Thread Jean-Francois Malouin
* Roger Zhou  [20191029 06:18]:
> 
> On 10/29/19 12:30 PM, Andrei Borzenkov wrote:
> >> Oct 28 14:42:56 node2 LVM(p_lvm_vg0)[8775]: INFO: Activating volume group 
> >> vg0
> >> Oct 28 14:42:56 node2 LVM(p_lvm_vg0)[8775]: INFO:  Reading all physical 
> >> volumes. This may take a while... Found volume group "vmspace" using 
> >> metadata type lvm2 Found volume group "freespace" using metadata type
> >>   lvm2 Found volume group "vg0" using metadata type lvm2
> >> Oct 28 14:42:56 node2 LVM(p_lvm_vg0)[8775]: INFO:  0 logical volume(s) in 
> >> volume group "vg0" now active
> > Resource agent really does just "vgchange vg0". Does it work when you
> > run it manually?
> > 
> 
> Agree with Andrei.

Yes, it does. 
> 
> > 
> >> Oct 28 14:42:56 node2 LVM(p_lvm_vg0)[8775]: ERROR: LVM Volume vg0 is not 
> >> available (stopped)
> >> Oct 28 14:42:56 node2 LVM(p_lvm_vg0)[8775]: ERROR: LVM: vg0 did not 
> >> activate correctly
> >> Oct 28 14:42:56 node2 pacemaker-execd[27054]:  notice: 
> >> p_lvm_vg0_start_0:8775:stderr [   Configuration node global/use_lvmetad 
> >> not found ]
> 
> This error indicates the root cause is related to lvmetad. Please check 
> lvmetad, e.g.:
> 
> systemctl status lvm2-lvmetad
> grep use_lvmetad /etc/lvm/lvm.conf
> 
> Check your lvm2 version and google its workaround/fix accordingly.

That was my hunch too. This is on Debian/buster and the lvm.conf is the
original one from the initial install, except for the devices/filter
configuration option, where I added:

 filter = [ "a|/dev/drbd.*|", "a|/dev/md.*|", "a|/dev/md/.*|", "r|.*|" ]
 write_cache_state = 0

and the configuration option devices/global_filter:

global_filter = [ "a|/dev/drbd.*|", "a|/dev/md.*|", "a|/dev/md/.*|", "r|.*|" ]

There was no 'use_lvmetad' option present initially but I have added it to
the global section:
use_lvmetad = 0

And lvm2-lvmetad is not a unit listed under systemd...

systemctl status lvm2-lvmetad
Unit lvm2-lvmetad.service could not be found.
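
(For completeness, a quick way to double-check which of these settings the
LVM tools actually see, assuming lvmconfig is available in this lvm2 build:

    lvmconfig devices/filter devices/global_filter
    lvmconfig global/use_lvmetad    # complains if this build no longer knows the option

The second command failing would hint that lvmetad support is simply gone
from this lvm2 version.)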

thanks,
jf

> 
> Cheers,
> Roger
> 
> >> Oct 28 14:42:56 node2 pacemaker-execd[27054]:  notice: 
> >> p_lvm_vg0_start_0:8775:stderr [ ocf-exit-reason:LVM: vg0 did not activate 
> >> correctly ]
> >> Oct 28 14:42:56 node2 pacemaker-controld[27057]:  notice: Result of start 
> >> operation for p_lvm_vg0 on node2: 7 (not running)
> >> Oct 28 14:42:56 node2 pacemaker-controld[27057]:  notice: 
> >> node2-p_lvm_vg0_start_0:77 [   Configuration node global/use_lvmetad not 
> >> found\nocf-exit-reason:LVM: vg0 did not activate correctly\n ]
> >> Oct 28 14:42:56 node2 pacemaker-controld[27057]:  warning: Action 42 
> >> (p_lvm_vg0_start_0) on node2 failed (target: 0 vs. rc: 7): Error
> >> Oct 28 14:42:56 node2 pacemaker-controld[27057]:  notice: Transition 602 
> >> aborted by operation p_lvm_vg0_start_0 'modify' on node2: Event failed
> 
> 
> 


Re: [ClusterLabs] volume group won't start in a nested DRBD setup

2019-10-29 Thread Roger Zhou


On 10/29/19 12:30 PM, Andrei Borzenkov wrote:
>> Oct 28 14:42:56 node2 LVM(p_lvm_vg0)[8775]: INFO: Activating volume group vg0
>> Oct 28 14:42:56 node2 LVM(p_lvm_vg0)[8775]: INFO:  Reading all physical 
>> volumes. This may take a while... Found volume group "vmspace" using 
>> metadata type lvm2 Found volume group "freespace" using metadata type
>>   lvm2 Found volume group "vg0" using metadata type lvm2
>> Oct 28 14:42:56 node2 LVM(p_lvm_vg0)[8775]: INFO:  0 logical volume(s) in 
>> volume group "vg0" now active
> Resource agent really does just "vgchange vg0". Does it work when you
> run it manually?
> 

Agree with Andrei.

> 
>> Oct 28 14:42:56 node2 LVM(p_lvm_vg0)[8775]: ERROR: LVM Volume vg0 is not 
>> available (stopped)
>> Oct 28 14:42:56 node2 LVM(p_lvm_vg0)[8775]: ERROR: LVM: vg0 did not activate 
>> correctly
>> Oct 28 14:42:56 node2 pacemaker-execd[27054]:  notice: 
>> p_lvm_vg0_start_0:8775:stderr [   Configuration node global/use_lvmetad not 
>> found ]

This error indicates the root cause is related to lvmetad. Please check 
lvmetad, e.g.:

systemctl status lvm2-lvmetad
grep use_lvmetad /etc/lvm/lvm.conf

Check your lvm2 version and google its workaround/fix accordingly.
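
A minimal way to do that check (a sketch; the package query shown is for
Debian, adjust for your distro):

    lvm version        # reports the LVM2 tool and library versions in use
    dpkg -l lvm2       # Debian package version

Note that lvmetad was removed upstream in lvm2 2.03, so on newer releases
the lvm2-lvmetad unit may simply not exist at all.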

Cheers,
Roger

>> Oct 28 14:42:56 node2 pacemaker-execd[27054]:  notice: 
>> p_lvm_vg0_start_0:8775:stderr [ ocf-exit-reason:LVM: vg0 did not activate 
>> correctly ]
>> Oct 28 14:42:56 node2 pacemaker-controld[27057]:  notice: Result of start 
>> operation for p_lvm_vg0 on node2: 7 (not running)
>> Oct 28 14:42:56 node2 pacemaker-controld[27057]:  notice: 
>> node2-p_lvm_vg0_start_0:77 [   Configuration node global/use_lvmetad not 
>> found\nocf-exit-reason:LVM: vg0 did not activate correctly\n ]
>> Oct 28 14:42:56 node2 pacemaker-controld[27057]:  warning: Action 42 
>> (p_lvm_vg0_start_0) on node2 failed (target: 0 vs. rc: 7): Error
>> Oct 28 14:42:56 node2 pacemaker-controld[27057]:  notice: Transition 602 
>> aborted by operation p_lvm_vg0_start_0 'modify' on node2: Event failed





Re: [ClusterLabs] volume group won't start in a nested DRBD setup

2019-10-28 Thread Andrei Borzenkov
On 28.10.2019 22:44, Jean-Francois Malouin wrote:
> Hi,
> 
> Is there any new magic that I'm unaware of that needs to be added to a
> pacemaker cluster using a DRBD nested setup? pacemaker 2.0.x and DRBD 8.4.10
> on Debian/Buster on a 2-node cluster with stonith.
> Eventually this will host a bunch of Xen VMs.
> 
> I had this sort of thing running for years with pacemaker 1.x and DRBD 8.4.x
> without a hitch, and now with pacemaker 2.0 and drbd 8.4.10 it gives me errors
> on trying to start the volume group vg0 on this chain:
> 
>  (VG)      (LV)      (PV)   (VG)
> vmspace > xen_lv0 > drbd0 > vg0
> 
> Only drbd0 and after are managed by pacemaker.
> 
> Here's what I have configured so far (stonith is configured but is not shown 
> below):
> 
> ---
> primitive p_lvm_vg0 ocf:heartbeat:LVM \
> params volgrpname=vg0 \
> op monitor timeout=30s interval=10s \
> op_params interval=10s
> 
> primitive resDRBDr0 ocf:linbit:drbd \
> params drbd_resource=r0 \
> op start interval=0 timeout=240s \
> op stop interval=0 timeout=100s \
> op monitor interval=29s role=Master timeout=240s \
> op monitor interval=31s role=Slave timeout=240s \
> meta migration-threshold=3 failure-timeout=120s
> 
> ms ms_drbd_r0 resDRBDr0 \
> meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 
> notify=true
> 
> colocation c_lvm_vg0_on_drbd_r0 inf: p_lvm_vg0 ms_drbd_r0:Master
> 
> order o_drbd_r0_before_lvm_vg0 Mandatory: ms_drbd_r0:promote p_lvm_vg0:start
> ---
> 
> /etc/lvm/lvm.conf has global_filter set to:
> global_filter = [ "a|/dev/drbd.*|", "a|/dev/md.*|", "a|/dev/md/.*|", "r|.*|" ]
> 
> But I'm not sure if it's sufficient. I seem to be missing some crucial 
> ingredient.
> 
> syslog on the DC shows the following when trying to start vg0:
> 
> Oct 28 14:42:56 node2 LVM(p_lvm_vg0)[8775]: INFO: Activating volume group vg0
> Oct 28 14:42:56 node2 LVM(p_lvm_vg0)[8775]: INFO:  Reading all physical 
> volumes. This may take a while... Found volume group "vmspace" using metadata 
> type lvm2 Found volume group "freespace" using metadata type
>  lvm2 Found volume group "vg0" using metadata type lvm2 
> Oct 28 14:42:56 node2 LVM(p_lvm_vg0)[8775]: INFO:  0 logical volume(s) in 
> volume group "vg0" now active

Resource agent really does just "vgchange vg0". Does it work when you
run it manually?
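
For example, on the node where drbd0 is Primary (a sketch of the manual
test, not necessarily the agent's exact command line):

    vgchange -a y vg0
    lvs vg0            # active LVs show 'a' in the fifth attribute column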


> Oct 28 14:42:56 node2 LVM(p_lvm_vg0)[8775]: ERROR: LVM Volume vg0 is not 
> available (stopped)
> Oct 28 14:42:56 node2 LVM(p_lvm_vg0)[8775]: ERROR: LVM: vg0 did not activate 
> correctly
> Oct 28 14:42:56 node2 pacemaker-execd[27054]:  notice: 
> p_lvm_vg0_start_0:8775:stderr [   Configuration node global/use_lvmetad not 
> found ]
> Oct 28 14:42:56 node2 pacemaker-execd[27054]:  notice: 
> p_lvm_vg0_start_0:8775:stderr [ ocf-exit-reason:LVM: vg0 did not activate 
> correctly ]
> Oct 28 14:42:56 node2 pacemaker-controld[27057]:  notice: Result of start 
> operation for p_lvm_vg0 on node2: 7 (not running) 
> Oct 28 14:42:56 node2 pacemaker-controld[27057]:  notice: 
> node2-p_lvm_vg0_start_0:77 [   Configuration node global/use_lvmetad not 
> found\nocf-exit-reason:LVM: vg0 did not activate correctly\n ]
> Oct 28 14:42:56 node2 pacemaker-controld[27057]:  warning: Action 42 
> (p_lvm_vg0_start_0) on node2 failed (target: 0 vs. rc: 7): Error
> Oct 28 14:42:56 node2 pacemaker-controld[27057]:  notice: Transition 602 
> aborted by operation p_lvm_vg0_start_0 'modify' on node2: Event failed 
> Oct 28 14:42:56 node2 pacemaker-controld[27057]:  notice: Transition 602 
> (Complete=28, Pending=0, Fired=0, Skipped=0, Incomplete=1, 
> Source=/var/lib/pacemaker/pengine/pe-input-39.bz2): Complete
> Oct 28 14:42:56 node2 pacemaker-schedulerd[27056]:  notice: On loss of 
> quorum: Ignore
> Oct 28 14:42:56 node2 pacemaker-schedulerd[27056]:  warning: Processing 
> failed start of p_lvm_vg0 on node2: not running 
> Oct 28 14:42:56 node2 pacemaker-schedulerd[27056]:  warning: Processing 
> failed start of p_lvm_vg0 on node2: not running 
> Oct 28 14:42:56 node2 pacemaker-schedulerd[27056]:  warning: Processing 
> failed start of p_lvm_vg0 on node1: not running 
> Oct 28 14:42:56 node2 pacemaker-schedulerd[27056]:  warning: Forcing 
> p_lvm_vg0 away from node1 after 100 failures (max=100)
> Oct 28 14:42:56 node2 pacemaker-schedulerd[27056]:  notice:  * Recover
> p_lvm_vg0   ( node2 )  
> Oct 28 14:42:56 node2 pacemaker-schedulerd[27056]:  notice: Calculated 
> transition 603, saving inputs in /var/lib/pacemaker/pengine/pe-input-40.bz2
> Oct 28 14:42:57 node2 pacemaker-schedulerd[27056]:  notice: On loss of 
> quorum: Ignore
> Oct 28 14:42:57 node2 pacemaker-schedulerd[27056]:  warning: Processing 
> failed start of p_lvm_vg0 on node2: not running 
> Oct 28 14:42:57 node2 pacemaker-schedulerd[27056]:  warning: Processing 
> failed start of p_lvm_vg0 on node2: not running 
> Oct 28 14:42:57 node2 pacemaker-schedulerd[27056]:  warning: Processing 
> failed start 

[ClusterLabs] volume group won't start in a nested DRBD setup

2019-10-28 Thread Jean-Francois Malouin
Hi,

Is there any new magic that I'm unaware of that needs to be added to a
pacemaker cluster using a DRBD nested setup? pacemaker 2.0.x and DRBD 8.4.10 on
Debian/Buster on a 2-node cluster with stonith.
Eventually this will host a bunch of Xen VMs.

I had this sort of thing running for years with pacemaker 1.x and DRBD 8.4.x
without a hitch, and now with pacemaker 2.0 and drbd 8.4.10 it gives me errors
on trying to start the volume group vg0 on this chain:

 (VG)      (LV)      (PV)   (VG)
vmspace > xen_lv0 > drbd0 > vg0

Only drbd0 and after are managed by pacemaker.

Here's what I have configured so far (stonith is configured but is not shown 
below):

---
primitive p_lvm_vg0 ocf:heartbeat:LVM \
params volgrpname=vg0 \
op monitor timeout=30s interval=10s \
op_params interval=10s

primitive resDRBDr0 ocf:linbit:drbd \
params drbd_resource=r0 \
op start interval=0 timeout=240s \
op stop interval=0 timeout=100s \
op monitor interval=29s role=Master timeout=240s \
op monitor interval=31s role=Slave timeout=240s \
meta migration-threshold=3 failure-timeout=120s

ms ms_drbd_r0 resDRBDr0 \
meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true

colocation c_lvm_vg0_on_drbd_r0 inf: p_lvm_vg0 ms_drbd_r0:Master

order o_drbd_r0_before_lvm_vg0 Mandatory: ms_drbd_r0:promote p_lvm_vg0:start
---
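
(As an aside, one way to exercise just this resource agent locally, outside
a full cluster transition, assuming crm_resource's --force-start option is
available in this pacemaker build, is to run the following on the node that
holds the DRBD Primary:

    crm_resource --resource p_lvm_vg0 --force-start -V

That runs the agent's start action directly and shows what the agent
reports.)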

/etc/lvm/lvm.conf has global_filter set to:
global_filter = [ "a|/dev/drbd.*|", "a|/dev/md.*|", "a|/dev/md/.*|", "r|.*|" ]

But I'm not sure if it's sufficient. I seem to be missing some crucial 
ingredient.
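
A quick way to confirm what that filter actually leaves visible (a sketch,
using standard lvm2 reporting commands):

    pvs -o pv_name,vg_name    # only /dev/drbd* and /dev/md* PVs should be listed
    vgs -o vg_name,pv_count   # vg0 should appear once drbd0 is Primary on this node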

syslog on the DC shows the following when trying to start vg0:

Oct 28 14:42:56 node2 LVM(p_lvm_vg0)[8775]: INFO: Activating volume group vg0
Oct 28 14:42:56 node2 LVM(p_lvm_vg0)[8775]: INFO:  Reading all physical 
volumes. This may take a while... Found volume group "vmspace" using metadata 
type lvm2 Found volume group "freespace" using metadata type
 lvm2 Found volume group "vg0" using metadata type lvm2 
Oct 28 14:42:56 node2 LVM(p_lvm_vg0)[8775]: INFO:  0 logical volume(s) in 
volume group "vg0" now active 
Oct 28 14:42:56 node2 LVM(p_lvm_vg0)[8775]: ERROR: LVM Volume vg0 is not 
available (stopped)
Oct 28 14:42:56 node2 LVM(p_lvm_vg0)[8775]: ERROR: LVM: vg0 did not activate 
correctly
Oct 28 14:42:56 node2 pacemaker-execd[27054]:  notice: 
p_lvm_vg0_start_0:8775:stderr [   Configuration node global/use_lvmetad not 
found ]
Oct 28 14:42:56 node2 pacemaker-execd[27054]:  notice: 
p_lvm_vg0_start_0:8775:stderr [ ocf-exit-reason:LVM: vg0 did not activate 
correctly ]
Oct 28 14:42:56 node2 pacemaker-controld[27057]:  notice: Result of start 
operation for p_lvm_vg0 on node2: 7 (not running) 
Oct 28 14:42:56 node2 pacemaker-controld[27057]:  notice: 
node2-p_lvm_vg0_start_0:77 [   Configuration node global/use_lvmetad not 
found\nocf-exit-reason:LVM: vg0 did not activate correctly\n ]
Oct 28 14:42:56 node2 pacemaker-controld[27057]:  warning: Action 42 
(p_lvm_vg0_start_0) on node2 failed (target: 0 vs. rc: 7): Error
Oct 28 14:42:56 node2 pacemaker-controld[27057]:  notice: Transition 602 
aborted by operation p_lvm_vg0_start_0 'modify' on node2: Event failed 
Oct 28 14:42:56 node2 pacemaker-controld[27057]:  notice: Transition 602 
(Complete=28, Pending=0, Fired=0, Skipped=0, Incomplete=1, 
Source=/var/lib/pacemaker/pengine/pe-input-39.bz2): Complete
Oct 28 14:42:56 node2 pacemaker-schedulerd[27056]:  notice: On loss of quorum: 
Ignore
Oct 28 14:42:56 node2 pacemaker-schedulerd[27056]:  warning: Processing failed 
start of p_lvm_vg0 on node2: not running 
Oct 28 14:42:56 node2 pacemaker-schedulerd[27056]:  warning: Processing failed 
start of p_lvm_vg0 on node2: not running 
Oct 28 14:42:56 node2 pacemaker-schedulerd[27056]:  warning: Processing failed 
start of p_lvm_vg0 on node1: not running 
Oct 28 14:42:56 node2 pacemaker-schedulerd[27056]:  warning: Forcing p_lvm_vg0 
away from node1 after 100 failures (max=100)
Oct 28 14:42:56 node2 pacemaker-schedulerd[27056]:  notice:  * Recover
p_lvm_vg0   ( node2 )  
Oct 28 14:42:56 node2 pacemaker-schedulerd[27056]:  notice: Calculated 
transition 603, saving inputs in /var/lib/pacemaker/pengine/pe-input-40.bz2
Oct 28 14:42:57 node2 pacemaker-schedulerd[27056]:  notice: On loss of quorum: 
Ignore
Oct 28 14:42:57 node2 pacemaker-schedulerd[27056]:  warning: Processing failed 
start of p_lvm_vg0 on node2: not running 
Oct 28 14:42:57 node2 pacemaker-schedulerd[27056]:  warning: Processing failed 
start of p_lvm_vg0 on node2: not running 
Oct 28 14:42:57 node2 pacemaker-schedulerd[27056]:  warning: Processing failed 
start of p_lvm_vg0 on node1: not running 
Oct 28 14:42:57 node2 pacemaker-schedulerd[27056]:  warning: Forcing p_lvm_vg0 
away from node2 after 100 failures (max=100)
Oct 28 14:42:57 node2 pacemaker-schedulerd[27056]:  warning: Forcing p_lvm_vg0 
away from node1 after 100 failures (max=100)
Oct 28 14:42:57 node2