Re: [ClusterLabs] volume group won't start in a nested DRBD setup
* Jean-Francois Malouin [20191029 09:49]:
> * Roger Zhou [20191029 06:18]:
> > On 10/29/19 12:30 PM, Andrei Borzenkov wrote:
> > >> Oct 28 14:42:56 node2 LVM(p_lvm_vg0)[8775]: INFO: Activating volume
> > >> group vg0
> > >> Oct 28 14:42:56 node2 LVM(p_lvm_vg0)[8775]: INFO: Reading all physical
> > >> volumes. This may take a while... Found volume group "vmspace" using
> > >> metadata type lvm2 Found volume group "freespace" using metadata type
> > >> lvm2 Found volume group "vg0" using metadata type lvm2
> > >> Oct 28 14:42:56 node2 LVM(p_lvm_vg0)[8775]: INFO: 0 logical volume(s)
> > >> in volume group "vg0" now active
> > > Resource agent really does just "vgchange vg0". Does it work when you
> > > run it manually?
> >
> > Agree with Andrei.
>
> Yes, it does.
>
> > >> Oct 28 14:42:56 node2 LVM(p_lvm_vg0)[8775]: ERROR: LVM Volume vg0 is not
> > >> available (stopped)
> > >> Oct 28 14:42:56 node2 LVM(p_lvm_vg0)[8775]: ERROR: LVM: vg0 did not
> > >> activate correctly
> > >> Oct 28 14:42:56 node2 pacemaker-execd[27054]: notice:
> > >> p_lvm_vg0_start_0:8775:stderr [ Configuration node global/use_lvmetad
> > >> not found ]
> >
> > This error indicates the root cause is related to lvmetad. Please check
> > lvmetad, e.g.
> >
> > systemctl status lvm2-lvmetad
> > grep use_lvmetad /etc/lvm/lvm.conf
> >
> > Check your lvm2 version and google its workaround/fix accordingly.
>
> That was my hunch too. This is on Debian/buster and the lvm.conf is the
> original one from the initial install, except that I added the
> configuration option devices/filter:
>
> filter = [ "a|/dev/drbd.*|", "a|/dev/md.*|", "a|/dev/md/.*|", "r|.*|" ]
> write_cache_state = 0
>
> and the configuration option devices/global_filter:
>
> global_filter = [ "a|/dev/drbd.*|", "a|/dev/md.*|", "a|/dev/md/.*|", "r|.*|" ]
>
> There was no 'use_lvmetad' option present initially, but I have added it
> in the global section:
>
> use_lvmetad = 0
>
> And lvm2-lvmetad is not a unit listed under systemd...
>
> systemctl status lvm2-lvmetad
> Unit lvm2-lvmetad.service could not be found.

I had a second go at this earlier this morning, and it seems that I might
have NOT enabled 'filter' and 'global_filter' at the same time, or with
'use_lvmetad = 0' at the same time, as right now the volume group vg0 is
running correctly and I can move it around no problem. I'm reviewing all
this as I'm a bit baffled...

thanks for all your input,
jf

> thanks,
> jf
>
> > Cheers,
> > Roger
> >
> > >> Oct 28 14:42:56 node2 pacemaker-execd[27054]: notice:
> > >> p_lvm_vg0_start_0:8775:stderr [ ocf-exit-reason:LVM: vg0 did not
> > >> activate correctly ]
> > >> Oct 28 14:42:56 node2 pacemaker-controld[27057]: notice: Result of
> > >> start operation for p_lvm_vg0 on node2: 7 (not running)
> > >> Oct 28 14:42:56 node2 pacemaker-controld[27057]: notice:
> > >> node2-p_lvm_vg0_start_0:77 [ Configuration node global/use_lvmetad not
> > >> found\nocf-exit-reason:LVM: vg0 did not activate correctly\n ]
> > >> Oct 28 14:42:56 node2 pacemaker-controld[27057]: warning: Action 42
> > >> (p_lvm_vg0_start_0) on node2 failed (target: 0 vs. rc: 7): Error
> > >> Oct 28 14:42:56 node2 pacemaker-controld[27057]: notice: Transition 602
> > >> aborted by operation p_lvm_vg0_start_0 'modify' on node2: Event failed

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
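[Editor's note] Since the fix apparently hinged on getting the filter and
global_filter accept patterns right, one way to gain confidence in them is to
check the patterns against the device names you expect. The sketch below is
plain POSIX shell with made-up device names; grep -E only approximates LVM's
own regex engine, and it mimics (rather than invokes) the first-match-wins
logic of the filter quoted above:

```shell
# Accept patterns from the lvm.conf filter, rewritten as one ERE.
# Anything not matched falls through to the final reject rule "r|.*|".
accept='^/dev/drbd.*|^/dev/md.*|^/dev/md/.*'

for dev in /dev/drbd0 /dev/md0 /dev/md/raid1 /dev/sda1; do
    if printf '%s\n' "$dev" | grep -Eq "$accept"; then
        echo "$dev: accepted"
    else
        echo "$dev: rejected"
    fi
done
```

Under such a filter only DRBD and md devices are scanned, which is the point
of the nested setup: the inner vg0 should only ever be seen through
/dev/drbd0, never through the backing LV.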
Re: [ClusterLabs] volume group won't start in a nested DRBD setup
* Roger Zhou [20191029 06:18]:
> On 10/29/19 12:30 PM, Andrei Borzenkov wrote:
> >> Oct 28 14:42:56 node2 LVM(p_lvm_vg0)[8775]: INFO: Activating volume group
> >> vg0
> >> Oct 28 14:42:56 node2 LVM(p_lvm_vg0)[8775]: INFO: Reading all physical
> >> volumes. This may take a while... Found volume group "vmspace" using
> >> metadata type lvm2 Found volume group "freespace" using metadata type
> >> lvm2 Found volume group "vg0" using metadata type lvm2
> >> Oct 28 14:42:56 node2 LVM(p_lvm_vg0)[8775]: INFO: 0 logical volume(s) in
> >> volume group "vg0" now active
> > Resource agent really does just "vgchange vg0". Does it work when you
> > run it manually?
>
> Agree with Andrei.

Yes, it does.

> >> Oct 28 14:42:56 node2 LVM(p_lvm_vg0)[8775]: ERROR: LVM Volume vg0 is not
> >> available (stopped)
> >> Oct 28 14:42:56 node2 LVM(p_lvm_vg0)[8775]: ERROR: LVM: vg0 did not
> >> activate correctly
> >> Oct 28 14:42:56 node2 pacemaker-execd[27054]: notice:
> >> p_lvm_vg0_start_0:8775:stderr [ Configuration node global/use_lvmetad
> >> not found ]
>
> This error indicates the root cause is related to lvmetad. Please check
> lvmetad, e.g.
>
> systemctl status lvm2-lvmetad
> grep use_lvmetad /etc/lvm/lvm.conf
>
> Check your lvm2 version and google its workaround/fix accordingly.

That was my hunch too. This is on Debian/buster and the lvm.conf is the
original one from the initial install, except that I added the configuration
option devices/filter:

filter = [ "a|/dev/drbd.*|", "a|/dev/md.*|", "a|/dev/md/.*|", "r|.*|" ]
write_cache_state = 0

and the configuration option devices/global_filter:

global_filter = [ "a|/dev/drbd.*|", "a|/dev/md.*|", "a|/dev/md/.*|", "r|.*|" ]

There was no 'use_lvmetad' option present initially, but I have added it in
the global section:

use_lvmetad = 0

And lvm2-lvmetad is not a unit listed under systemd...

systemctl status lvm2-lvmetad
Unit lvm2-lvmetad.service could not be found.

thanks,
jf

> Cheers,
> Roger
>
> >> Oct 28 14:42:56 node2 pacemaker-execd[27054]: notice:
> >> p_lvm_vg0_start_0:8775:stderr [ ocf-exit-reason:LVM: vg0 did not activate
> >> correctly ]
> >> Oct 28 14:42:56 node2 pacemaker-controld[27057]: notice: Result of start
> >> operation for p_lvm_vg0 on node2: 7 (not running)
> >> Oct 28 14:42:56 node2 pacemaker-controld[27057]: notice:
> >> node2-p_lvm_vg0_start_0:77 [ Configuration node global/use_lvmetad not
> >> found\nocf-exit-reason:LVM: vg0 did not activate correctly\n ]
> >> Oct 28 14:42:56 node2 pacemaker-controld[27057]: warning: Action 42
> >> (p_lvm_vg0_start_0) on node2 failed (target: 0 vs. rc: 7): Error
> >> Oct 28 14:42:56 node2 pacemaker-controld[27057]: notice: Transition 602
> >> aborted by operation p_lvm_vg0_start_0 'modify' on node2: Event failed
Re: [ClusterLabs] volume group won't start in a nested DRBD setup
On 10/29/19 12:30 PM, Andrei Borzenkov wrote:
>> Oct 28 14:42:56 node2 LVM(p_lvm_vg0)[8775]: INFO: Activating volume group vg0
>> Oct 28 14:42:56 node2 LVM(p_lvm_vg0)[8775]: INFO: Reading all physical
>> volumes. This may take a while... Found volume group "vmspace" using
>> metadata type lvm2 Found volume group "freespace" using metadata type
>> lvm2 Found volume group "vg0" using metadata type lvm2
>> Oct 28 14:42:56 node2 LVM(p_lvm_vg0)[8775]: INFO: 0 logical volume(s) in
>> volume group "vg0" now active
> Resource agent really does just "vgchange vg0". Does it work when you
> run it manually?

Agree with Andrei.

>> Oct 28 14:42:56 node2 LVM(p_lvm_vg0)[8775]: ERROR: LVM Volume vg0 is not
>> available (stopped)
>> Oct 28 14:42:56 node2 LVM(p_lvm_vg0)[8775]: ERROR: LVM: vg0 did not
>> activate correctly
>> Oct 28 14:42:56 node2 pacemaker-execd[27054]: notice:
>> p_lvm_vg0_start_0:8775:stderr [ Configuration node global/use_lvmetad not
>> found ]

This error indicates the root cause is related to lvmetad. Please check
lvmetad, e.g.

systemctl status lvm2-lvmetad
grep use_lvmetad /etc/lvm/lvm.conf

Check your lvm2 version and google its workaround/fix accordingly.

Cheers,
Roger

>> Oct 28 14:42:56 node2 pacemaker-execd[27054]: notice:
>> p_lvm_vg0_start_0:8775:stderr [ ocf-exit-reason:LVM: vg0 did not activate
>> correctly ]
>> Oct 28 14:42:56 node2 pacemaker-controld[27057]: notice: Result of start
>> operation for p_lvm_vg0 on node2: 7 (not running)
>> Oct 28 14:42:56 node2 pacemaker-controld[27057]: notice:
>> node2-p_lvm_vg0_start_0:77 [ Configuration node global/use_lvmetad not
>> found\nocf-exit-reason:LVM: vg0 did not activate correctly\n ]
>> Oct 28 14:42:56 node2 pacemaker-controld[27057]: warning: Action 42
>> (p_lvm_vg0_start_0) on node2 failed (target: 0 vs. rc: 7): Error
>> Oct 28 14:42:56 node2 pacemaker-controld[27057]: notice: Transition 602
>> aborted by operation p_lvm_vg0_start_0 'modify' on node2: Event failed
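[Editor's note] The two checks Roger suggests only make sense on the cluster
nodes themselves, but the grep half can be illustrated self-contained. The
likely punchline on buster: lvmetad was removed upstream in the lvm2 2.03
series, which, if memory serves, is what buster ships, so a missing
lvm2-lvmetad unit is expected there rather than a fault. A sketch, with a
sample file standing in for the real /etc/lvm/lvm.conf:

```shell
# Stand-in for /etc/lvm/lvm.conf, just to keep the example runnable;
# on a real node you would grep the actual file, as suggested above.
conf=/tmp/lvm.conf.sample
cat > "$conf" <<'EOF'
global {
    use_lvmetad = 0
}
EOF

if grep -q 'use_lvmetad' "$conf"; then
    echo "use_lvmetad is set explicitly:"
    grep 'use_lvmetad' "$conf"
else
    echo "use_lvmetad not present in $conf"
fi
```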
Re: [ClusterLabs] volume group won't start in a nested DRBD setup
28.10.2019 22:44, Jean-Francois Malouin writes:
> Hi,
>
> Is there any new magic that I'm unaware of that needs to be added to a
> pacemaker cluster using a nested DRBD setup? pacemaker 2.0.x and DRBD
> 8.4.10 on Debian/Buster on a 2-node cluster with stonith.
> Eventually this will host a bunch of Xen VMs.
>
> I had this sort of thing running for years with pacemaker 1.x and DRBD
> 8.4.x without a hitch, and now with pacemaker 2.0 and DRBD 8.4.10 it gives
> me errors when trying to start the volume group vg0 on this chain:
>
> (VG)      (LV)      (PV)    (VG)
> vmspace > xen_lv0 > drbd0 > vg0
>
> Only drbd0 and after are managed by pacemaker.
>
> Here's what I have configured so far (stonith is configured but is not
> shown below):
>
> ---
> primitive p_lvm_vg0 ocf:heartbeat:LVM \
>     params volgrpname=vg0 \
>     op monitor timeout=30s interval=10s \
>     op_params interval=10s
>
> primitive resDRBDr0 ocf:linbit:drbd \
>     params drbd_resource=r0 \
>     op start interval=0 timeout=240s \
>     op stop interval=0 timeout=100s \
>     op monitor interval=29s role=Master timeout=240s \
>     op monitor interval=31s role=Slave timeout=240s \
>     meta migration-threshold=3 failure-timeout=120s
>
> ms ms_drbd_r0 resDRBDr0 \
>     meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
>
> colocation c_lvm_vg0_on_drbd_r0 inf: p_lvm_vg0 ms_drbd_r0:Master
>
> order o_drbd_r0_before_lvm_vg0 Mandatory: ms_drbd_r0:promote p_lvm_vg0:start
> ---
>
> /etc/lvm/lvm.conf has global_filter set to:
>
> global_filter = [ "a|/dev/drbd.*|", "a|/dev/md.*|", "a|/dev/md/.*|", "r|.*|" ]
>
> But I'm not sure if it's sufficient. I seem to be missing some crucial
> ingredient.
>
> syslog on the DC shows the following when trying to start vg0:
>
> Oct 28 14:42:56 node2 LVM(p_lvm_vg0)[8775]: INFO: Activating volume group vg0
> Oct 28 14:42:56 node2 LVM(p_lvm_vg0)[8775]: INFO: Reading all physical
> volumes. This may take a while... Found volume group "vmspace" using
> metadata type lvm2 Found volume group "freespace" using metadata type
> lvm2 Found volume group "vg0" using metadata type lvm2
> Oct 28 14:42:56 node2 LVM(p_lvm_vg0)[8775]: INFO: 0 logical volume(s) in
> volume group "vg0" now active

Resource agent really does just "vgchange vg0". Does it work when you
run it manually?

> Oct 28 14:42:56 node2 LVM(p_lvm_vg0)[8775]: ERROR: LVM Volume vg0 is not
> available (stopped)
> Oct 28 14:42:56 node2 LVM(p_lvm_vg0)[8775]: ERROR: LVM: vg0 did not
> activate correctly
> Oct 28 14:42:56 node2 pacemaker-execd[27054]: notice:
> p_lvm_vg0_start_0:8775:stderr [ Configuration node global/use_lvmetad not
> found ]
> Oct 28 14:42:56 node2 pacemaker-execd[27054]: notice:
> p_lvm_vg0_start_0:8775:stderr [ ocf-exit-reason:LVM: vg0 did not activate
> correctly ]
> Oct 28 14:42:56 node2 pacemaker-controld[27057]: notice: Result of start
> operation for p_lvm_vg0 on node2: 7 (not running)
> Oct 28 14:42:56 node2 pacemaker-controld[27057]: notice:
> node2-p_lvm_vg0_start_0:77 [ Configuration node global/use_lvmetad not
> found\nocf-exit-reason:LVM: vg0 did not activate correctly\n ]
> Oct 28 14:42:56 node2 pacemaker-controld[27057]: warning: Action 42
> (p_lvm_vg0_start_0) on node2 failed (target: 0 vs. rc: 7): Error
> Oct 28 14:42:56 node2 pacemaker-controld[27057]: notice: Transition 602
> aborted by operation p_lvm_vg0_start_0 'modify' on node2: Event failed
> Oct 28 14:42:56 node2 pacemaker-controld[27057]: notice: Transition 602
> (Complete=28, Pending=0, Fired=0, Skipped=0, Incomplete=1,
> Source=/var/lib/pacemaker/pengine/pe-input-39.bz2): Complete
> Oct 28 14:42:56 node2 pacemaker-schedulerd[27056]: notice: On loss of
> quorum: Ignore
> Oct 28 14:42:56 node2 pacemaker-schedulerd[27056]: warning: Processing
> failed start of p_lvm_vg0 on node2: not running
> Oct 28 14:42:56 node2 pacemaker-schedulerd[27056]: warning: Processing
> failed start of p_lvm_vg0 on node2: not running
> Oct 28 14:42:56 node2 pacemaker-schedulerd[27056]: warning: Processing
> failed start of p_lvm_vg0 on node1: not running
> Oct 28 14:42:56 node2 pacemaker-schedulerd[27056]: warning: Forcing
> p_lvm_vg0 away from node1 after 100 failures (max=100)
> Oct 28 14:42:56 node2 pacemaker-schedulerd[27056]: notice: * Recover
> p_lvm_vg0 ( node2 )
> Oct 28 14:42:56 node2 pacemaker-schedulerd[27056]: notice: Calculated
> transition 603, saving inputs in /var/lib/pacemaker/pengine/pe-input-40.bz2
> Oct 28 14:42:57 node2 pacemaker-schedulerd[27056]: notice: On loss of
> quorum: Ignore
> Oct 28 14:42:57 node2 pacemaker-schedulerd[27056]: warning: Processing
> failed start of p_lvm_vg0 on node2: not running
> Oct 28 14:42:57 node2 pacemaker-schedulerd[27056]: warning: Processing
> failed start of p_lvm_vg0 on node2: not running
> Oct 28 14:42:57 node2 pacemaker-schedulerd[27056]: warning: Processing
> failed start
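[Editor's note] Andrei's reading is worth spelling out: the start operation
gets through vgchange, but the agent's follow-up status check sees zero
active LVs in vg0 and therefore reports OCF_NOT_RUNNING, which is the "rc: 7"
in the logs. A simplified re-implementation of that decision (a sketch of the
logic only, not the actual ocf:heartbeat:LVM code; vg_status and its argument
are made up for illustration):

```shell
# Sketch: a VG counts as "started" only if at least one of its LVs is
# active. The argument stands in for the active-LV count that the real
# agent derives from lvs/vgdisplay output.
OCF_SUCCESS=0
OCF_NOT_RUNNING=7   # matches the "rc: 7" in the logs above

vg_status() {
    if [ "$1" -gt 0 ]; then
        return $OCF_SUCCESS
    fi
    echo "ERROR: LVM Volume vg0 is not available (stopped)" >&2
    return $OCF_NOT_RUNNING
}

rc=0
vg_status 0 || rc=$?   # the "0 logical volume(s) ... now active" case
echo "rc=$rc"          # prints: rc=7
```

This is why the INFO line "0 logical volume(s) in volume group \"vg0\" now
active" and the ERROR line always appear together in the log excerpt.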
[ClusterLabs] volume group won't start in a nested DRBD setup
Hi,

Is there any new magic that I'm unaware of that needs to be added to a
pacemaker cluster using a nested DRBD setup? pacemaker 2.0.x and DRBD 8.4.10
on Debian/Buster on a 2-node cluster with stonith. Eventually this will host
a bunch of Xen VMs.

I had this sort of thing running for years with pacemaker 1.x and DRBD 8.4.x
without a hitch, and now with pacemaker 2.0 and DRBD 8.4.10 it gives me
errors when trying to start the volume group vg0 on this chain:

(VG)      (LV)      (PV)    (VG)
vmspace > xen_lv0 > drbd0 > vg0

Only drbd0 and after are managed by pacemaker.

Here's what I have configured so far (stonith is configured but is not shown
below):

---
primitive p_lvm_vg0 ocf:heartbeat:LVM \
    params volgrpname=vg0 \
    op monitor timeout=30s interval=10s \
    op_params interval=10s

primitive resDRBDr0 ocf:linbit:drbd \
    params drbd_resource=r0 \
    op start interval=0 timeout=240s \
    op stop interval=0 timeout=100s \
    op monitor interval=29s role=Master timeout=240s \
    op monitor interval=31s role=Slave timeout=240s \
    meta migration-threshold=3 failure-timeout=120s

ms ms_drbd_r0 resDRBDr0 \
    meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true

colocation c_lvm_vg0_on_drbd_r0 inf: p_lvm_vg0 ms_drbd_r0:Master

order o_drbd_r0_before_lvm_vg0 Mandatory: ms_drbd_r0:promote p_lvm_vg0:start
---

/etc/lvm/lvm.conf has global_filter set to:

global_filter = [ "a|/dev/drbd.*|", "a|/dev/md.*|", "a|/dev/md/.*|", "r|.*|" ]

But I'm not sure if it's sufficient. I seem to be missing some crucial
ingredient.

syslog on the DC shows the following when trying to start vg0:

Oct 28 14:42:56 node2 LVM(p_lvm_vg0)[8775]: INFO: Activating volume group vg0
Oct 28 14:42:56 node2 LVM(p_lvm_vg0)[8775]: INFO: Reading all physical
volumes. This may take a while... Found volume group "vmspace" using metadata
type lvm2 Found volume group "freespace" using metadata type lvm2 Found
volume group "vg0" using metadata type lvm2
Oct 28 14:42:56 node2 LVM(p_lvm_vg0)[8775]: INFO: 0 logical volume(s) in
volume group "vg0" now active
Oct 28 14:42:56 node2 LVM(p_lvm_vg0)[8775]: ERROR: LVM Volume vg0 is not
available (stopped)
Oct 28 14:42:56 node2 LVM(p_lvm_vg0)[8775]: ERROR: LVM: vg0 did not activate
correctly
Oct 28 14:42:56 node2 pacemaker-execd[27054]: notice:
p_lvm_vg0_start_0:8775:stderr [ Configuration node global/use_lvmetad not
found ]
Oct 28 14:42:56 node2 pacemaker-execd[27054]: notice:
p_lvm_vg0_start_0:8775:stderr [ ocf-exit-reason:LVM: vg0 did not activate
correctly ]
Oct 28 14:42:56 node2 pacemaker-controld[27057]: notice: Result of start
operation for p_lvm_vg0 on node2: 7 (not running)
Oct 28 14:42:56 node2 pacemaker-controld[27057]: notice:
node2-p_lvm_vg0_start_0:77 [ Configuration node global/use_lvmetad not
found\nocf-exit-reason:LVM: vg0 did not activate correctly\n ]
Oct 28 14:42:56 node2 pacemaker-controld[27057]: warning: Action 42
(p_lvm_vg0_start_0) on node2 failed (target: 0 vs. rc: 7): Error
Oct 28 14:42:56 node2 pacemaker-controld[27057]: notice: Transition 602
aborted by operation p_lvm_vg0_start_0 'modify' on node2: Event failed
Oct 28 14:42:56 node2 pacemaker-controld[27057]: notice: Transition 602
(Complete=28, Pending=0, Fired=0, Skipped=0, Incomplete=1,
Source=/var/lib/pacemaker/pengine/pe-input-39.bz2): Complete
Oct 28 14:42:56 node2 pacemaker-schedulerd[27056]: notice: On loss of
quorum: Ignore
Oct 28 14:42:56 node2 pacemaker-schedulerd[27056]: warning: Processing
failed start of p_lvm_vg0 on node2: not running
Oct 28 14:42:56 node2 pacemaker-schedulerd[27056]: warning: Processing
failed start of p_lvm_vg0 on node2: not running
Oct 28 14:42:56 node2 pacemaker-schedulerd[27056]: warning: Processing
failed start of p_lvm_vg0 on node1: not running
Oct 28 14:42:56 node2 pacemaker-schedulerd[27056]: warning: Forcing
p_lvm_vg0 away from node1 after 100 failures (max=100)
Oct 28 14:42:56 node2 pacemaker-schedulerd[27056]: notice: * Recover
p_lvm_vg0 ( node2 )
Oct 28 14:42:56 node2 pacemaker-schedulerd[27056]: notice: Calculated
transition 603, saving inputs in /var/lib/pacemaker/pengine/pe-input-40.bz2
Oct 28 14:42:57 node2 pacemaker-schedulerd[27056]: notice: On loss of
quorum: Ignore
Oct 28 14:42:57 node2 pacemaker-schedulerd[27056]: warning: Processing
failed start of p_lvm_vg0 on node2: not running
Oct 28 14:42:57 node2 pacemaker-schedulerd[27056]: warning: Processing
failed start of p_lvm_vg0 on node2: not running
Oct 28 14:42:57 node2 pacemaker-schedulerd[27056]: warning: Processing
failed start of p_lvm_vg0 on node1: not running
Oct 28 14:42:57 node2 pacemaker-schedulerd[27056]: warning: Forcing
p_lvm_vg0 away from node2 after 100 failures (max=100)
Oct 28 14:42:57 node2 pacemaker-schedulerd[27056]: warning: Forcing
p_lvm_vg0 away from node1 after 100 failures (max=100)
Oct 28 14:42:57 node2
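[Editor's note] The follow-ups earlier in this archive suggest the problem
went away once filter, global_filter, write_cache_state and use_lvmetad = 0
were all in place together. A sketch of those pieces of /etc/lvm/lvm.conf
follows (section layout assumed from the thread, not copied from the poster's
actual file; merge into the distro's stock lvm.conf rather than replacing it
-- the temp file here exists only to keep the example self-contained):

```shell
# Combined lvm.conf fragment the thread converges on (assumed layout).
conf=/tmp/lvm.conf.sketch
cat > "$conf" <<'EOF'
devices {
    filter = [ "a|/dev/drbd.*|", "a|/dev/md.*|", "a|/dev/md/.*|", "r|.*|" ]
    global_filter = [ "a|/dev/drbd.*|", "a|/dev/md.*|", "a|/dev/md/.*|", "r|.*|" ]
    write_cache_state = 0
}
global {
    use_lvmetad = 0
}
EOF

# Show that the lvmetad knob is present, mirroring Roger's grep check.
grep -n 'use_lvmetad' "$conf"
```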