Re: [Pacemaker] crm_simulate a resource failure
Ubuntu 12.04, pacemaker 1.1.6-2ubuntu3

If I run crm_simulate -L I get this:

root@Vulture:~# crm_simulate -L
*** glibc detected *** crm_simulate: double free or corruption (out): 0x01fc2e00 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x7e626)[0x7f35df08e626]
/usr/lib/libcib.so.1(+0xc259)[0x7f35df96b259]
/lib/x86_64-linux-gnu/libglib-2.0.so.0(+0x373d3)[0x7f35ded523d3]
/lib/x86_64-linux-gnu/libglib-2.0.so.0(g_hash_table_remove_all+0x1d)[0x7f35ded5324d]
/lib/x86_64-linux-gnu/libglib-2.0.so.0(g_hash_table_destroy+0xe)[0x7f35ded532de]
/usr/lib/libcib.so.1(cib_new_variant+0x155)[0x7f35df96b9fb]
/usr/lib/libcib.so.1(cib_file_new+0x28)[0x7f35df971aad]
/usr/lib/libcib.so.1(cib_new+0x62)[0x7f35df96b70c]
crm_simulate[0x40614f]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed)[0x7f35df03176d]
crm_simulate[0x402279]
======= Memory map: ========
0040-00409000 r-xp fb:05 541771 /usr/sbin/crm_simulate
00608000-00609000 r--p 8000 fb:05 541771 /usr/sbin/crm_simulate
00609000-0060a000 rw-p 9000 fb:05 541771 /usr/sbin/crm_simulate
01fbb000-02259000 rw-p 00:00 0 [heap]
7f35dbba6000-7f35d000 r-xp fb:05 393311 /lib/x86_64-linux-gnu/libgcc_s.so.1
7f35d000-7f35dbdba000 ---p 00015000 fb:05 393311 /lib/x86_64-linux-gnu/libgcc_s.so.1
7f35dbdba000-7f35dbdbb000 r--p 00014000 fb:05 393311 /lib/x86_64-linux-gnu/libgcc_s.so.1
7f35dbdbb000-7f35dbdbc000 rw-p 00015000 fb:05 393311 /lib/x86_64-linux-gnu/libgcc_s.so.1
7f35dbdbc000-7f35dbdbf000 r-xp fb:05 393398 /lib/x86_64-linux-gnu/libgpg-error.so.0.8.0
7f35dbdbf000-7f35dbfbe000 ---p 3000 fb:05 393398 /lib/x86_64-linux-gnu/libgpg-error.so.0.8.0
7f35dbfbe000-7f35dbfbf000 r--p 2000 fb:05 393398 /lib/x86_64-linux-gnu/libgpg-error.so.0.8.0
7f35dbfbf000-7f35dbfc rw-p 3000 fb:05 393398 /lib/x86_64-linux-gnu/libgpg-error.so.0.8.0
7f35dbfc-7f35dbfc8000 r-xp fb:05 525256 /usr/lib/x86_64-linux-gnu/libltdl.so.7.3.0
7f35dbfc8000-7f35dc1c8000 ---p 8000 fb:05 525256 /usr/lib/x86_64-linux-gnu/libltdl.so.7.3.0
7f35dc1c8000-7f35dc1c9000 r--p 8000 fb:05 525256 /usr/lib/x86_64-linux-gnu/libltdl.so.7.3.0
7f35dc1c9000-7f35dc1ca000 rw-p 9000 fb:05 525256 /usr/lib/x86_64-linux-gnu/libltdl.so.7.3.0
7f35dc1ca000-7f35dc1db000 r-xp fb:05 525504 /usr/lib/x86_64-linux-gnu/libp11-kit.so.0.0.0
7f35dc1db000-7f35dc3da000 ---p 00011000 fb:05 525504 /usr/lib/x86_64-linux-gnu/libp11-kit.so.0.0.0
7f35dc3da000-7f35dc3db000 r--p 0001 fb:05 525504 /usr/lib/x86_64-linux-gnu/libp11-kit.so.0.0.0
7f35dc3db000-7f35dc3dc000 rw-p 00011000 fb:05 525504 /usr/lib/x86_64-linux-gnu/libp11-kit.so.0.0.0
7f35dc3dc000-7f35dc456000 r-xp fb:05 393527 /lib/x86_64-linux-gnu/libgcrypt.so.11.7.0
7f35dc456000-7f35dc656000 ---p 0007a000 fb:05 393527 /lib/x86_64-linux-gnu/libgcrypt.so.11.7.0
7f35dc656000-7f35dc657000 r--p 0007a000 fb:05 393527 /lib/x86_64-linux-gnu/libgcrypt.so.11.7.0
7f35dc657000-7f35dc65a000 rw-p 0007b000 fb:05 393527 /lib/x86_64-linux-gnu/libgcrypt.so.11.7.0
7f35dc65a000-7f35dc66a000 r-xp fb:05 525508 /usr/lib/x86_64-linux-gnu/libtasn1.so.3.1.12
7f35dc66a000-7f35dc869000 ---p 0001 fb:05 525508 /usr/lib/x86_64-linux-gnu/libtasn1.so.3.1.12
7f35dc869000-7f35dc86a000 r--p f000 fb:05 525508 /usr/lib/x86_64-linux-gnu/libtasn1.so.3.1.12
7f35dc86a000-7f35dc86b000 rw-p 0001 fb:05 525508 /usr/lib/x86_64-linux-gnu/libtasn1.so.3.1.12
7f35dc86b000-7f35dc872000 r-xp fb:05 403340 /lib/x86_64-linux-gnu/librt-2.15.so
7f35dc872000-7f35dca71000 ---p 7000 fb:05 403340 /lib/x86_64-linux-gnu/librt-2.15.so
7f35dca71000-7f35dca72000 r--p 6000 fb:05 403340 /lib/x86_64-linux-gnu/librt-2.15.so
7f35dca72000-7f35dca73000 rw-p 7000 fb:05 403340 /lib/x86_64-linux-gnu/librt-2.15.so
7f35dca73000-7f35dca8b000 r-xp fb:05 403404 /lib/x86_64-linux-gnu/libpthread-2.15.so
7f35dca8b000-7f35dcc8a000 ---p 00018000 fb:05 403404 /lib/x86_64-linux-gnu/libpthread-2.15.so
7f35dcc8a000-7f35dcc8b000 r--p 00017000 fb:05 403404 /lib/x86_64-linux-gnu/libpthread-2.15.so
7f35dcc8b000-7f35dcc8c000 rw-p 00018000 fb:05 403404 /lib/x86_64-linux-gnu/libpthread-2.15.so
7f35dcc8c000-7f35dc
Re: [Pacemaker] crm_simulate a resource failure
On Wed, Oct 24, 2012 at 5:40 AM, Jake Smith wrote:
> Maybe try with verbose flag. Maybe try with error for the exit code? Or
> try $stop and $error to see if it will show anything - I would expect
> something like a node fence from that.
>
> crm_simulate with -LS for me causes seg fault so I can't test :-(

Really? What version?

>
> Jake
>
> From: "Cal Heldenbrand"
> To: "Jake Smith", "The Pacemaker cluster resource manager"
> Sent: Tuesday, October 23, 2012 2:01:59 PM
> Subject: Re: [Pacemaker] crm_simulate a resource failure
>
> Thanks Jake, that at least gives a little better description of the
> parameters, but I still just can't seem to get anything to trigger with
> the various syntaxes I'm trying. See below, I'm using single quotes so
> the $ symbol isn't parsed by bash. I've tried using my clone name,
> different return values, different task names, without the $ symbols...
> nothing seems to trigger anything in the Transition stuff. And I don't
> get any error messages at all.
>
> Any other ideas for me?
>
> Thanks!
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
Re: [Pacemaker] crm_simulate a resource failure
On Wed, Oct 24, 2012 at 5:01 AM, Cal Heldenbrand wrote:
> Thanks Jake, that at least gives a little better description of the
> parameters, but I still just can't seem to get anything to trigger with
> the various syntaxes I'm trying. See below, I'm using single quotes so
> the $ symbol isn't parsed by bash. I've tried using my clone name,
> different return values, different task names, without the $ symbols...
> nothing seems to trigger anything in the Transition stuff. And I don't
> get any error messages at all.
>
> Any other ideas for me?

Definitely don't include the $ symbols. $rsc, for example, was intended
to mean "put the name of your resource here". Maybe I need to include an
example too.

--op-fail isn't the option you want, though. From the man page:

  -i, --op-inject=value
      $rsc_$task_$interval@$node=$rc - Inject the specified task before
      running the simulation
  -F, --op-fail=value
      $rsc_$task_$interval@$node=$rc - Fail the specified task while
      running the simulation

Note the difference between the two descriptions: before vs. while.

--op-inject is the one you want. It is mostly useful for pretending that
a recurring monitor failed and seeing what the cluster would do about it.

--op-fail, on the other hand, is used for pretending that part of the
recovery process failed.

So if you ran:

  crm_simulate -LS --op-inject memcached:0_monito...@m1.fbsdata.com=7 --op-fail memcached:0_sto...@m1.fbsdata.com=1 --save-output /tmp/memcached-test.xml

you would see what Pacemaker would do if a monitoring failure of
memcached occurred. The simulation would stop at the point where the
memcached stop action was run (because we specified that it should fail
too), so anything that needed memcached to stop first would not yet be
stopped.

You can then see how Pacemaker would react to the second failure by
running:

  crm_simulate --xml-file /tmp/memcached-test.xml -S

Perhaps the man page should include an example like this?

>
> Thanks!
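Editorial sketch of the two-step workflow described in the reply above. The full operation specs are assumptions, not a reconstruction of the truncated ones in the archive: the value 10000 assumes a 10-second monitor written in milliseconds, 0 is assumed as the interval of the non-recurring stop, and the `run` wrapper is a hypothetical helper that only prints the commands unless RUN=1 is set, so nothing touches a live cluster by default.

```shell
#!/bin/sh
# Dry-run wrapper (hypothetical helper): prints the command unless RUN=1.
run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Illustrative values taken from the thread; the intervals are assumptions.
RSC='memcached:0'
NODE='m1.fbsdata.com'
STATE=/tmp/memcached-test.xml

simulate_double_failure() {
    # Step 1: inject a monitor result of 7 (not running) before the
    # simulation, and make the stop issued during recovery fail with 1.
    run crm_simulate -LS \
        --op-inject "${RSC}_monitor_10000@${NODE}=7" \
        --op-fail "${RSC}_stop_0@${NODE}=1" \
        --save-output "$STATE"
    # Step 2: replay from the saved state to see how the cluster would
    # react to the failed stop.
    run crm_simulate --xml-file "$STATE" -S
}

simulate_double_failure
```

Run as-is, this only prints the two command lines; setting RUN=1 would execute them. Since the original question was about scores, adding -s (--show-scores) to either command may also be worth trying.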
Re: [Pacemaker] crm_simulate a resource failure
Oye, nothing I try seems to work. Verbose doesn't give me any extra
output. $stop with $error also didn't do anything. I've tried a whole
mess of combinations and none of them seemed to make a difference.

On Tue, Oct 23, 2012 at 1:40 PM, Jake Smith wrote:
> Maybe try with verbose flag. Maybe try with error for the exit code? Or
> try $stop and $error to see if it will show anything - I would expect
> something like a node fence from that.
>
> crm_simulate with -LS for me causes seg fault so I can't test :-(
>
> Jake
>
> --
> From: "Cal Heldenbrand"
> To: "Jake Smith", "The Pacemaker cluster resource manager"
> Sent: Tuesday, October 23, 2012 2:01:59 PM
> Subject: Re: [Pacemaker] crm_simulate a resource failure
>
> Thanks Jake, that at least gives a little better description of the
> parameters, but I still just can't seem to get anything to trigger with
> the various syntaxes I'm trying. See below, I'm using single quotes so
> the $ symbol isn't parsed by bash. I've tried using my clone name,
> different return values, different task names, without the $ symbols...
> nothing seems to trigger anything in the Transition stuff. And I don't
> get any error messages at all.
>
> Any other ideas for me?
>
> Thanks!
Re: [Pacemaker] crm_simulate a resource failure
Maybe try with verbose flag. Maybe try with error for the exit code? Or
try $stop and $error to see if it will show anything - I would expect
something like a node fence from that.

crm_simulate with -LS for me causes seg fault so I can't test :-(

Jake

----- Original Message -----
From: "Cal Heldenbrand"
To: "Jake Smith", "The Pacemaker cluster resource manager"
Sent: Tuesday, October 23, 2012 2:01:59 PM
Subject: Re: [Pacemaker] crm_simulate a resource failure

Thanks Jake, that at least gives a little better description of the
parameters, but I still just can't seem to get anything to trigger with
the various syntaxes I'm trying. See below, I'm using single quotes so
the $ symbol isn't parsed by bash. I've tried using my clone name,
different return values, different task names, without the $ symbols...
nothing seems to trigger anything in the Transition stuff. And I don't
get any error messages at all.

Any other ideas for me?

Thanks!
Re: [Pacemaker] crm_simulate a resource failure
Thanks Jake, that at least gives a little better description of the
parameters, but I still just can't seem to get anything to trigger with
the various syntaxes I'm trying. See below, I'm using single quotes so
the $ symbol isn't parsed by bash. I've tried using my clone name,
different return values, different task names, without the $ symbols...
nothing seems to trigger anything in the Transition stuff. And I don't
get any error messages at all.

Any other ideas for me?

Thanks!

-----
[root@m3 /]# crm_simulate -LS --op-fail='$memcached:0_$monitor_$1@$m1.fbsdata.com=$not_running'

Current cluster status:
Online: [ m1.fbsdata.com m2.fbsdata.com m3.fbsdata.com ]

 Clone Set: memcached_clone [memcached]
     Started: [ m1.fbsdata.com m2.fbsdata.com m3.fbsdata.com ]
 cluster-ip-m1 (ocf::heartbeat:IPaddr2): Started m1.fbsdata.com
 cluster-ip-m2 (ocf::heartbeat:IPaddr2): Started m2.fbsdata.com
 cluster-ip-m3 (ocf::heartbeat:IPaddr2): Started m3.fbsdata.com

Transition Summary:

Executing cluster transition:

Revised cluster status:
Online: [ m1.fbsdata.com m2.fbsdata.com m3.fbsdata.com ]

 Clone Set: memcached_clone [memcached]
     Started: [ m1.fbsdata.com m2.fbsdata.com m3.fbsdata.com ]
 cluster-ip-m1 (ocf::heartbeat:IPaddr2): Started m1.fbsdata.com
 cluster-ip-m2 (ocf::heartbeat:IPaddr2): Started m2.fbsdata.com
 cluster-ip-m3 (ocf::heartbeat:IPaddr2): Started m3.fbsdata.com
-----
Re: [Pacemaker] crm_simulate a resource failure
----- Original Message -----
> From: "Cal Heldenbrand"
> To: pacemaker@oss.clusterlabs.org
> Sent: Tuesday, October 23, 2012 11:50:11 AM
> Subject: [Pacemaker] crm_simulate a resource failure

> Hi everyone,

> I'm not able to find documentation or examples on this. If I have a
> cloned primitive set across a cluster, how can I simulate a failure
> of a resource on an individual node? I mainly want to see the scores
> on why a particular action is taken so I can adjust my configs.

> I think the --op-fail parameter is what I need, but I just don't get
> the syntax of the value in the man page.

I usually use the crm shell so I'm not positive, but I think these are
the parts you need...

$rsc_$task_$interval@$node=$rc

$rsc = resource to test; in your case I believe you want to specify the
       primitive instance of the clone, e.g. p_resource:0
$task = monitor or migrate or stop or whatever operation you want to take
$interval = the interval of a monitor task
$node = the node
$rc = the exit code you want to fail with, e.g. error, not_running

So (I think) something like:
--op-fail=$p_of_clone_resource:0_$monitor_$10@$node1=$not_running

You *should* be able to experiment till you get it just right since it's
just a simulation.. :-)

HTH

Jake

> Thank you!

> --Cal
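Editorial sketch: a follow-up in this thread clarifies that the $ names in the template are placeholders and must not be typed literally. The helper below assembles a spec from plain values. It makes several assumptions not stated in the message above: that the interval is given in milliseconds, that the result is passed as a numeric OCF exit code (the follow-up uses =7 and =1), and the function names `ocf_rc` and `op_spec` are hypothetical.

```shell
#!/bin/sh
# Map a symbolic OCF result name to its numeric exit code.
# Only the names mentioned in the thread are covered here.
ocf_rc() {
    case "$1" in
        ok)          echo 0 ;;
        error)       echo 1 ;;   # generic error
        not_running) echo 7 ;;   # resource is cleanly stopped
        *)           echo "unknown rc name: $1" >&2; return 1 ;;
    esac
}

# Assemble rsc_task_interval@node=rc from literal values (no "$" signs).
# Arguments: resource, task, interval (assumed ms), node, rc name.
op_spec() {
    rc=$(ocf_rc "$5") || return 1
    printf '%s_%s_%s@%s=%s\n' "$1" "$2" "$3" "$4" "$rc"
}

# Example with values from this thread; 10000 is an assumed interval.
op_spec memcached:0 monitor 10000 m1.fbsdata.com not_running
```

The example line prints `memcached:0_monitor_10000@m1.fbsdata.com=7`, which is the shape of value that --op-inject or --op-fail expects according to the man page excerpt quoted in the follow-up.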
[Pacemaker] crm_simulate a resource failure
Hi everyone,

I'm not able to find documentation or examples on this. If I have a
cloned primitive set across a cluster, how can I simulate a failure of a
resource on an individual node? I mainly want to see the scores on why a
particular action is taken so I can adjust my configs.

I think the --op-fail parameter is what I need, but I just don't get the
syntax of the value in the man page.

Thank you!

--Cal
Re: [Pacemaker] "Simple" LVM/drbd backed Primary/Secondary NFS cluster doesn't always failover cleanly
----- Original Message -----
> From: Andreas Kurz
> Date: Sun, 21 Oct 2012 01:38:46 +0200
> Subject: Re: [Pacemaker] "Simple" LVM/drbd backed Primary/Secondary NFS cluster doesn't always failover cleanly
> To: pacemaker@oss.clusterlabs.org
>
> On 10/18/2012 08:02 PM, Justin Pasher wrote:
>> I have a pretty basic setup by most people's standards, but there must
>> be something that is not quite right about it. Sometimes when I force a
>> resource failover from one server to the other, the clients with the
>> NFS mounts don't cleanly migrate to the new server. I configured this
>> using a few different "Pacemaker-DRBD-NFS" guides out there for
>> reference (I believe they were the Linbit guides).
>
> Are you using the latest "exportfs" resource-agent from github-repo? ...
> there have been bugfixes/improvements... and try to move the VIP for
> each export to the end of its group so the IP where the clients connect
> is started at the last/stopped at the first position.
>
> Regards,
> Andreas

I'm currently running the version that comes with the Debian
squeeze-backports resource-agents package (1:3.9.2-5~bpo60+1). I went
ahead and grabbed a copy of exportfs from the git repository. It's a
little risky for me to update the file right now, since the two resources
I am worried about the most are the NFS shares for the XenServer VDIs, so
when it has a hiccup in the connection to the NFS server, things start
exploding (e.g. guest VMs start having disk errors and go read-only).

I scanned through the changes real quick and the biggest change I noticed
was how the .rmtab file backup is restored (it sorts and filters unique
entries instead of just concatenating the results to the end of
/var/lib/nfs/rmtab). I had actually tweaked that a little bit myself
before when I was trying to track down the problem.

Ultimately I think my problem is more related to the NFS server itself
and how it handles "unknown" client connections after a failover. I've
seen people here and there mention that /var/lib/nfs should be on the
replicated device to maintain consistency after failover, but the
exportfs resource agent doesn't do anything like that. Is that not
actually needed anymore? At any rate, in my situation, the problem is
that I am maintaining four independent NFS shares and each one can be
failed over separately (and running on either server at any time), so a
simple copy of the directory won't work since there is no "master" server
at any given time.

Also, I did find a bug in the way backup_rmtab() filters the export list
for its backup. Since it looks for a leading AND trailing colon (:), it
doesn't properly copy information about mounts that were pulled from
subdirectories under the NFS mount (e.g. instead of mounting /home, a
server might mount /home/username such as with autofs, which won't get
copied to the .rmtab backup). I'll file a bug report about that.

Thanks.

--
Justin Pasher
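Editorial sketch of the filtering problem described in the last paragraph of the message. It assumes rmtab entries have the form client:/mounted/path:refcount and that the agent's filter is essentially a grep for ":<directory>:"; the sample entries and both function names are hypothetical, and the second filter is only one possible way to include subdirectory mounts, not the fix that was applied upstream.

```shell
#!/bin/sh
# Hypothetical rmtab contents: one "client:/mounted/path:refcount" per line.
sample_rmtab() {
    cat <<'EOF'
10.0.0.5:/home:0x00000001
10.0.0.6:/home/alice:0x00000001
10.0.0.7:/srv/other:0x00000001
EOF
}

# Filter as described in the message: the export path must be enclosed
# by a leading and a trailing colon, so subdirectory mounts are missed.
filter_exact() {
    grep ":$1:"
}

# Alternative sketch: also accept paths below the export directory.
# The directory is used as a regex here, so special characters in the
# path (such as ".") would need escaping in real use.
filter_with_subdirs() {
    grep -E ":$1(/[^:]*)?:"
}

sample_rmtab | filter_exact /home
sample_rmtab | filter_with_subdirs /home
```

With the sample data, the first filter keeps only the /home entry, while the second keeps both /home and /home/alice and still skips /srv/other.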
Re: [Pacemaker] Behavior of Corosync+Pacemaker with DRBD primary power loss
Hello,

Under the Clusters from Scratch documentation, allow-two-primaries is set in the DRBD configuration for an active/passive cluster:
http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1-crmsh/html-single/Clusters_from_Scratch/index.html#_write_the_drbd_config
"TODO: Explain the reason for the allow-two-primaries option"

Is the reason for allow-two-primaries in this active/passive cluster (using ext4, a non-cluster filesystem) to allow for failover in the type of situation I have described (where the old primary/master is suddenly offline like with a power supply failure)? Are split-brains prevented because Pacemaker ensures that only one node is promoted to Primary at any time?

Is it possible to recover from such a failure without allow-two-primaries?

Thanks,

Andrew

----- Original Message -----
From: "Andrew Martin"
To: "The Pacemaker cluster resource manager"
Sent: Friday, October 19, 2012 10:45:04 AM
Subject: [Pacemaker] Behavior of Corosync+Pacemaker with DRBD primary power loss

Hello,

I have a 3 node Pacemaker + Corosync cluster with 2 "real" nodes, node0 and node1, running a DRBD resource (single-primary) and the 3rd node in standby acting as a quorum node. If node0 were running the DRBD resource, and thus is DRBD primary, and its power supply fails, will the DRBD resource be promoted to primary on node1?

If I simply cut the DRBD replication link, node1 reports the following state:

Role: Secondary/Unknown
Disk State: UpToDate/DUnknown
Connection State: WFConnection

I cannot manually promote the DRBD resource because the peer is not outdated:

0: State change failed: (-7) Refusing to be Primary while peer is not outdated
Command 'drbdsetup 0 primary' terminated with exit code 11

I have configured the CIB-based crm-fence-peer.sh utility in my drbd.conf

fence-peer "/usr/lib/drbd/crm-fence-peer.sh";

but I do not believe it would be applicable in this scenario. If node0 goes offline like this and doesn't come back (e.g. after a STONITH), does Pacemaker have a way to tell node1 that its peer is outdated and to proceed with promoting the resource to primary?

Thanks,

Andrew

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
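
As a rough illustration of why the manual promotion in the message above is refused, the sketch below parses the three status lines reported by node1 and applies a deliberately simplified rule: promotion is refused while the node is disconnected and the peer's disk is not known to be Outdated. The function names and the rule are assumptions made for illustration; they are not DRBD's or Pacemaker's actual implementation.

```python
# Status text exactly as reported by node1 in the message above.
SAMPLE = """\
Role: Secondary/Unknown
Disk State: UpToDate/DUnknown
Connection State: WFConnection
"""


def parse_drbd_state(text):
    """Parse 'Role', 'Disk State' and 'Connection State' lines into a dict."""
    state = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "key: value" line
        key = key.strip().lower()
        value = value.strip()
        if key == "role":
            state["local_role"], state["peer_role"] = value.split("/")
        elif key == "disk state":
            state["local_disk"], state["peer_disk"] = value.split("/")
        elif key == "connection state":
            state["connection"] = value
    return state


def promotion_refused(state):
    """Hypothetical, simplified rule: refuse promotion while disconnected
    unless the peer's disk is known to be Outdated."""
    disconnected = state.get("connection") != "Connected"
    peer_outdated = state.get("peer_disk") == "Outdated"
    return disconnected and not peer_outdated


if __name__ == "__main__":
    state = parse_drbd_state(SAMPLE)
    print(state)
    print("promotion refused:", promotion_refused(state))
```

With the sample state the peer disk is DUnknown rather than Outdated, which matches the "Refusing to be Primary while peer is not outdated" message quoted above.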
Re: [Pacemaker] external/ssh stonith and repeated reboots
Hi,

On Tue, Oct 16, 2012 at 03:00:08PM +1100, Andrew Beekhof wrote:
> On Sun, Oct 14, 2012 at 5:04 PM, James Harper wrote:
> > I'm using external/ssh in my test cluster (a bunch of VMs), and for some
> > reason the cluster has tried to terminate it but failed, like:
>
> Try fence_xvm instead. It's actually reliable.
> You'd need the fence-virtd package on the host and guests, and I've had
> plenty of success with the following as the config file on the host.
> Make sure key_file exists everywhere, start fence-virtd and test with
> "fence_xvm -o list" on the guest(s)

There's also external/libvirt which should do fine.

> ssh based fencing isn't just "not for production", it's a flat out terrible
> idea.
> With much handwaving it is barely usable even for testing as it
> requires the target to be alive, reachable and behaving.

Indeed.

Thanks,

Dejan
Re: [Pacemaker] Crm configure Error In Centos 5.4
Hi,

On Mon, Oct 15, 2012 at 01:17:47PM +0530, Vinoth Narasimhan wrote:
> Hi Guys,
>
> I just installed the latest heartbeat, pacemaker and corosync as per the
> guidelines from the link below
>
> http://www.clusterlabs.org/wiki/Install#READ_ME_FIRST
>
> I am able to successfully configure heartbeat and corosync and start them
> as well.
>
> But when I tried to add a resource I got this error from python.
>
> [root@CHE-PSS-072 lib64]# crm configure
> Traceback (most recent call last):
>   File "/usr/sbin/crm", line 41, in ?
>     crm.main.run()
>   File "/usr/lib64/python2.4/site-packages/crm/main.py", line 240, in run
>     parse_line(levels,["configure"])
>   File "/usr/lib64/python2.4/site-packages/crm/main.py", line 123, in parse_line
>     lvl.new_level(pt[token],token)
>   File "/usr/lib64/python2.4/site-packages/crm/levels.py", line 70, in new_level
>     self.current_level = level_obj()
>   File "/usr/lib64/python2.4/site-packages/crm/ui.py", line 1295, in __init__
>     cib_factory.initialize()
>   File "/usr/lib64/python2.4/site-packages/crm/cibconfig.py", line 1780, in initialize
>     if not self.import_cib():
>   File "/usr/lib64/python2.4/site-packages/crm/cibconfig.py", line 1454, in import_cib
>     self.doc,cib = read_cib(cibdump2doc)
>   File "/usr/lib64/python2.4/site-packages/crm/xmlutil.py", line 72, in read_cib
>     doc = fun(params)
>   File "/usr/lib64/python2.4/site-packages/crm/xmlutil.py", line 53, in cibdump2doc
>     doc = xmlparse(p.stdout)
>   File "/usr/lib64/python2.4/site-packages/crm/xmlutil.py", line 30, in xmlparse
>     except xml.parsers.expat.ExpatError,msg:
> AttributeError: 'module' object has no attribute 'expat'

Hmm, the official python documentation (http://docs.python.org/library/pyexpat.html) states that expat is "New in version 2.0". Can you check your python installation? AFAIK, crmsh is being used on various EL5.

Thanks,

Dejan

> However I do get the status correctly.
>
> [root@CHE-PSS-072 lib64]# crm status
>
> Last updated: Mon Oct 15 00:46:53 2012
> Stack: Heartbeat
> Current DC: che-pss-072.ps.in (dd4d1fd0-97ff-4a28-aac2-302ee1066e4c) - partition with quorum
> Version: 1.0.12-unknown
> 2 Nodes configured, unknown expected votes
> 0 Resources configured.
>
>
> Online: [ che-pss-072.ps.in ops-pss-084.ps.in ]
>
> Any help is greatly appreciated to solve the error.
>
> Thanks,
> vinoth.
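
One possible way to check the Python installation, as suggested in the reply above, is to confirm that the expat parser can be imported and actually parses a document. The sketch below is written against a current Python and the helper names are hypothetical; it is not part of crmsh, and the Python 2.4 install from the traceback is not assumed to run it unchanged.

```python
def expat_available():
    """Return True if xml.parsers.expat can be imported on this interpreter."""
    try:
        import xml.parsers.expat  # noqa: F401  (import is the actual check)
    except ImportError:
        return False
    return True


def is_well_formed(xml_text):
    """Return True if expat parses xml_text without raising ExpatError."""
    import xml.parsers.expat as expat

    parser = expat.ParserCreate()
    try:
        # Second argument True marks this as the final chunk of input.
        parser.Parse(xml_text, True)
    except expat.ExpatError:
        return False
    return True


if __name__ == "__main__":
    print("expat importable:", expat_available())
    print("parses <cib/>:", is_well_formed("<cib/>"))
```

If the import check fails on the interpreter that runs /usr/sbin/crm, that would point at the Python build or packaging rather than at the cluster configuration.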
Re: [Pacemaker] Pacemaker Digest, Vol 59, Issue 57
Hi,

On Tue, Oct 23, 2012 at 10:18:36AM +0530, vishal kumar wrote:
> Hello Florian,
>
> Thanks for the help.
>
> I tried with crm ra meta sshd lsb but it gave me the same meta-data error.
>
> The problem is that when I try to add an lsb resource from the crm shell,
> such as httpd (I am using RHEL), it gives the following error
> ERROR: lsb:httpd: could not parse meta-data:
> ERROR: lsb:httpd: no such resource agent
>
> The httpd package is installed and I have an httpd file in the /etc/init.d
> directory. I am unable to add any resource which is in the /etc/init.d directory.

Do you have cluster-glue installed? And is pacemaker running? lrmadmin (part of cluster-glue) gets the meta-data from lrmd. You can try it by hand like this:

lrmadmin -M lsb httpd NULL

Thanks,

Dejan

> Many thanks
> Vishal
>
> > Message: 1
> > Date: Mon, 22 Oct 2012 17:53:23 +0530
> > From: vishal kumar
> > To: pacemaker@oss.clusterlabs.org
> > Subject: [Pacemaker] lsb: could not parse meta-data
> > Message-ID:
> > rdksq4t-sbbf+kbnwob2fgr7xu7g6mrugbpfzasbvnp...@mail.gmail.com>
> > Content-Type: text/plain; charset="iso-8859-1"
> >
> > Hi
> >
> > I am trying to configure pacemaker with corosync on RHEL 6.
> >
> > While trying to add lsb resources I get "could not parse meta-data".
> >
> > The pacemaker version is 1.1.7. Below is the error I get when I check
> > for metadata in the crm shell.
> >
> > crm(live)ra# list lsb
> > abrt-ccpp abrt-oops abrtd acpid atd
> > auditd autofs certmonger cgconfig cgred
> > corosync corosync-notifyd cpuspeed crond cups
> > haldaemon halt heartbeat httpd ip6tables
> > iptables irqbalance kdump killall ktune
> > lvm2-lvmetad lvm2-monitor matahari-broker matahari-host matahari-network
> > matahari-rpc matahari-service matahari-sysconfig matahari-sysconfig-console mcelogd
> > mdmonitor messagebus netconsole netfs network
> > nfs nfslock ntpd ntpdate oddjobd
> > pacemaker portreserve postfix psacct qpidd
> > quota_nld rdisc restorecond rhnsd rhsmcertd
> > rngd rpcbind rpcgssd rpcidmapd rpcsvcgssd
> > rsyslog sandbox saslauthd single smartd
> > sshd sssd sysstat tuned udev-post
> > ypbind
> > crm(live)ra# meta lsb heartbeat
> > ERROR: heartbeat:lsb: could not parse meta-data:
> > crm(live)ra# meta lsb sshd
> > ERROR: sshd:lsb: could not parse meta-data:
> > crm(live)ra# meta lsb ntpd
> > ERROR: ntpd:lsb: could not parse meta-data:
> > crm(live)ra# meta lsb httpd
> > ERROR: httpd:lsb: could not parse meta-data
> >
> > Please suggest where I am going wrong.
> > Thanks for the help.
> >
> > Thanks
> > Vishal
> > -- next part --
> > An HTML attachment was scrubbed...
> > URL: <http://oss.clusterlabs.org/pipermail/pacemaker/attachments/20121022/0ade914c/attachment-0001.html>
> >
> > --
> >
> > Message: 2
> > Date: Mon, 22 Oct 2012 14:48:31 +0200
> > From: Florian Crouzat
> > To: pacemaker@oss.clusterlabs.org
> > Subject: Re: [Pacemaker] lsb: could not parse meta-data
> > Message-ID: <5085409f@floriancrouzat.net>
> > Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> >
> > On 22/10/2012 14:23, vishal kumar wrote:
> > > Hi
> > >
> > > Please suggest where I am going wrong.
> > > Thanks for the help.
> > >
> >
> > See: crm ra help meta
> > Then try something like: crm ra meta sshd lsb # parameter order matters
> >
> > Anyway, you won't learn anything from the meta-data of an LSB
> > initscript, because it's just a script (not cluster oriented, not a real
> > resource agent), it's not multistate, nothing like that, only
> > start/stop/monitor and default mandatory settings.
> >
> >
> > --
> > Cheers,
> > Florian Crouzat
> >
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from