Re: [Pacemaker] pacemaker shutdown issue after OS change
On 31 Jan 2014, at 3:18 am, Pascal BERTON pascal.bert...@free.fr wrote:

> Hi! I recently changed hosting platforms for my PCMK clusters, from a RHEL 6.0 equivalent to SL 6.4, and moved from pcmk 1.1.2 + corosync 1.3.3 to the pcmk 1.1.10 + corosync 1.4.1 that ship with SL6. Until now I managed my pcmk+corosync layers directly from my own init scripts, so I globally disabled pacemaker using chkconfig; it was my scripts' duty to start the cluster layer at boot and stop it at node shutdown. That has all worked fine for a couple of years!
>
> With the new platform I see something odd at node shutdown: although both the pacemaker and corosync services are disabled for every runlevel (and, by the way, if I disable my own service, they won't start at boot), I can see that pacemaker's init script is called as soon as I run "shutdown -r". After digging into what happens, I can clearly see that although disabled, pacemaker is still called immediately, does its job of stopping the cluster and the related resources, and by the time my own script runs (first in the shutdown script list) and tries to interact with the cluster layer, it is too late: the layer has already died and my script's actions are not handled correctly. I can work around this by integrating my own shutdown sequence directly into the pacemaker init script, but I'd prefer to keep things clean by maintaining them separately.
>
> Is this behavior expected

No.

> (or is it me? Wouldn't be the first time... :) )? What launches that disabled script? Can I change that?

Very good question. It could be chkconfig doing something smart with the LSB metadata:

  # Default-Start: 2 3 4 5
  # Default-Stop: 0 1 6

There was a subsequent commit to replace that with:

  # Default-Start:
  # Default-Stop:

which may help, as may running:

  /sbin/chkconfig --del pacemaker

instead of:

  /sbin/chkconfig pacemaker off

> Thanks for your help! Regards, Pascal.
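The distinction matters because, on a sysvinit system, turning a service "off" typically flips its start (S*) links to kill (K*) links, so a K*pacemaker link can remain in the halt/reboot runlevels and still be executed by /etc/rc, whereas "--del" removes every runlevel link. A minimal sketch of that behavior, simulated in a temporary directory rather than the real /etc (the link names here are illustrative):

```shell
#!/bin/sh
# Simulate runlevel 0 and 6 directories; "chkconfig pacemaker off" would
# leave K-links like these behind, which still run on shutdown/reboot.
tmp=$(mktemp -d)
mkdir -p "$tmp/rc0.d" "$tmp/rc6.d"
touch "$tmp/rc0.d/K20pacemaker" "$tmp/rc6.d/K20pacemaker"
after_off=$(ls "$tmp/rc0.d" "$tmp/rc6.d" | grep -c pacemaker || true)

# "chkconfig --del pacemaker" would remove every link instead:
rm -f "$tmp/rc0.d/K20pacemaker" "$tmp/rc6.d/K20pacemaker"
after_del=$(ls "$tmp/rc0.d" "$tmp/rc6.d" | grep -c pacemaker || true)

echo "links after off: $after_off, links after --del: $after_del"
rm -rf "$tmp"
```

With the links gone, nothing is left for /etc/rc to invoke at shutdown, which matches the suggestion to use "--del" rather than "off".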
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
[Pacemaker] pacemaker shutdown issue after OS change
Hi! I recently changed hosting platforms for my PCMK clusters, from a RHEL 6.0 equivalent to SL 6.4, and moved from pcmk 1.1.2 + corosync 1.3.3 to the pcmk 1.1.10 + corosync 1.4.1 that ship with SL6. Until now I managed my pcmk+corosync layers directly from my own init scripts, so I globally disabled pacemaker using chkconfig; it was my scripts' duty to start the cluster layer at boot and stop it at node shutdown. That has all worked fine for a couple of years!

With the new platform I see something odd at node shutdown: although both the pacemaker and corosync services are disabled for every runlevel (and, by the way, if I disable my own service, they won't start at boot), I can see that pacemaker's init script is called as soon as I run "shutdown -r". After digging into what happens, I can clearly see that although disabled, pacemaker is still called immediately, does its job of stopping the cluster and the related resources, and by the time my own script runs (first in the shutdown script list) and tries to interact with the cluster layer, it is too late: the layer has already died and my script's actions are not handled correctly. I can work around this by integrating my own shutdown sequence directly into the pacemaker init script, but I'd prefer to keep things clean by maintaining them separately.

Is this behavior expected (or is it me? Wouldn't be the first time... :) )? What launches that disabled script? Can I change that?

Thanks for your help!

Regards,
Pascal.
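The approach described above can be sketched as a single personal init script that wraps both daemons; the script name and echo placeholders are hypothetical, and a real version would invoke the corosync and pacemaker service scripts in order:

```shell
#!/bin/sh
# Hypothetical wrapper ("mycluster") in the spirit of the setup above;
# empty Default-Start/Stop so chkconfig creates no runlevel links for it.
### BEGIN INIT INFO
# Provides: mycluster
# Default-Start:
# Default-Stop:
### END INIT INFO

start_cluster() {
  # a real script would run: service corosync start && service pacemaker start
  echo "cluster layer up"
}
stop_cluster() {
  # a real script would run: service pacemaker stop && service corosync stop
  echo "cluster layer down"
}

action=${1:-start}   # default action lets this sketch run standalone
case "$action" in
  start) result=$(start_cluster) ;;
  stop)  result=$(stop_cluster) ;;
  *)     result="usage: $0 {start|stop}" ;;
esac
echo "$result"
```

Stopping pacemaker before corosync in stop_cluster mirrors the usual teardown order: resources are stopped by the cluster manager before the messaging layer goes away.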
Re: [Pacemaker] Pacemaker shutdown issue
Hi,

On Mon, Nov 30, 2009 at 12:04:00AM -0500, Tony Bunce wrote:

> Hi Everyone, I'm having an issue with pacemaker and was hoping someone could point me in the right direction. I'm using pacemaker with openais on a set of NFS servers. Every time I reboot the primary I get a split brain in DRBD. From what I can tell, when openais shuts down it doesn't stop the services it is controlling, so as far as DRBD is concerned it is the same as a hard shutdown. I can reproduce the problem by stopping OpenAIS (service openais stop or /etc/init.d/openais stop) and seeing that the controlled services (DRBD, file systems, nfs, etc.) are still running. I think this is the same exact problem: http://www.gossamer-threads.com/lists/linuxha/pacemaker/59384
>
> Version Info: CentOS 5.4 x64, drbd83-8.3.2-6.el5_3, openais-0.80.6-8.el5_4.1, pacemaker-1.0.5-4.1
>
> Is there something special that needs to be configured so that when openais stops it stops all of the resources?

No. The sequence of events is that openais tells crmd that shutdown is pending; crmd then tries to stop all resources running on the node. It may happen, usually with resources that are broken for whatever reason, that the shutdown is escalated and crmd gives up waiting for resources to stop. At any rate, if you don't see log messages of the form

  lrmd.*stop.*rsc

then there is probably a bug. Please make an hb_report and file a bugzilla.

Thanks,
Dejan

> Thanks for the help! -Tony
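Dejan's check amounts to a grep over the syslog for lrmd stop operations. A sketch of that check against a sample log, where the log path and exact lrmd message wording are assumed for illustration, not verbatim Pacemaker 1.0.5 output:

```shell
#!/bin/sh
# Write a sample log, then count lines matching Dejan's pattern.
# The message format below is a stand-in; real lrmd wording varies by version.
log=$(mktemp)
cat > "$log" <<'EOF'
Nov 30 00:04:01 nfs1 lrmd: [2345]: info: perform stop op on rsc drbd_r0
Nov 30 00:04:02 nfs1 crmd: [2346]: info: shutdown in progress
EOF
stops=$(grep -Ec 'lrmd.*stop.*rsc' "$log" || true)
echo "matching stop lines: $stops"
rm -f "$log"
```

On a real CentOS 5.4 node the messages would typically land in /var/log/messages; a count of zero at shutdown time is what would suggest the bug Dejan describes.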
Re: [Pacemaker] Pacemaker shutdown issue
> The upgrade should really be transparent. What problems did you encounter with nfsserver?

Whenever one of the nodes takes over the nfs resource it doesn't start up the first time and gives this error:

  nfs_server_monitor_0 (node=nfs1, call=11, rc=2, status=complete): invalid parameter

If I run this command it starts up instantly and doesn't have any problems until the service gets migrated again:

  crm_resource -C -r nfs_server

Here is that resource from my config:

  primitive nfs_server ocf:heartbeat:nfsserver \
    params nfs_init_script=/etc/init.d/nfs \
    params nfs_notify_cmd=/sbin/rpc.statd \
    params nfs_shared_infodir=/var/lib/nfs \
    params nfs_ip=10.1.1.150 \
    op monitor interval=30s

I haven't tested it yet, but I was going to switch from ocf:heartbeat:nfsserver to lsb:nfs to see if that fixes the problem.

I also had something like this in my config:

  primitive drbd_r0 ocf:heartbeat:drbd \
    params drbd_resource=r0 \
    op monitor=30s

That also gave me an error (I think it was "action monitor_0 does not exist"). I think that needs to be switched to this:

  primitive drbd_r0 ocf:linbit:drbd \
    params drbd_resource=r0 \
    op monitor interval=29s role=Master timeout=30s \
    op monitor interval=30s role=Slave timeout=30s
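The rc=2 in that failed monitor corresponds to the standard OCF return code OCF_ERR_ARGS ("invalid parameter"), which is why a bad parameter block surfaces this way. A small lookup sketch of the conventional OCF return codes (the helper function name is ours, not part of any Pacemaker tool):

```shell
#!/bin/sh
# Map a numeric OCF return code to its conventional symbolic name.
ocf_rc_name() {
  case "$1" in
    0) echo OCF_SUCCESS ;;
    1) echo OCF_ERR_GENERIC ;;
    2) echo OCF_ERR_ARGS ;;
    3) echo OCF_ERR_UNIMPLEMENTED ;;
    4) echo OCF_ERR_PERM ;;
    5) echo OCF_ERR_INSTALLED ;;
    6) echo OCF_ERR_CONFIGURED ;;
    7) echo OCF_NOT_RUNNING ;;
    *) echo UNKNOWN ;;
  esac
}

# The code from the nfs_server_monitor_0 failure above:
ocf_rc_name 2
```

Seeing OCF_ERR_ARGS from a probe (monitor_0) points at the agent rejecting its parameters rather than the service itself failing, consistent with the resource-agent bug discussed in the follow-up.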
Re: [Pacemaker] Pacemaker shutdown issue
> That looks like a problem in the resource agent. Most probably you hit bug 2219, which was fixed on November 9.

I applied the patch and that appears to have fixed the problem! I haven't tried a reboot yet, but I can migrate between nodes without any issue.

> This has most probably been from the crm shell. It has been relaxed in the meantime (see bugzilla).

That's exactly the problem I was seeing. I switched to the correct monitor commands (including the role) and that fixed the problem. Both the clusterlabs.org and drbd.org docs show the syntax without a role specified.

Thanks again for the help. It looks like there is all kinds of good info in bugzilla; I'll be sure to check there first when I run into a problem. (It doesn't look like Google or Bing index the bugzilla site.)

-Tony
[Pacemaker] Pacemaker shutdown issue
Hi Everyone,

I'm having an issue with pacemaker and was hoping someone could point me in the right direction. I'm using pacemaker with openais on a set of NFS servers. Every time I reboot the primary I get a split brain in DRBD. From what I can tell, when openais shuts down it doesn't stop the services it is controlling, so as far as DRBD is concerned it is the same as a hard shutdown.

I can reproduce the problem by stopping OpenAIS (service openais stop or /etc/init.d/openais stop) and seeing that the controlled services (DRBD, file systems, nfs, etc.) are still running. I think this is the same exact problem: http://www.gossamer-threads.com/lists/linuxha/pacemaker/59384

Version Info:
  CentOS 5.4 x64
  drbd83-8.3.2-6.el5_3
  openais-0.80.6-8.el5_4.1
  pacemaker-1.0.5-4.1

Is there something special that needs to be configured so that when openais stops it stops all of the resources?

Thanks for the help!
-Tony