Re: [Pacemaker] pacemaker shutdown issue after OS change

2014-02-16 Thread Andrew Beekhof

On 31 Jan 2014, at 3:18 am, Pascal BERTON pascal.bert...@free.fr wrote:

 Hi !
  
 I recently changed the hosting platform for my PCMK clusters, from a 
 RHEL 6.0 equivalent to SL 6.4. I also moved from pcmk 1.1.2 + corosync 1.3.3 
 to the pcmk 1.1.10 + corosync 1.4.1 that ship with SL6.
 Until now, I used to manage my pcmk+corosync layers directly from my own 
 personal init scripts, so I used to globally disable pacemaker using 
 chkconfig. It was my scripts' duty to start the cluster layer at boot and stop 
 it at node shutdown. And it has all worked fine for a couple of years!
 With this new platform, I now see something odd at node shutdown: although 
 both the pacemaker and corosync services are disabled for every runlevel (and, 
 BTW, if I disable my own service, they won't start at boot), I can see that 
 pacemaker's init script is called as soon as I run the shutdown -r command. 
 After digging into what's happening, I can clearly see that although 
 disabled, pacemaker is still called immediately, does its job of stopping the 
 cluster and the related resources, and then when my own script runs (1st in 
 the shutdown script list) and tries to interact with the cluster layer, it is 
 too late: the cluster has already died and its actions are not handled correctly.
  
 I can work around that by integrating my own shutdown sequence directly within 
 pacemaker's init script, but I'd prefer to keep things cleaner by maintaining 
 them separately.
  
 Is this behavior expected

No

 (Or is it me? Wouldn't be the first time... :) )? What launches that disabled 
 script? Can I change that?

Very good question.
It could be chkconfig doing something smart with the LSB metadata:

# Default-Start:  2 3 4 5
# Default-Stop:   0 1 6

There was a subsequent commit to replace that with:

# Default-Start:
# Default-Stop:

which may help as may running:

/sbin/chkconfig --del pacemaker

instead of

/sbin/chkconfig pacemaker off
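
To see what chkconfig actually did, the runlevel links tell the story (a
quick sanity check; /etc/rc.d is the standard SysV location on RHEL/SL 6):

chkconfig --list pacemaker
ls /etc/rc.d/rc0.d /etc/rc.d/rc6.d | grep -i pacemaker

With "off", the K links (e.g. K01pacemaker) typically remain in runlevels 0
and 6, so the stop action can still fire at shutdown; "--del" removes the
links entirely.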


  
 Thanks for your help !
  
 Regards,
  
 Pascal.
  
  
 
 
   



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Pacemaker shutdown issue

2009-11-30 Thread Dejan Muhamedagic
Hi,

On Mon, Nov 30, 2009 at 12:04:00AM -0500, Tony Bunce wrote:
 Hi Everyone,
 
 I'm having an issue with pacemaker and was hoping someone could point me in 
 the right direction.
 
 I'm using pacemaker with openais on a set of NFS servers.  Every time I 
 reboot the primary I get a split brain in DRBD.
 
 From what I can tell, when openais shuts down it doesn't stop the 
 services it is controlling, so as far as DRBD is concerned it is the same as a 
 hard shutdown.
 
 I can reproduce the problem by stopping OpenAIS (service openais stop or 
 /etc/init.d/openais stop) and seeing that the controlled services (DRBD, file 
 systems, nfs, etc.) are still running.
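
 For reference, this is how the split brain shows up after the reboot (the 
 exact states vary; /proc/drbd is the quick check):

 cat /proc/drbd

 After the split the nodes refuse to reconnect, one side typically sitting in 
 cs:StandAlone, and the kernel log shows a split-brain detected message.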
 
 I think this is exactly the same problem:
 http://www.gossamer-threads.com/lists/linuxha/pacemaker/59384
 
 Version Info:
 CentOS 5.4 x64
 drbd83-8.3.2-6.el5_3
 openais-0.80.6-8.el5_4.1
 pacemaker-1.0.5-4.1
 
 Is there something special that needs to be configured so that when openais
 stops it stops all of the resources?

No. The sequence of events is that openais tells crmd that a shutdown
is pending, and crmd then tries to stop all resources running on the
node. It may happen, usually with resources that are broken for
whatever reason, that the shutdown is escalated and crmd gives up
waiting for the resources to stop. At any rate, if you don't see log
messages of the form "lrmd.*stop.*<rsc>", there is probably a bug.
Please make an hb_report and file a bugzilla.
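
A quick way to look for those stop operations (assuming cluster messages
go to /var/log/messages, the default syslog target on CentOS 5):

grep -E 'lrmd.*stop' /var/log/messages

You should see one stop per managed resource shortly before the node goes
down; if DRBD is missing from that list, that is where to dig.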

Thanks,

Dejan

 Thanks for the help!
 
 -Tony



___
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker


Re: [Pacemaker] Pacemaker shutdown issue

2009-11-30 Thread Tony Bunce
 The upgrade should really be transparent. What problems did you
 encounter with nfsserver?

Whenever one of the nodes takes over the nfs resource, it doesn't start up the 
first time and gives this error:

nfs_server_monitor_0 (node=nfs1, call=11, rc=2, status=complete): invalid 
parameter

If I run this command it starts up instantly and doesn't have any problems 
until the service gets migrated again:
crm_resource -C -r nfs_server


Here is that resource from my config:
primitive nfs_server ocf:heartbeat:nfsserver \
params nfs_init_script=/etc/init.d/nfs \
params nfs_notify_cmd=/sbin/rpc.statd \
params nfs_shared_infodir=/var/lib/nfs \
params nfs_ip=10.1.1.150 \
op monitor interval=30s


I haven't tested it yet, but I was going to switch from ocf:heartbeat:nfsserver to 
lsb:nfs to see if that fixes the problem.
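
Something like this is what I have in mind (untested; lsb:nfs just wraps
/etc/init.d/nfs, so the nfs_* parameters go away and the floating IP would
need its own primitive, e.g. ocf:heartbeat:IPaddr2):

primitive nfs_server lsb:nfs \
  op monitor interval=30s
primitive nfs_ip ocf:heartbeat:IPaddr2 \
  params ip=10.1.1.150 \
  op monitor interval=30s

(nfs_ip is illustrative; 10.1.1.150 is the address from the nfsserver config
above.)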


I also had something like this in my config:
primitive drbd_r0 ocf:heartbeat:drbd \
params drbd_resource=r0 \
op monitor=30s

That also gave me an error (I think it was "action monitor_0 does not exist").

I think that needs to be switched to this:
primitive drbd_r0 ocf:linbit:drbd \
  params drbd_resource=r0 \
  op monitor interval=29s role=Master timeout=30s \
  op monitor interval=30s role=Slave timeout=30s
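
For ocf:linbit:drbd the primitive also has to run as a master/slave set,
something along these lines (the name ms_drbd_r0 is just illustrative):

ms ms_drbd_r0 drbd_r0 \
  meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true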

___
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker


Re: [Pacemaker] Pacemaker shutdown issue

2009-11-30 Thread Tony Bunce
 That looks like a problem in the resource agent. Most probably
 you hit bug 2219, which was fixed on November 9.

I applied the patch and that appears to have fixed the problem!  I haven't 
tried a reboot yet but I can migrate between nodes without any issue.


 This most probably came from the crm shell. It has been relaxed
 in the meantime (see bugzilla).

That's exactly the problem I was seeing. I switched to the correct monitor 
commands (including the role) and that fixed the problem.
Both the clusterlabs.org and drbd.org examples show the syntax without a role 
specified.

Thanks again for the help. It looks like there is all kinds of good info in 
bugzilla. I'll be sure to check there first when I run into a problem. (It 
doesn't look like Google or Bing index the bugzilla site.)

-Tony

___
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker