Re: [Pacemaker] Route OCF RA and Failover IP
Florian,

Thanks for the input; for the time being I have it running, with the validation of the source address commented out.

On the standby node the gateway is reachable, because eth0 has an IP address in the same subnet as the VIP (the VIP itself, on eth0:0, is not active there yet), so the gateway check is not what fails. I have specified a source address that is not yet active on the standby node, and that is where it fails:

    # If a source address has been configured, is it available on this system?
    #if [ -n "${OCF_RESKEY_source}" ]; then
    #    if ! ip address show | grep -w "${OCF_RESKEY_source}" >/dev/null 2>&1; then
    #        ocf_log error "Source address ${OCF_RESKEY_source} appears not to be available on this system."
    #        # same reason as with _device:
    #        return $OCF_ERR_INSTALLED
    #    fi
    #fi

Thank you for your time on this matter,
Billy
___ Pacemaker mailing list Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker
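A minimal Python sketch of what the commented-out check does, to show why it trips on the standby node. It searches "ip address show" output for the configured source address as a whole word and returns OCF_ERR_INSTALLED (5) when the address is absent. The function names and the sample output are made up for illustration; the real check is the shell code in the Route RA. (In that shell code the quotes around ${OCF_RESKEY_source} matter: with an empty, unquoted variable, [ -n ] becomes a one-argument test, which is always true.)

```python
import re

OCF_SUCCESS = 0
OCF_ERR_INSTALLED = 5  # OCF exit code the Route RA returns from this check


def source_available(ip_output: str, source: str) -> bool:
    """Whole-word, literal search for SOURCE in `ip address show` output.

    Roughly `grep -Fw`: the address is escaped so its dots match only dots,
    and the look-arounds reject matches glued to letters, digits or '_'.
    """
    pattern = r"(?<!\w)" + re.escape(source) + r"(?!\w)"
    return re.search(pattern, ip_output) is not None


def validate_source(ip_output: str, source: str) -> int:
    """Exit code the commented-out check would produce on this node."""
    if source and not source_available(ip_output, source):
        return OCF_ERR_INSTALLED
    return OCF_SUCCESS


if __name__ == "__main__":
    # Hypothetical `ip address show` excerpts for an active and a standby node.
    active = ("inet 192.168.1.10/24 scope global eth0\n"
              "inet 192.168.1.100/24 scope global secondary eth0:0\n")
    standby = "inet 192.168.1.11/24 scope global eth0\n"
    print(validate_source(active, "192.168.1.100"))   # 0
    print(validate_source(standby, "192.168.1.100"))  # 5
```

On the standby sample the check returns 5 simply because the address is not configured there until the VIP comes up, which matches the failure described above.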
[Pacemaker] Constraining clones per node
Hi everyone,

I have been into Heartbeat 2 and Pacemaker for some time now and wonder whether I can use it for more than just the normal HA situation. However, after going through the excellent Pacemaker Configuration Explained and the Linux HA Cluster book by Michael Schwartzkopff, I still have no idea how to configure Pacemaker for my scenario.

My environment consists of multiple servers (~40), each with one or more CPU cores. I have two application types called A and B (services such as Apache), each of which uses one CPU core. A is mission-critical, B is optional. What I want to express is that there should be 20 A's and the remaining CPUs may be used by B's. When a node executing A's fails, it is perfectly OK to shut down B's to make CPU cores available for A's to be started. Any idea how to do this?

Going through the various examples in the book and the PDF, I found examples of how to use instance attributes for one resource, i.e. things like "start Apache only on a host with more than XY RAM or MN CPU speed". However, in my scenario I think I need constraints that involve the number of resources on the host. An example would be: the sum of A's and B's started on a node must be less than or equal to the number of CPU cores. But even after going through the parameters supplied to an OCF agent (page 72 in the Pacemaker Explained PDF), I seem unable to figure out how many clone instances are currently running on a node.

Is Pacemaker able to handle such constraints? Is there some workaround (e.g. with score values) to emulate such behavior?

Any ideas/hints/comments are very welcome.

Best regards,
Jens Bräuer
Re: [Pacemaker] Constraining clones per node
On Monday, 30 November 2009 14:07:23, jens.brae...@rohde-schwarz.com wrote:
> [...] However, after going through the excellent Pacemaker Configuration Explained and the Linux HA Cluster book by Michael Schwartzkopff, I still have no idea how to configure Pacemaker for my scenario.

Thanks ;-)

> [...] What I want to express is that there should be 20 A's and the remaining CPUs may be used by B's. When a node executing A's fails, it is perfectly OK to shut down B's to make CPU cores available for A's to be started. Any idea how to do this?

In Pacemaker, resources have a meta attribute called priority. If there are not enough nodes available to run all resources, the resources with the higher priority are run. So make a clone of A that starts 20 times, with A having a priority of 20, and make a clone of B with B having a priority of 10.

> [...] An example would be: the sum of A's and B's started on a node must be less than or equal to the number of CPU cores. [...] I seem unable to figure out how many clone instances are currently running on a node.

Resource allocation is a feature of the next version. As far as I know it does not work yet; at least it is not well tested.

> Is Pacemaker able to handle such constraints? Is there some workaround (e.g. with score values) to emulate such behavior?

See priority above.

> Any ideas/hints/comments are very welcome.
> Best regards,
> Jens Bräuer

Greetings to Munich.

--
Dr. Michael Schwartzkopff
MultiNET Services GmbH
Address: Bretonischer Ring 7; 85630 Grasbrunn; Germany
Tel: +49 - 89 - 45 69 11 0
Fax: +49 - 89 - 45 69 11 21
mob: +49 - 174 - 343 28 75
mail: mi...@multinet.de
web: www.multinet.de

Registered office: 85630 Grasbrunn
Register court: Amtsgericht München HRB 114375
Managing directors: Günter Jurgeneit, Hubert Martens

---

PGP Fingerprint: F919 3919 FF12 ED5A 2801 DEA6 AA77 57A4 EDD8 979B
Skype: misch42
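To make the priority idea concrete, here is a toy Python model of the intended outcome, not of Pacemaker's policy engine: every instance needs one core, higher-priority clones are placed first, and lower-priority instances simply stay stopped once the cores run out. Node names, core counts and instance counts are hypothetical, and the per-node core limit it enforces is exactly the kind of constraint Michael says 1.0.x cannot express yet.

```python
def allocate(free_cores, clones):
    """Place clone instances on nodes, highest priority first.

    free_cores: dict node -> number of free CPU cores (hypothetical inventory)
    clones:     list of (name, priority, instances) tuples
    Returns a dict name -> list of nodes, one entry per running instance.
    """
    free = dict(free_cores)
    placement = {name: [] for name, _priority, _count in clones}
    for name, _priority, count in sorted(clones, key=lambda c: c[1], reverse=True):
        for _ in range(count):
            # Node with the most free cores; sorting first breaks ties by name.
            node = max(sorted(free), key=lambda n: free[n], default=None)
            if node is None or free[node] == 0:
                break  # out of cores: remaining instances stay stopped
            free[node] -= 1
            placement[name].append(node)
    return placement


if __name__ == "__main__":
    clones = [("A", 20, 3), ("B", 10, 4)]
    for cores in ({"n1": 2, "n2": 2, "n3": 2}, {"n1": 2, "n2": 2}):
        result = allocate(cores, clones)
        print(sorted(cores), {name: len(nodes) for name, nodes in result.items()})
```

With three two-core nodes, A gets three instances and B the remaining three; with n3 gone, A keeps all three and B drops to one, which is the "shut down B's to make room for A's" behaviour Jens asked for.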
Re: [Pacemaker] Pacemaker shutdown issue
Hi,

On Mon, Nov 30, 2009 at 12:04:00AM -0500, Tony Bunce wrote:
> Hi Everyone,
> I'm having an issue with pacemaker and was hoping someone could point me in the right direction. I'm using pacemaker with openais on a set of NFS servers. Every time I reboot the primary I get a split brain in DRBD. From what I can tell, when openais is shutting down it doesn't stop the services it is controlling, so as far as DRBD is concerned it is the same as a hard shutdown.
> I can reproduce the problem by stopping OpenAIS (service openais stop or /etc/init.d/openais stop) and see that the controlled services (DRBD, file systems, nfs, etc.) are still running.
> I think this is the same exact problem: http://www.gossamer-threads.com/lists/linuxha/pacemaker/59384
> Version Info:
> CentOS 5.4 x64
> drbd83-8.3.2-6.el5_3
> openais-0.80.6-8.el5_4.1
> pacemaker-1.0.5-4.1
> Is there something special that needs to be configured so that when openais stops it stops all of the resources?

No. The sequence of events is that openais tells crmd that a shutdown is pending; crmd then tries to stop all resources running on the node. It may happen, usually with resources that are broken for whatever reason, that the shutdown is escalated and crmd gives up waiting for the resources to stop. At any rate, if you don't see log messages of the form lrmd.*stop.*rsc, then there is probably a bug. Please make an hb_report and file a bugzilla.

Thanks,

Dejan

> Thanks for the help!
> -Tony
Re: [Pacemaker] logging related information- pacemaker
Hi,

On Sat, Nov 28, 2009 at 08:45:26PM -0500, Shravan Mishra wrote:
> Hi,
> I'm using pacemaker and trying to configure logging for various subsystems like pengine, attrd, crmd etc. On starting corosync, the only logs I see are for e.g.
> [...]
> Nothing related to stonithd or crmd etc.
> I have started /usr/lib64/heartbeat/ha_logd -d. Under /etc/ha.d/shellfuncs I see variables which I have exported on the command line:
> HA_LOGD=yes
> HA_LOGFILE=/tmp/corosync.log
> Am I taking a completely wrong path? Am I supposed to configure this just through corosync.conf and use logger_subsys for the above-mentioned subsystems?

In corosync.conf, you should set use_logd: yes in the service section, then specify the syslog facility in /etc/logd.cf. A bit confusing, but openais/corosync and ha_logd have different configuration files.

Thanks,

Dejan

> Sincerely
> Shravan
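Dejan's answer splits the setup across two files: use_logd: yes goes into the service section of corosync.conf, the facility into /etc/logd.cf. A small Python sketch that checks the first half, assuming corosync.conf's brace-delimited "name {" ... "}" layout with one "key: value" per line. The sample configurations in the test (name: pacemaker, ver: 0) are illustrative, and /etc/logd.cf is not examined at all.

```python
def service_uses_logd(conf_text: str) -> bool:
    """Return True if a `service { ... }` block sets `use_logd: yes`."""
    stack = []  # names of the blocks we are currently inside
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.endswith("{"):
            stack.append(line[:-1].strip())
        elif line == "}":
            if stack:
                stack.pop()
        elif ":" in line and stack and stack[-1] == "service":
            key, value = (part.strip() for part in line.split(":", 1))
            if key == "use_logd" and value == "yes":
                return True
    return False
```

Putting use_logd into the logging section instead, for example, makes the check return False.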
Re: [Pacemaker] is ptest 1.06 working correctly?
On Mon, November 30, 2009 5:00 pm, Frank DiMeo wrote:
> I ran the command: ptest -live-check - -save-graph tmp.graph -save-dotfile tmp.dot

You need -- instead of - in your long option names. Only - should have one -

Rasto

--
: Dipl-Ing Rastislav Levrinc
: DRBD-MC http://www.drbd.org/mc/management-console/
: DRBD/HA support and consulting http://www.linbit.com/
DRBD(R) and LINBIT(R) are registered trademarks of LINBIT, Austria.
Re: [Pacemaker] crm_mon not refreshing
No. I was using the command-line utility from the terminal. I've been told that crm_mon is now event-driven and only refreshes when an event occurs. Wouldn't that make the interval option pointless? It does not seem to work either. From an operations perspective, I wanted a dedicated terminal window with crm_mon's output displayed on screen, and I would have liked the screen to refresh as it used to.

Kind Regards
---
Lester Joseph
Linux Systems Administrator

From: Frank DiMeo [mailto:frank.di...@bigbandnet.com]
Sent: Monday, November 30, 2009 3:46 PM
To: Joseph, Lester
Subject: RE: [Pacemaker] crm_mon not refreshing

Are you using the web interface to crm_mon?
-Frank

From: Joseph, Lester [mailto:lester.jos...@galacoral.com]
Sent: Monday, November 30, 2009 10:10 AM
To: 'pacemaker@oss.clusterlabs.org'
Subject: [Pacemaker] crm_mon not refreshing

Hi,
I have pacemaker 1.0.6 running with heartbeat 3.0.1. I noticed that crm_mon is not refreshing anymore, even when I specify the interval. Has this been removed? Please advise.

Kind Regards
Lester Joseph
Linux Systems Administrator
Gala Coral E-Commerce
Eurobet House
10-24 Church Street West
Woking
Surrey GU21 6HT
T: +44 (0)1483 766766
M: +44 (0)7867 554267
F: +44 (0)1483 722141
E: lester.jos...@galacoral.com

This email has been sent from Gala Coral Group Limited (GCG) or a subsidiary or associated company. GCG is registered in England with company number 4639005. Registered office address: 71 Queensway, London W2 4QH, United Kingdom; website: www.galacoral.com. This e-mail message (and any attachments) is confidential and may contain privileged and/or proprietorial information protected by legal rules. It is for use by the intended addressee only. If you believe you are not the intended recipient or that the sender is not authorised to send you the email, please return it to the sender (and please copy it to h...@galacoral.com) and then delete it from your computer.
You should not otherwise copy or disclose its contents to anyone. Except where this email is sent in the usual course of business, the views expressed are those of the sender and not necessarily ours. We reserve the right to monitor all emails sent to and from our businesses, to protect the businesses and to ensure compliance with internal policies. Emails are not secure and cannot be guaranteed to be error-free, as they can be intercepted, amended, lost or destroyed, and may contain viruses; anyone who communicates with us by email is taken to accept these risks. GCG accepts no liability for any loss or damage which may be caused by software viruses.
Re: [Pacemaker] is ptest 1.06 working correctly?
I actually did use -- on the long options; for some reason the cut and paste in MS Outlook collapsed them. As you can see from the files enclosed in my previous posting, the files are actually generated; there's just not much in them.

-Frank
Re: [Pacemaker] is ptest 1.06 working correctly?
On Mon, November 30, 2009 5:21 pm, Frank DiMeo wrote:
> [...] As you can see from the files enclosed in my previous posting, the files are actually generated; there's just not much in them.

Oh, I see. It is because you don't have any transitions in the live CIB. It works correctly as far as I can tell.

Rasto
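To confirm that a saved graph really has nothing scheduled, one can count its actions. A minimal Python sketch, assuming the file written by --save-graph is transition-graph XML with a transition_graph root and one synapse element per scheduled action; that layout is an assumption here, so compare it against your own tmp.graph. On a settled cluster with nothing pending, a count of zero is what Rasto's explanation predicts.

```python
import xml.etree.ElementTree as ET


def count_graph_actions(graph_xml: str) -> int:
    """Count the <synapse> entries (scheduled actions) in a saved transition graph."""
    root = ET.fromstring(graph_xml)
    # Each synapse wraps an action set plus the inputs it waits for.
    return len(root.findall("synapse"))
```

Used as count_graph_actions(open("tmp.graph").read()), an idle cluster should give 0.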
Re: [Pacemaker] is ptest 1.06 working correctly?
So ptest can analyze transitions that have already happened on a live node? I thought it could analyze the configuration and predict behavior. I suppose that's not correct?

-Frank
Re: [Pacemaker] is ptest 1.06 working correctly?
Actually, I don't know what you mean by the phrase "you don't have any transitions in live cib". Shouldn't ptest generate a graphical representation of the actions to be carried out on resources?

-Frank
Re: [Pacemaker] Pacemaker shutdown issue
> The upgrade should really be transparent. What problems did you encounter with nfsserver?

Whenever one of the nodes takes over the nfs resource, it doesn't start up the first time and gives this error:

nfs_server_monitor_0 (node=nfs1, call=11, rc=2, status=complete): invalid parameter

If I run this command, it starts up instantly and doesn't have any problems until the service gets migrated again:

crm_resource -C -r nfs_server

Here is that resource from my config:

primitive nfs_server ocf:heartbeat:nfsserver \
    params nfs_init_script=/etc/init.d/nfs \
    params nfs_notify_cmd=/sbin/rpc.statd \
    params nfs_shared_infodir=/var/lib/nfs \
    params nfs_ip=10.1.1.150 \
    op monitor interval=30s

I haven't tested it yet, but I was going to switch from ocf:heartbeat:nfsserver to lsb:nfs to see if that fixes the problem.

I also had something like this in my config:

primitive drbd_r0 ocf:heartbeat:drbd \
    params drbd_resource=r0 \
    op monitor=30s

That also gave me an error (I think it was "action monitor_0 does not exist"). I think that needs to be switched to this:

primitive drbd_r0 ocf:linbit:drbd \
    params drbd_resource=r0 \
    op monitor interval=29s role=Master timeout=30s \
    op monitor interval=30s role=Slave timeout=30s
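Both drbd problems are visible in the op lines themselves: "op monitor=30s" runs the operation name and an attribute together with no interval=, and the corrected version gives the Master and Slave monitors different intervals (29s and 30s), since, as I understand Pacemaker's rules, two monitors on the same resource may not share an interval. A rough Python lint for such lines, assuming one op per line in crm shell syntax as above. It compares intervals as plain strings, so 30s and 30 are not recognised as the same value.

```python
def check_monitor_ops(op_lines):
    """Return a list of problems found in crm-shell style `op monitor ...` lines."""
    problems = []
    seen = set()  # monitor intervals already used on this resource
    for line in op_lines:
        fields = line.split()
        if len(fields) < 2 or fields[0] != "op" or not fields[1].startswith("monitor"):
            continue  # not a monitor operation
        if fields[1] != "monitor":
            # e.g. "op monitor=30s": operation name and attribute run together
            problems.append("malformed operation: " + line)
            continue
        # Collect name=value attributes; a trailing "\" continuation is ignored.
        params = dict(f.split("=", 1) for f in fields[2:] if "=" in f)
        interval = params.get("interval")
        if interval is None:
            problems.append("missing interval in: " + line)
        elif interval in seen:
            problems.append("duplicate monitor interval: " + interval)
        else:
            seen.add(interval)
    return problems
```

Fed the two monitor lines of the corrected drbd_r0 primitive, it reports nothing; fed "op monitor=30s", it flags the line as malformed.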
Re: [Pacemaker] is ptest 1.06 working correctly?
I've never really understood the correct time to do the ptest graphs. I initiated a failover once and did the graph very quickly while it was in a transitional state, but I've always wondered if there is an easier way, i.e. "show me a graph of the migration plan if such and such were to happen".
Re: [Pacemaker] is ptest 1.06 working correctly?
This sounds very interesting. I look forward to trying it :)

(Sorry for the Outlook affliction.)

-----Original Message-----
From: Dejan Muhamedagic [mailto:deja...@fastmail.fm]
Sent: 30 November 2009 17:28
To: pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] is ptest 1.06 working correctly?

Hi,

On Mon, Nov 30, 2009 at 05:04:35PM -, darren.mans...@opengi.co.uk wrote:
> [...] I've always wondered if there is an easier way, i.e. "show me a graph of the migration plan if such and such were to happen".

There's a fairly new feature in the crm shell that makes it possible to edit the status section, e.g. to simulate a resource failure or a node-lost event. Then you can run the ptest command (at the configure level) and it will show you what would happen. This feature was not complete at the time 1.0.6 was released and may still change.

Also, if you change the configuration and run ptest _before_ commit, it will display the graph of what would happen if the new configuration were committed.

Thanks,

Dejan
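Dejan's point is that ptest shows the difference between what is running and what the (possibly edited) configuration and status call for. A toy Python model of that idea, not of the real policy engine: it ignores fencing, ordering, colocation and stickiness, and all resource and node names are hypothetical. With nothing changed it returns an empty plan, mirroring the empty graph Frank saw; simulating a node loss produces actions.

```python
def plan_actions(running, online, preferences):
    """Toy 'what would happen' planner (not Pacemaker's policy engine).

    running:     dict resource -> node it currently runs on
    online:      set of nodes that are up in the (possibly edited) status
    preferences: dict resource -> list of nodes, most preferred first
    Returns a list of (action, resource, node) tuples.
    """
    actions = []
    for rsc, prefs in sorted(preferences.items()):
        target = next((node for node in prefs if node in online), None)
        current = running.get(rsc)
        if current == target:
            continue  # already where it should be: nothing to schedule
        if current in online:
            actions.append(("stop", rsc, current))  # only a live node can stop it
        if target is not None:
            actions.append(("start", rsc, target))
    return actions
```

For example, with running={"ip": "n1"} and preferences={"ip": ["n1", "n2"]}, both nodes online gives an empty plan, while marking n1 as lost yields a single start of ip on n2.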
[Pacemaker] VMware STONITH device plugin that uses VMware VC
Hi,

Has anyone written or come across a stonith plugin for VMware that supports Virtual Center? I have a few nodes, all VMware virtual machines, in a clustered environment, and I have searched extensively for a stonith plugin to use with these nodes.

I have been researching the possibility of creating my own using the community-provided Perl scripts included in the VMware-vSphere-SDK-for-Perl toolkit. The script vmcontrol.pl in this toolkit looks like it can do what we want; however, we need to incorporate it into a stonith plugin.

Operations supported by vmcontrol.pl (the operation to be performed is one of the following):

  poweron  - power on one or more virtual machines
  poweroff - power off one or more virtual machines
  suspend  - suspend one or more virtual machines
  reboot   - reboot one or more guests
  reset    - reset one or more virtual machines
  shutdown - shut down one or more guests
  standby  - set one or more guests to standby mode

Can anyone advise?

Kind Regards
Lester Joseph
Linux Systems Administrator
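A rough Python sketch of the glue such a plugin needs: map the fencing requests the cluster makes (on, off, reset) onto vmcontrol.pl operations from the list above. As I understand cluster-glue's external plugin interface, it also asks for things like status and gethosts, which are not covered here. The vmcontrol.pl option names used below (--url, --username, --password, --operation, --vmname) are assumptions; check them against vmcontrol.pl --help in your SDK version. The guest-level operations (reboot, shutdown, standby) are deliberately left out, because they depend on the guest OS cooperating, which a node that needs fencing may no longer do.

```python
# Fencing request -> vmcontrol.pl operation (operation names from the list above).
ACTION_TO_OPERATION = {
    "on": "poweron",
    "off": "poweroff",
    "reset": "reset",
}


def vmcontrol_argv(action, vmname, url, username, password):
    """Build a vmcontrol.pl command line for one fencing action.

    ASSUMPTION: the option names below; verify with `vmcontrol.pl --help`.
    """
    if action not in ACTION_TO_OPERATION:
        raise ValueError("unsupported fencing action: " + action)
    return [
        "vmcontrol.pl",
        "--url", url,
        "--username", username,
        "--password", password,
        "--operation", ACTION_TO_OPERATION[action],
        "--vmname", vmname,
    ]
```

The returned list could be handed to subprocess.run(argv, check=True), with the plugin reporting failure if vmcontrol.pl exits non-zero.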
Re: [Pacemaker] Debian packages, OCFS2, high CPU load
* Stefan Förster <cite+pacema...@incertum.net>:
> * Dejan Muhamedagic <deja...@fastmail.fm>:
>> On Fri, Nov 27, 2009 at 01:05:41PM +0100, Stefan Förster wrote:
>>> With Debian, apart from some minor glitches (path to controld.pcmk, old udev, old kernel) everything went well, but as soon as I commit the configuration containing the O2CB resources, both nodes become unresponsive, cluster communication fails and corosync (which was started as aisexec) is at about 100% CPU.
>> corosync runs as corosync. aisexec is from the older openais (0.8x).
> With the Debian packages from http://people.debian.org/~madkiss/ha/, openais contains /usr/sbin/aisexec, which is a shell script calling:
>
>   export COROSYNC_DEFAULT_CONFIG_IFACE=openaisserviceenableexperimental:corosync_parser
>   corosync $@
>
> The Debian openais package also contains /usr/lib/lcrso/service_ckpt.lcrso, which isn't loaded without the above environment settings. Amongst others, it contains:
>
>   /usr/lib/lcrso/service_msg.lcrso
>   /usr/lib/lcrso/service_lck.lcrso
>   /usr/lib/lcrso/service_clm.lcrso
>   /usr/lib/lcrso/service_evt.lcrso
>   /usr/lib/lcrso/openaisserviceenable.lcrso
>   /usr/lib/lcrso/service_ckpt.lcrso
>   /usr/lib/lcrso/service_amf.lcrso
>   /usr/lib/lcrso/service_tmr.lcrso
>
>> Otherwise, perhaps you found a bug. See if it's reproducible without o2cb.
> I'm unsure how to do this. Perhaps simply using another service which relies on CKPT would trigger that bug?

I could reproduce the problem: the behaviour arises as soon as Pacemaker stops DLM for the first time, so it seems it's not related to o2cb at all. As soon as the DLM resource is stopped, corosync's CPU usage is at 100%.

Anything else I can do to aid in debugging this?

Cheers
Stefan
Re: [Pacemaker] Pacemaker shutdown issue
> That looks like a problem in the resource agent. Most probably you hit bug 2219, which was fixed on November 9.

I applied the patch and that appears to have fixed the problem! I haven't tried a reboot yet, but I can migrate between nodes without any issue.

> This has most probably been from the crm shell. It has been relaxed in the meantime (see bugzilla ).

That's exactly the problem I was seeing. I switched to the correct monitor commands (including the role) and that fixed the problem. Both clusterlabs.org and drbd.org show the syntax without a role specified.

Thanks again for the help. It looks like there is all kinds of good info in bugzilla; I'll be sure to check there first when I run into a problem. (It doesn't look like Google or Bing index the bugzilla site.)

-Tony
[Pacemaker] Possible bug: Strange behaviour of cibadmin -D
Hi,

I start with a fresh cluster, no CIB.

1) Add a #health attribute and verify that it is in the CIB:

# attrd_updater -n "#health-smart" -U red -d 1s
# cibadmin -Q | grep health
<nvpair id="status-..." name="#health-smart" value="red"/>

2) So far so good. Now I delete the attribute. Since this is a virtual machine with limited access, I have to do the following:

# cibadmin -Q | grep health > health.cib
# cibadmin -D -x health.cib
# cibadmin -Q | grep health

Nothing; the entry is gone. So far so good.

3) Now I want to write my attribute again:

# attrd_updater -n "#health-smart" -U red -d 1s
# cibadmin -Q | grep health

Nothing. This is NOT OK. Somehow the CIB does not accept any #health-smart attribute any more.

4) Strange, but OK. I try to delete my whole CIB to be able to start again:

# cibadmin -E --force
# cibadmin -Q | grep health
<nvpair id="status-..." name="#health-smart" value="red"/>

Here is my attribute again, after an erasure of the CIB! How could this be? Is this a bug? Should I file it? Or am I just too stupid to use the command line?

Greetings,
--
Dr. Michael Schwartzkopff
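Step 2 relies on "grep health" returning exactly the element to delete; any other attribute with "health" in its name would end up in health.cib as well. A minimal Python sketch of making the same selection on the parsed XML instead: it matches the nvpair name exactly and serializes a single-element fragment, which could then be passed to cibadmin via its -X (XML text) option. The sample CIB and its ids are hypothetical.

```python
import xml.etree.ElementTree as ET


def matching_nvpairs(cib_xml: str, name: str):
    """Return the attributes of every <nvpair> whose name matches exactly."""
    root = ET.fromstring(cib_xml)
    return [dict(nv.attrib) for nv in root.iter("nvpair") if nv.get("name") == name]


def deletion_fragment(attrs: dict) -> str:
    """Serialize one nvpair as a standalone XML fragment."""
    return ET.tostring(ET.Element("nvpair", attrs), encoding="unicode")
```

Unlike the grep pipeline, an attribute named other-health would not be swept up by matching_nvpairs(cib, "#health-smart").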
Re: [Pacemaker] logging related information- pacemaker
Thanks a lot.
Re: [Pacemaker] Node crash when 'ifdown eth0'
On 12/1/2009 at 11:05 AM, hj lee kerd...@gmail.com wrote: On Fri, Nov 27, 2009 at 3:05 PM, Steven Dake sd...@redhat.com wrote: On Fri, 2009-11-27 at 11:32 -0200, Mark Horton wrote: I'm using Pacemaker 1.0.6 and corosync 1.1.2 (not using openais) with CentOS 5.4. The packages are from here: http://www.clusterlabs.org/rpm/epel-5/ Mark On Fri, Nov 27, 2009 at 9:01 AM, Oscar Remírez de Ganuza Satrústegui oscar...@unav.es wrote: Good morning, We are testing a cluster configuration on RHEL5 (x86_64) with Pacemaker 1.0.5 and openais (0.80.5). Two-node cluster, active-passive, with the following resources: MySQL service resource and an NFS filesystem resource (shared storage in a SAN). In our tests, when we bring down the network interface (ifdown eth0), the What is the use case for ifdown eth0 (i.e. what are you trying to verify)? I have the same test case. In my case, when the two-node cluster is disconnected, I want to see split-brain, and then I want to see the split-brain handler reset one of the nodes. What I want to verify is that the cluster recovers from a network disconnection and a split-brain situation. Try this, on one node:
# iptables -A INPUT -s ip.of.other.node -j DROP
# iptables -A OUTPUT -d ip.of.other.node -j DROP
HTH, Tim -- Tim Serong tser...@novell.com Senior Clustering Engineer, Novell Inc.
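Tim's two iptables rules can be wrapped so that exactly the same rules are removed again when the test is over. A minimal sketch, assuming a POSIX shell and root privileges; the function names `run` and `partition_peer`, the `DRY_RUN` switch and the address 192.0.2.20 are hypothetical placeholders, not from the thread.

```shell
#!/bin/sh
# Sketch: cut this node off from one peer with iptables DROP rules,
# then heal the split by deleting the very same rules.
# DRY_RUN=1 only prints the iptables commands.

# Print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# $1: -A to append the DROP rules, -D to delete them again; $2: peer address
partition_peer() {
    case $1 in
        -A|-D) ;;
        *) echo "usage: partition_peer -A|-D PEER" >&2; return 2 ;;
    esac
    run iptables "$1" INPUT -s "$2" -j DROP
    run iptables "$1" OUTPUT -d "$2" -j DROP
}
```

Run `partition_peer -A 192.0.2.20` on one node to start the test and `partition_peer -D 192.0.2.20` afterwards; `iptables -D` with the identical rule specification removes the rules added with `-A`. Unlike `ifdown eth0`, this leaves the interface and its address up, so only traffic to and from the peer is lost.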
Re: [Pacemaker] Node crash when 'ifdown eth0'
On Mon, 2009-11-30 at 17:05 -0700, hj lee wrote: On Fri, Nov 27, 2009 at 3:05 PM, Steven Dake sd...@redhat.com wrote: On Fri, 2009-11-27 at 11:32 -0200, Mark Horton wrote: I'm using Pacemaker 1.0.6 and corosync 1.1.2 (not using openais) with CentOS 5.4. The packages are from here: http://www.clusterlabs.org/rpm/epel-5/ Mark On Fri, Nov 27, 2009 at 9:01 AM, Oscar Remírez de Ganuza Satrústegui oscar...@unav.es wrote: Good morning, We are testing a cluster configuration on RHEL5 (x86_64) with Pacemaker 1.0.5 and openais (0.80.5). Two-node cluster, active-passive, with the following resources: MySQL service resource and an NFS filesystem resource (shared storage in a SAN). In our tests, when we bring down the network interface (ifdown eth0), the What is the use case for ifdown eth0 (i.e. what are you trying to verify)? I have the same test case. In my case, when the two-node cluster is disconnected, I want to see split-brain, and then I want to see the split-brain handler reset one of the nodes. What I want to verify is that the cluster recovers from a network disconnection and a split-brain situation. "ifconfig eth0 down" is totally different from testing whether there is a node disconnection. When corosync detects eth0 being taken down, it binds to the interface 127.0.0.1. This is probably not what you had in mind when you wanted to test split brain. Keep in mind that an interface taken out of service is different from an interface failing, from a POSIX API perspective. What you really want to test is pulling the network cable between the machines. Regards -steve Thanks hj
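To confirm the loopback fallback Steven describes after an `ifdown`, one can look at the ring address reported by `corosync-cfgtool -s`. A minimal sketch; the function name `ring_on_loopback` is hypothetical, and the tab-indented `id = <address>` line format is assumed from corosync 1.x output.

```shell
#!/bin/sh
# Sketch: succeed if "corosync-cfgtool -s" output (read on stdin) shows a ring
# whose local id is the loopback address 127.0.0.1.
# Assumption: ring lines look like "<tab>id<tab>= <address>".
ring_on_loopback() {
    grep -Eq '^[[:space:]]*id[[:space:]]*=[[:space:]]*127\.0\.0\.1[[:space:]]*$'
}
```

For example: `corosync-cfgtool -s | ring_on_loopback && echo "corosync ring is bound to 127.0.0.1"`.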
Re: [Pacemaker] Node crash when 'ifdown eth0'
Good morning, Dejan Muhamedagic wrote: Hi, On Fri, Nov 27, 2009 at 12:01:17PM +0100, Oscar Remírez de Ganuza Satrústegui wrote: In our tests, when we bring down the network interface (ifdown eth0), the openais service (aisexec process) and other processes Yes, openais gets nervous if the network interface disappears. I think you'll find a core dump in /var/lib/openais. At any rate, better make sure that the interface stays up. And don't use DHCP, but static addresses. OK, we were just checking different conditions. We use static addresses anyway. (stonithd, cib, attrd and crmd) crash, and just some processes are still running:
[r...@herculespre ~]# ps -fea |grep ais\|heartbeat
root 2343 2335 0 Nov26 pts/0 00:00:18 /usr/lib64/heartbeat/lrmd
102 2345 2335 0 Nov26 pts/0 00:00:01 /usr/lib64/heartbeat/pengine
Processes which are not talking to aisexec. Thanks, Dejan Thank you very much for the information! I will test our configuration too with the RPMs that Mark pointed us to: http://www.clusterlabs.org/rpm/epel-5/ Thanks again! Regards, --- Oscar Remírez de Ganuza Servicios Informáticos Universidad de Navarra Ed. de Derecho, Campus Universitario 31080 Pamplona (Navarra), Spain Tel: +34 948 425600 Ext. 3130 http://www.unav.es/SI
Re: [Pacemaker] Node crash when 'ifdown eth0'
Hi, Steven Dake wrote: On Mon, 2009-11-30 at 17:05 -0700, hj lee wrote: On Fri, Nov 27, 2009 at 3:05 PM, Steven Dake sd...@redhat.com wrote: On Fri, Nov 27, 2009 at 9:01 AM, Oscar Remírez de Ganuza Satrústegui oscar...@unav.es wrote: In our tests, when we bring down the network interface (ifdown eth0), the What is the use case for ifdown eth0 (i.e. what are you trying to verify)? I have the same test case. In my case, when the two-node cluster is disconnected, I want to see split-brain, and then I want to see the split-brain handler reset one of the nodes. What I want to verify is that the cluster recovers from a network disconnection and a split-brain situation. "ifconfig eth0 down" is totally different from testing whether there is a node disconnection. When corosync detects eth0 being taken down, it binds to the interface 127.0.0.1. This is probably not what you had in mind when you wanted to test split brain. Keep in mind that an interface taken out of service is different from an interface failing, from a POSIX API perspective. What you really want to test is pulling the network cable between the machines. I wanted to test the split-brain situation too, and the recovery from it. I also wanted to test a pingd resource and a location constraint we have configured, to see whether the node stops the resources correctly when it detects no connection to the gateway. Anyway, I have successfully tested this situation and configuration by pulling the network cable from VirtualCenter, but I got worried on finding out that openais crashed and could not recover when the network interface went down. Thanks! --- Oscar Remírez de Ganuza
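Oscar does not post his configuration, so the following is only a sketch, in crm shell syntax, of a typical pingd-plus-location setup of the kind he describes. The resource names p_pingd, cl_pingd and mysql, the gateway address 192.0.2.1, the multiplier and the monitor timings are hypothetical placeholders.

```
# crm configure syntax; all names, addresses and timings are placeholders
primitive p_pingd ocf:pacemaker:pingd \
        params host_list="192.0.2.1" multiplier="100" \
        op monitor interval="15s" timeout="20s"
clone cl_pingd p_pingd
# Keep the (hypothetical) mysql resource off nodes that cannot reach the gateway
location l_mysql_needs_gateway mysql \
        rule -inf: not_defined pingd or pingd lte 0
```

With this rule, a node whose pingd attribute is missing or 0 gets a -INFINITY score for mysql, so the resource should be stopped there and started on a node that can still reach the gateway.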