Re: [Pacemaker] Multi-level ACLs for the CIB
On 03/19/10 06:22, Lars Ellenberg wrote:
> On Wed, Mar 17, 2010 at 06:12:24PM +0800, Yan Gao wrote:
>> After investigating, I found that Unix domain sockets provide methods to identify the user on the other side of a socket. That means we don't need PAM to do authentication for local access, and the clients don't need to prompt the user for a username/password and transfer it to the server. And the cib daemon can still run as hacluster.
>>
>> I've improved the ipcsocket library of cluster-glue to record the user's identity info for cib to use.
>>
>> The behavior of remote access to the cib is still as before.
>>
>> Attached are the patch for cluster-glue and the updated patch for pacemaker. Looking forward to your review and comments. Thanks!
>>
>> diff -r 5e7284501da6 -r 699b8e950cdf include/clplumbing/ipc.h
>> --- a/include/clplumbing/ipc.h Mon Mar 15 16:03:30 2010 +0100
>> +++ b/include/clplumbing/ipc.h Wed Mar 17 15:06:08 2010 +0800
>> @@ -132,6 +132,8 @@
>>   int ch_status; /* identify the status of channel.*/
>>   int refcount; /* reference count */
>>   pid_t farside_pid; /* far side pid */
>> + uid_t farside_uid; /* far side uid */
>> + gid_t farside_gid; /* far side gid */
>>   void* ch_private; /* channel private data. */
>>   /* (may contain conn. info.) */
>>   IPC_Ops* ops; /* IPC_Channel function table.*/
>
> If you instead add the new members at the _end_ of the struct(s), it should be easier to maintain ABI compatibility.

Right. I should have thought of this compatibility issue.

Thanks,
Yan

--
Yan Gao y...@novell.com
Software Engineer
China Server Team, OPS Engineering, Novell, Inc.

___
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
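To illustrate the ABI point raised in the review, here is a minimal sketch that compares member offsets for the two layouts. The three structures are simplified stand-ins built only from the members visible in the diff context, not the real IPC_Channel definition, and the C types chosen for pid_t, uid_t and gid_t are assumptions.

```python
import ctypes

# Members visible in the diff context; the real struct has more (assumption:
# pid_t is a C int, uid_t and gid_t are unsigned ints).
_OLD_FIELDS = [
    ("ch_status", ctypes.c_int),
    ("refcount", ctypes.c_int),
    ("farside_pid", ctypes.c_int),
    ("ch_private", ctypes.c_void_p),
    ("ops", ctypes.c_void_p),
]
_NEW_FIELDS = [
    ("farside_uid", ctypes.c_uint),
    ("farside_gid", ctypes.c_uint),
]


class ChannelOld(ctypes.Structure):
    """Layout before the patch."""
    _fields_ = _OLD_FIELDS


class ChannelInserted(ctypes.Structure):
    """New members placed after farside_pid, as in the posted diff."""
    _fields_ = _OLD_FIELDS[:3] + _NEW_FIELDS + _OLD_FIELDS[3:]


class ChannelAppended(ctypes.Structure):
    """New members placed at the end, as suggested in the review."""
    _fields_ = _OLD_FIELDS + _NEW_FIELDS


def offsets(struct):
    """Map each member name to its byte offset inside the struct."""
    return {name: getattr(struct, name).offset for name, _ in struct._fields_}


def moved_fields(old, new):
    """List the old members whose offset differs in the new layout."""
    old_off, new_off = offsets(old), offsets(new)
    return [name for name in old_off if old_off[name] != new_off[name]]


if __name__ == "__main__":
    print("inserted in the middle:", moved_fields(ChannelOld, ChannelInserted))
    print("appended at the end:   ", moved_fields(ChannelOld, ChannelAppended))
```

With the members in the middle, ch_private and ops move, so a binary compiled against the old header would read the wrong bytes. Appending keeps every old offset, although the struct size still grows, so this only helps code that receives the struct by pointer from the library.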
Re: [Pacemaker] logd and corosync/pacemaker
-----Original Message-----
From: Dejan Muhamedagic deja...@fastmail.fm
Sent: 15.03.2010 11:01:03
To: pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] logd and corosync/pacemaker

> Hi,
>
> On Fri, Mar 12, 2010 at 05:24:53PM +0100, Andreas Mock wrote:
>> Hi all,
>>
>> with heartbeat it was advised to use logd for logging.
>>
>> a) Is this still valid for a corosync/pacemaker combination?
>
> Yes.
>
>> b) If yes, how is it enabled?
>
> Set use_logd to yes in the pacemaker service stanza in corosync.conf.

Hi Dejan,

I can't get it to work with corosync. I am probably missing something.

a) Can you give me an example of that stanza?
b) Which services start logging to logd once logd is enabled?
c) Does corosync also log to logd?
d) If I enable logd, what does the following section in /etc/corosync/corosync.conf mean?

logging {
        fileline: off
        to_syslog: no
        to_stderr: no
        to_logfile: yes
        syslog_facility: daemon
        logfile: /tmp/corosync.log
        debug: on
        timestamp: on
        logger_subsys {
                subsys: AMF
                debug: off
        }
}

Help needed. :-)

Thank you in advance
Andreas
[Pacemaker] DRBD Management Console 0.7.0
Hi,

This is the next DRBD-MC beta release, 0.7.0. DRBD-MC, which is also a Pacemaker GUI, is a Java application that helps to configure DRBD/Pacemaker/Corosync/Heartbeat clusters. It is compatible with everything from Heartbeat 2.1.3 to Pacemaker 1.0.8, with both available communication layers, and with DRBD 8.

In this release resource defaults were added. This is one of the missing features that normal people would use, I think. You'll find it next to the global options if your cluster software supports it.

The great new feature is the set of operating modes, each with an advanced and a non-advanced view. Depending on the operating mode, input fields and menus are shown, hidden, enabled or disabled. For example, an administrator who does not configure the cluster can quickly find what he needs without worrying about removing or changing something he shouldn't. You can start DRBD-MC with the maximum operating mode that is allowed for the user in the whole application and change to the lesser modes on the fly. Of course it can be easily circumvented, so this is not a security feature, but... Disabling and hiding of widgets, menus and whole panels should fit nicely with the upcoming ACLs, as soon as they figure out how to do it. Till then, the new operating modes make the configuration and administration of the cluster much easier, faster and almost enjoyable.

Currently there are the following operating modes:

Read-Only (cmd option --ro): read-only access is granted. You can view the cluster and add and remove clusters to and from your DRBD MC, but you cannot change anything on the cluster. This is somewhat equivalent to watching crm_mon, but way more informative. Additionally you can start the VNC viewer to work with virtual machines.

Operator (--op): you can do the basic tasks like stopping, starting and migrating resources and putting nodes into and out of standby, but also, for example, resolve DRBD split-brains. All configuration options are hidden. This seems to be about the functionality that Hawk is going to have, but without the operations on DRBD and VMs.

Administrator (--admin): this level of access can create, configure, reconfigure and destroy, as well as operate, the whole cluster, but many of the options are hidden. This is the default operating mode.

Administrator (--admin) / Advanced: here are the options that are seldom needed and/or that I am not even sure what they do.

Another, hidden, operating mode is the God mode. As the name suggests, it is useful only for development and testing.

http://oss.linbit.com/drbd-mc/img/drbd-mc-0.7.0.png

You can get DRBD MC here:
http://www.drbd.org/mc/management-console/
http://oss.linbit.com/drbd-mc/DMC-0.7.0.jar
http://oss.linbit.com/drbd-mc/drbd-mc-0.7.0.tar.gz

You can start it with the help of Java Web Start, or you can download it and start it with the command

    java -Xmx512m -jar DMC-0.7.0.jar

Make sure you use the Java from SUN. OpenJDK has seemed to work fine for some time now, but it seems to run DRBD MC much slower than the original Java.

Rasto Levrinc

Here is the changelog:

* Removing of DRBD resources was fixed.
* VNC viewer menu in the cluster view was fixed.
* stonith-timeout and priority stonith attributes were added.
* stonith_ prefix is used in the ids of stonith devices.
* When a group is stopped, it is indicated in the cluster view.
* master and slave target-roles for master/slave resources were added.
* All missing meta-attributes for groups and clones were added.
* Advisory values from status, meta-data and validate-all operations are not used anymore.
* Some global CRM parameters that didn't have defaults were fixed.
* Different operating modes were implemented.
* Terminal frame is now started collapsed.
* Parsing of operation defaults was added.
* Resource defaults were added.
* Metal look-and-feel is forced so that it works on Macs.
* DRBD status is no longer delayed after start-up.
* GUI helper perl script got a version in its file name, so that different versions of DRBD-MC can be used at the same time on one cluster.
* Smoother and faster start-up when there are many resources.

--
: Dipl-Ing Rastislav Levrinc
: DRBD-MC http://www.drbd.org/mc/management-console/
: DRBD/HA support and consulting http://www.linbit.com/
DRBD(R) and LINBIT(R) are registered trademarks of LINBIT, Austria.
[Pacemaker] Packaging error in cluster-glue
Hi all,

I don't know who feels responsible for this, but I found the following:

My external stonith script ibmrsa-telnet was enhanced in several ways, which is great. I take the chance to thank all contributors.

One of the enhancements was a change in logging. From the beginning the stonith script had its own way of logging its operation, as there was no common service for that. Now it makes a subprocess call to 'ha_log.sh', which is part of cluster-glue (path /usr/share/cluster-glue/ha_log.sh in cluster-glue 1.0.3).

BUT: The script is called without any path, only as ha_log.sh, and that directory is not in the PATH, neither by default nor through a post-installation script.

So, please use a full path to ha_log.sh in ibmrsa-telnet. As I don't know how these relocatable things are handled in the build environment, or where and how to look for the sources in the Mercurial repositories, I would be thankful if someone could change it.

Best regards
Andreas Mock
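As a rough sketch of the kind of fix asked for here, a caller can resolve the helper to an absolute path before spawning it. The function name find_helper and the fallback tuple are hypothetical; only the directory /usr/share/cluster-glue comes from the message, and other builds may install the helper elsewhere.

```python
import os
import shutil

# Directory named in the message for cluster-glue 1.0.3 (may differ per build).
DEFAULT_HELPER_DIRS = ("/usr/share/cluster-glue",)


def find_helper(name, extra_dirs=DEFAULT_HELPER_DIRS):
    """Return an absolute path to an executable helper, or None if not found."""
    # First honour PATH, in case the packager did put the helper there.
    found = shutil.which(name)
    if found:
        return os.path.abspath(found)
    # Otherwise look in the known installation directories.
    for directory in extra_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```

A caller would then pass the returned path to its subprocess call instead of the bare name, and log a clear error if the result is None.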
Re: [Pacemaker] WARNING: drbd0: default-action-timeout
Hi,

On Thu, Mar 18, 2010 at 10:52:24PM +0100, Andreas Mock wrote:
> -----Original Message-----
> From: Michael Schwartzkopff mi...@multinet.de
> Sent: 18.03.2010 21:32:22
> To: pacemaker@oss.clusterlabs.org
> Subject: Re: [Pacemaker] WARNING: drbd0: default-action-timeout
>
>> Hi,
>>
>> this was newly introduced in 1.0.8. If your resources work, you can safely ignore it.
>
> Hi Michael,
>
> is this a new feature which can be used by other RAs too?
> Is the RA giving the recommendation which leads to the warning?

Yes. The feature has always been there; it's just that the timeouts were checked only if set explicitly, but not against the default-action-timeout. It could get noisy, I'm afraid, but you should follow the advice and fix the timeouts, or increase the default-action-timeout. That depends, of course, on your resources. What is advised in the metadata of resource agents should be the minimum timeouts for resources of that type.

If you find these warnings showing up too often, set the check-frequency option to on-verify:

    crm options check-frequency on-verify

It is also possible to set this option to never, but I'd strongly advise against it for production clusters.

Thanks,

Dejan

> Or is it directly programmed into pacemaker (or a part of it)?
> More information welcome.
>
> Best regards
> Andreas
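The check described above can be sketched roughly as follows. This is an illustration of the comparison only, not the crm shell's actual code; the function names are hypothetical, and the wording used for explicitly set timeouts is an assumption (only the default-action-timeout wording appears in this thread).

```python
def parse_seconds(value):
    """Convert '240s', '240' or 240 to an integer number of seconds."""
    text = str(value).strip()
    if text.endswith("s"):
        text = text[:-1]
    return int(text)


def timeout_warnings(rsc_id, advised, configured, default_timeout="20s"):
    """Compare effective operation timeouts with the RA's advised minimums.

    advised:    minimum timeouts from the RA metadata, e.g. {"start": 240}
    configured: timeouts set explicitly on the primitive, e.g. {"start": "240s"}
    """
    warnings = []
    for action, advised_value in advised.items():
        if action in configured:
            effective, label = configured[action], "timeout"  # label assumed
        else:
            # Nothing set explicitly: the cluster-wide default applies.
            effective, label = default_timeout, "default-action-timeout"
        if parse_seconds(effective) < parse_seconds(advised_value):
            warnings.append(
                f"WARNING: {rsc_id}: {label} {effective} for {action} "
                f"is smaller than the advised {advised_value}"
            )
    return warnings


if __name__ == "__main__":
    for line in timeout_warnings("drbd0", {"start": 240, "stop": 100}, {}):
        print(line)
```

Setting start and stop timeouts on the primitive, or raising the default, makes the list come back empty, which matches the two remedies suggested in the reply.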
Re: [Pacemaker] Problem : Sometimes failed in the start of the guest(on KVM).
Hi Hideo-san,

On Fri, Mar 19, 2010 at 10:46:12AM +0900, renayama19661...@ybb.ne.jp wrote:
> Hi,
>
> I am building a cluster on KVM using the VirtualDomain RA.
> However, a guest sometimes fails to start.
>
> Mar 16 15:16:52 x3650e lrmd: [13457]: info: RA output: (guest-kvm1:start:stderr) error: Failed to start domain kvm1 error: internal error unable to start guest: inet_listen: bind(ipv4,127.0.0.1,5900): Address already in use inet_listen: FAILED
> Mar 16 15:16:52 x3650e lrmd: [13457]: info: RA output: (guest-kvm3:start:stdout) Domain kvm3 started
> Mar 16 15:16:52 x3650e crmd: [13460]: info: abort_transition_graph: te_update_diff:146 - Triggered transition abort (complete=0, tag=transient_attributes, id=x3650f, magic=NA, cib=0.102.76) : Transient attribute: update
> Mar 16 15:16:52 x3650e VirtualDomain[13781]: ERROR: Failed to start virtual domain kvm1.
>
> Is this a problem related to libvirt?
>
> Our environment is as follows:
>
> * RHEL5.4-64(kvm)
> * libvirt-0.6.3-20.el5
> * libvirt-python-0.6.3-20.el5
> * libvirt-0.6.3-20.el5
> * corosync-1.2.0.zip
> * Cluster-Resource-Agents-bb7dc7b7f6e4.tar.gz
> * Pacemaker-1-0-efdc0d8143dd.tar.gz
> * Pacemaker-Python-GUI-a05fd62b2e13.tar.gz
> * Reusable-Cluster-Components-65900eaaf453.tar.gz
>
> Do you know a solution to this problem?

> Mar 16 15:16:52 x3650e lrmd: [13457]: info: RA output: (guest-kvm1:start:stderr) error: Failed to start domain kvm1 error: internal error unable to start guest: inet_listen: bind(ipv4,127.0.0.1,5900):

IIRC, that port has to do with VNC, and something else (another VNC server?) has already been started on that port.

Thanks,

Dejan

> Best Regards,
> Hideo Yamauchi.
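One way to confirm the "Address already in use" diagnosis on the host is to probe the port directly. The sketch below is a generic socket check, not part of libvirt or the VirtualDomain RA, and the function name is hypothetical. It only reports the state at the moment of the call, so another process can still grab the port afterwards.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Typically "Address already in use"; other bind errors
            # (for example missing privileges) are also reported as not free.
            return False
    return True


if __name__ == "__main__":
    # 5900 is the port from the log excerpt above.
    print("127.0.0.1:5900 free:", port_is_free(5900))
```

If the probe reports the port as taken while no guest is running, the next step would be to find the process holding it with the usual system tools.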
Re: [Pacemaker] logd and corosync/pacemaker
Hi,

On Fri, Mar 19, 2010 at 10:34:27AM +0100, Andreas Mock wrote:
> -----Original Message-----
> From: Dejan Muhamedagic deja...@fastmail.fm
> Sent: 15.03.2010 11:01:03
> To: pacemaker@oss.clusterlabs.org
> Subject: Re: [Pacemaker] logd and corosync/pacemaker
>
>> Hi,
>>
>> On Fri, Mar 12, 2010 at 05:24:53PM +0100, Andreas Mock wrote:
>>> Hi all,
>>>
>>> with heartbeat it was advised to use logd for logging.
>>>
>>> a) Is this still valid for a corosync/pacemaker combination?
>>
>> Yes.
>>
>>> b) If yes, how is it enabled?
>>
>> Set use_logd to yes in the pacemaker service stanza in corosync.conf.
>
> Hi Dejan,
>
> I can't get it to work with corosync. I am probably missing something.
>
> a) Can you give me an example of that stanza?

service {
        # Default to start mgmtd with pacemaker
        use_mgmtd: yes
        # Use logd for pacemaker
        use_logd: yes
        # Version
        ver: 0
        # The name of the service
        name: pacemaker
}

> b) Which services start logging to logd once logd is enabled?

All pacemaker subsystems.

> c) Does corosync also log to logd?

No.

> d) If I enable logd, what does the following section in /etc/corosync/corosync.conf mean?
>
> logging {
>         fileline: off
>         to_syslog: no
>         to_stderr: no
>         to_logfile: yes
>         syslog_facility: daemon
>         logfile: /tmp/corosync.log
>         debug: on
>         timestamp: on
>         logger_subsys {
>                 subsys: AMF
>                 debug: off
>         }
> }

That's for corosync.

Thanks,

Dejan

> Help needed. :-)
>
> Thank you in advance
> Andreas
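For anyone who wants to double-check that the stanza actually ended up in their configuration, here is a small sketch that reads stanzas of this shape and reports whether the pacemaker service has use_logd enabled. It is not corosync's own parser: it assumes one directive per line, "name {" to open a section and "}" on a line of its own to close it, and all function names are hypothetical.

```python
def parse_stanzas(text):
    """Parse 'name { key: value ... }' blocks into nested dicts.

    Sections are stored as lists of dicts because a section name such as
    'service' may appear more than once.
    """
    root = {}
    stack = [root]
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.endswith("{"):
            child = {}
            stack[-1].setdefault(line[:-1].strip(), []).append(child)
            stack.append(child)
        elif line == "}":
            stack.pop()
        else:
            key, _, value = line.partition(":")
            stack[-1][key.strip()] = value.strip()
    return root


def pacemaker_uses_logd(text):
    """Return True if a service stanza named pacemaker sets use_logd: yes."""
    for service in parse_stanzas(text).get("service", []):
        if service.get("name") == "pacemaker":
            return service.get("use_logd") == "yes"
    return False


SAMPLE = """
service {
        # Use logd for pacemaker
        use_logd: yes
        ver: 0
        name: pacemaker
}
logging {
        to_logfile: yes
        logfile: /tmp/corosync.log
        logger_subsys {
                subsys: AMF
                debug: off
        }
}
"""

if __name__ == "__main__":
    print("pacemaker logs via logd:", pacemaker_uses_logd(SAMPLE))
```

The logging section is parsed as well but deliberately ignored by the check, since, as the reply says, it only concerns corosync's own logging.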
Re: [Pacemaker] Problem : Sometimes failed in the start of the guest(on KVM).
Hi Dejan,

> IIRC, that port has to do with vnc and something else (another VNC server?) has already been started on that port.

Thank you for the comment. I will look into it a little more.

Best Regards,
Hideo Yamauchi.
Re: [Pacemaker] WARNING: drbd0: default-action-timeout
Hi all,

Thanks for the responses. I actually had it figured out a few minutes after I posted - doh! I just added 'op start timeout=XXs op stop timeout=XXs' to the 3 primitives that were giving this warning, using the suggested defaults for each. In the case of DRBD I simply had to change over to using:

primitive drbd0 ocf:linbit:drbd params drbd_resource=drbd0 op \
    monitor interval=15s op start timeout=240s op stop timeout=100s

I had to make similar changes to the IPaddr2 and Filesystem primitives as well.

This isn't something I wanted to ignore though. Every command you issue in the crm gives you that WARNING again, so with 3 resources complaining you get a LOT of warnings any time you issue a verify, commit, etc. With one warning for the start timeout and one for the stop, that's 6 warning flags! As a cluster admin, I don't like to see WARNING! :)

Heads up for all though: there seem to be a lot of minor syntax changes. That 'crm resource migrate rsc node' now needs an extra duration argument is another one I ran into. It tells you to reference http://en.wikipedia.org/wiki/ISO_8601#Durations for proper syntax. I just used a 1 year duration, as I planned on unmigrating again almost immediately:

'crm resource migrate rsc node P1Y'

Kenneth M DeChick
Linux Systems Administrator
Community Computer Service, Inc.
(315)-255-1751 ext154
http://www.medent.com
k...@medent.com
Registered Linux User #497318
-- -- -- -- -- -- -- -- -- -- --
You canna change the laws of physics, Captain; I've got to have thirty minutes!

-- Original Message ---
From: Glauber Cabral glauber...@gmail.com
To: pacema...@clusterlabs.org, k...@medent.com
Sent: Thu, 18 Mar 2010 17:25:22 -0300
Subject: Re: [Pacemaker] WARNING: drbd0: default-action-timeout

> Hi Kenneth
>
> I'm new to pacemaker, but I guess that your problem is that the timeouts for the start and stop actions are set by default to 20s somewhere (it seems they are not defined in your file) and pacemaker is telling you these timeouts are shorter than the recommended ones.
>
> So, my suggestion is to define these timeouts yourself, assuming the suggested values are OK. To do so, type the command below in a shell to edit your configuration:
>
> # crm configure edit
>
> And change the DRBD primitive to this:
>
> primitive drbd0 ocf:linbit:drbd params drbd_resource=drbd0 \
>     op monitor interval=15s \
>     op start timeout=240s \
>     op stop timeout=100s
>
> I hope this can help you =)
>
> []s
> Glauber
>
> On Thu, Mar 18, 2010 at 4:16 PM, Ken Dechick k...@medent.com wrote:
>> Hi all,
>>
>> Just updated my test cluster to the latest 1.0.8 pacemaker (from 1.0.6) and 3.0.2-2 heartbeat (from 3.0.1-1). I was going through my usual configuration steps when I ran into a warning I have never seen before in setup. I start my bare cluster and do command-line configuration within the crm shell from there. My first primitive is my DRBD resource, and the command I use is:
>>
>> primitive drbd0 ocf:linbit:drbd params drbd_resource=drbd0 op monitor \
>>     interval=15s
>>
>> Today I am suddenly getting a new warning when I use this:
>>
>> WARNING: drbd0: default-action-timeout 20s for start is smaller than the advised 240
>> WARNING: drbd0: default-action-timeout 20s for stop is smaller than the advised 100
>>
>> But I don't know the syntax to correct this. Searching around in the lists I don't see anything. Perhaps this is something new with pacemaker 1.0.8? Can anyone shed some light?
>>
>> -Thanks
>> Kenneth M DeChick

--- End of Original Message ---
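ISO 8601 durations put each number before its unit designator, so one year is written P1Y and thirty minutes PT30M. The sketch below validates and splits such strings. It is a simplified illustration that only accepts whole numbers, the function name is hypothetical, and it is not the parser the crm shell or Pacemaker uses.

```python
import re

# P[nY][nM][nW][nD][T[nH][nM][nS]] with integer values only (simplification).
_DURATION = re.compile(
    r"^P(?:(?P<years>\d+)Y)?(?:(?P<months>\d+)M)?"
    r"(?:(?P<weeks>\d+)W)?(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def parse_duration(text):
    """Split an ISO 8601 duration such as 'P1Y' into its components."""
    match = _DURATION.match(text)
    # A bare 'P' or a trailing 'T' matches the pattern but carries no value.
    if not match or text.endswith(("P", "T")):
        raise ValueError(f"not an ISO 8601 duration: {text!r}")
    return {name: int(value) for name, value in match.groupdict().items() if value}


if __name__ == "__main__":
    print(parse_duration("P1Y"))      # one year
    print(parse_duration("P1DT12H"))  # one day and twelve hours
```

Note that the letter M means months before the T separator and minutes after it, which is why the T cannot be dropped for time components.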
Re: [Pacemaker] [PATCH] Medium: build: require Net-SNMP 5.3 or later
On 03/18/2010 10:02 AM, Andrew Beekhof wrote:
> On Wed, Mar 17, 2010 at 11:02 AM, Dejan Muhamedagic deja...@fastmail.fm wrote:
>> Hi,
>>
>> On Wed, Mar 17, 2010 at 09:17:38AM +0100, Florian Haas wrote:
>>> Andrew,
>>>
>>> now that Pacemaker has been on a bi-monthly release schedule for a while, is there any chance you could consider publishing RCs before the actual releases, at least for the stable-1.0 branch?
>>
>> Good idea. That would give others a chance to give the RC a try and report any problems before the final release.
>
> I use the following for the pacemaker rpms I'm using at any given time:
> http://www.clusterlabs.org/rpm/testing/
>
> They're for 64-bit F-12 but you can rebuild them for whatever platform you like.
> I've no intention of building them for other platforms, I'd spend my entire life building packages instead of getting any work done.
>
> Just be sure to refresh the metadata regularly, burning through release numbers and disk space isn't the goal here.

Who said you should build RC _packages_? Tag an RC, upload a tarball, announce on mailing list, done. How is that extra work?

No wait, Pacemaker builds directly from a Mercurial tarball. So scratch the upload part.

Florian
Re: [Pacemaker] node states
On 03/17/2010 09:30 PM, Andrew Beekhof wrote:
> On Wed, Mar 17, 2010 at 7:53 PM, Matthew Palmer mpal...@hezmatt.org wrote:
>> On Wed, Mar 17, 2010 at 07:16:16AM -0500, Schaefer, Diane E wrote:
>>> We were wondering what the node state of UNCLEAN, with the three variations of online, offline and pending returned in crm_mon, means. We had the heartbeat service off on one of our nodes and the other node reported UNCLEAN (online). We seem to get it when the nodes are not communicating. Thanks for any clarification.
>>
>> Unclean (online) means that the STONITH resource for that node had some failures, and so the cluster isn't confident that when it comes time to shoot that node (if required), it'll actually work.
>
> You'll also see it when any resource fails to stop _and_ stonith isn't enabled.

Never seen that. AFAICS when you disable STONITH and a resource fails on stop, then the resource goes into the Unmanaged state, but the associated node does not become Unclean. At least as far as crm_mon says.

Florian
Re: [Pacemaker] [PATCH] Medium: build: require Net-SNMP 5.3 or later
On Fri, Mar 19, 2010 at 8:54 AM, Florian Haas florian.h...@linbit.com wrote:
> Who said you should build RC _packages_? Tag an RC, upload a tarball, announce on mailing list, done. How is that extra work?
>
> No wait, Pacemaker builds directly from a Mercurial tarball. So scratch the upload part.

What does the tag achieve apart from ensuring people waste their time testing versions that don't have any fixes since it was created?
Re: [Pacemaker] WARNING: drbd0: default-action-timeout
Hi,

On Fri, Mar 19, 2010 at 09:24:16AM -0400, Ken Dechick wrote:
> Hi all,
>
> Thanks for the responses. I actually had it figured out a few minutes after I posted - doh! I just added 'op start timeout=XXs op stop timeout=XXs' to the 3 primitives that were giving this warning, using the suggested defaults for each. In the case of DRBD I simply had to change over to using:
>
> primitive drbd0 ocf:linbit:drbd params drbd_resource=drbd0 op \
>     monitor interval=15s op start timeout=240s op stop timeout=100s
>
> I had to make similar changes to the IPaddr2 and Filesystem primitives as well.
>
> This isn't something I wanted to ignore though. Every command you issue in the crm gives you that WARNING again, so with 3 resources complaining you get a LOT of warnings any time you issue a verify, commit, etc. With one warning for the start timeout and one for the stop, that's 6 warning flags! As a cluster admin, I don't like to see WARNING! :)

Well, that's a good attitude. This information is important. Though the warnings may be annoying, they should really be addressed in some way. It could also be that the timeouts advertised by the RA are wrong. If you think so, then please post a question. For instance, I can see now that those for IPaddr/IPaddr2 are really excessive.

> Heads up for all though: there seem to be a lot of minor syntax changes. That 'crm resource migrate rsc node' now needs an extra duration argument is another one I ran into. It tells you to reference:

That's not exactly true. If it were, that would've been a regression, and in general we don't like those. It is true that there is an extra parameter, but it's optional. If the above form doesn't work (it really does here), then please open a bugzilla.

Cheers,

Dejan

> http://en.wikipedia.org/wiki/ISO_8601#Durations for proper syntax. I just used a 1 year duration, as I planned on unmigrating again almost immediately:
>
> 'crm resource migrate rsc node P1Y'
>
> Kenneth M DeChick
> Linux Systems Administrator
> Community Computer Service, Inc.
Re: [Pacemaker] [PATCH] Medium: build: require Net-SNMP 5.3 or later
On 03/19/2010 03:39 PM, Andrew Beekhof wrote:
>> Who said you should build RC _packages_? Tag an RC, upload a tarball, announce on mailing list, done. How is that extra work?
>>
>> No wait, Pacemaker builds directly from a Mercurial tarball. So scratch the upload part.
>
> What does the tag achieve apart from ensuring people waste their time testing versions that don't have any fixes since it was created?

Remind contributors that a release is imminent?

Florian
[Pacemaker] Building an active/passive dhcp server
Hello,

I'm trying to build an active/passive DHCP server. Currently, it works with the following setup:

* 2 Debian servers with pacemaker: node1 with physical ip1 and virtual IP vip1 (managed by pacemaker), node2 with physical ip2,
* 1 lsb dhcp3-server resource that runs on node1 and migrates fine to node2,
* an rsync cron job copies the DHCP lease file from node1 to node2, so that node2 does not start with an empty lease file,
* the server should hand out DHCP leases from vip1, because it is on a VLAN and the core router uses Cisco ip helper to forward DHCP requests.

The problem is that when node1 comes online again, the DHCP lease files on the two nodes differ.

I think that using rsync to synchronize the lease file is not the best solution and that a clustered file system would be better.

What are your opinions about such a setup? Are there any best practices?

Thanks for your help and any information about this.

--
Emmanuel Lesouef
Re: [Pacemaker] Building an active/passive dhcp server
On Fri, Mar 19, 2010 at 10:47:59PM +0100, Emmanuel Lesouef wrote:
> I'm trying to make a active/passive dhcp server.

[...]

> The problem is that when node1 come online again, there's a difference in the dhcp lease file.
>
> I think that using rsync to synchronize the lease file is not the best solution and that a clustered file system is the best solution.

Yes, rsyncing your leases file around isn't going to be a win. However, a clustered filesystem is a really bad idea, as the complexity is far more than you need. Instead, a small DRBD (http://www.drbd.org/) volume with a regular filesystem such as ext3 will work Just Fine And Dandy.

- Matt