Re: [Linux-ha-dev] PATCH: race in iSCSILogicalUnit
On 10/08/2012 01:58 PM, Philipp Marek wrote:
> Hi Florian,
>
> Dejan told me that you're the maintainer for the iSCSI pieces, so I'm
> sending you this patch.

Sorry about the late reply; lately I've been watching the GitHub pull requests more religiously than the list.

> Please apply, thank you very much!

As the change you are proposing is LIO specific, please rename the parameter to lio_iblock, add code that will log warnings if this parameter is used in a configuration that uses one of the other supported targets, and send a pull request.

Thanks,
Florian

--
Need help with High Availability?
http://www.hastexo.com/now
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
[Linux-ha-dev] RA developer's guide 1.0.3
Hi everyone,

Tal Yalon has pointed out an error in the RA developer's guide about the "required" and "unique" attributes in RA metadata (they belong on <parameter> elements, not where the guide erroneously put them). I've spun and pushed a minor update. Enjoy release 1.0.3.

http://www.linux-ha.org/doc/dev-guides/ra-dev-guide.html

Cheers,
Florian
[Linux-ha-dev] sbd spinoff from cluster-glue
Dejan, Lars,

is it confirmed from your end that sbd is moving out of cluster-glue? If so, it would be nice if we could get a cluster-glue release with sbd removed, and a release of standalone sbd, so packagers can fix the relevant distro packages up properly.

Cheers,
Florian
Re: [Linux-ha-dev] [PATCH 0 of 2] Autotoolize build
On Tue, May 29, 2012 at 6:38 PM, Lars Marowsky-Bree wrote:
> On 2012-05-29T18:34:15, Florian Haas wrote:
>
>> Yeah, it seems you just broke the build by including cluster/stack.h
>> and not bothering to add an AC_CHECK_HEADERS to configure.ac. Where
>> does that come from, is that new to Pacemaker?
>
> Uh? It builds here on the 1.1.7 pacemaker version.
>
> The integration with the cluster stack is rather specific to whatever
> pacemaker/corosync version + configuration you build against.
> Unfortunately.

Well that's what #ifdef HAVE_CLUSTER_STACK_H and friends are good for, no?

Florian
Re: [Linux-ha-dev] [PATCH 0 of 2] Autotoolize build
On Tue, May 29, 2012 at 6:26 PM, Lars Marowsky-Bree wrote:
> On 2012-05-29T17:56:59, Florian Haas wrote:
>
>> In case you're wondering why I didn't use PKG_CHECK_MODULES for the PE
>> libraries: their pkg-config file is currently broken; Andrew has a
>> pull request for Pacemaker for that.
>
> I was wondering more about how to build this against older codebases,
> but then decided not to bother ;-)

Yeah, it seems you just broke the build by including cluster/stack.h and not bothering to add an AC_CHECK_HEADERS to configure.ac. Where does that come from, is that new to Pacemaker?

> The code seems to be working quite well. A README and a manpage would
> probably be good ideas ...

Well autofoo already gave you an INSTALL file, and you can use help2man for the man page generation (look at booth for the Makefile.am and configure.ac hack). For the README, copy your wiki page.
Re: [Linux-ha-dev] [PATCH 0 of 2] Autotoolize build
On Tue, May 29, 2012 at 4:32 PM, Lars Marowsky-Bree wrote:
> On 2012-05-29T14:31:20, Florian Haas wrote:
>
>> Forgot to mention this in the original cover message, for those who
>> haven't been following the discussion: this is for sbd which is just
>> spinning off from cluster-glue.
>
> Thanks, I've merged them both!

In case you're wondering why I didn't use PKG_CHECK_MODULES for the PE libraries: their pkg-config file is currently broken; Andrew has a pull request for Pacemaker for that.

Cheers,
Florian
Re: [Linux-ha-dev] [PATCH 0 of 2] Autotoolize build
On Tue, May 29, 2012 at 2:27 PM, Florian Haas wrote:
> Lars,
>
> I did this as an exercise of sorts to understand how this compiles and
> what its dependencies are. Considering the code base is quite small
> it may seem like a pointless stunt to jump through all the autofoo
> hoops, but it makes life that much easier for distro packagers.

Forgot to mention this in the original cover message, for those who haven't been following the discussion: this is for sbd which is just spinning off from cluster-glue.

Cheers,
Florian
[Linux-ha-dev] [PATCH 1 of 2] build: autotoolize
# HG changeset patch # User Florian Haas # Date 1338230941 -7200 # Branch autotools # Node ID 9888c2e4353b08599e6977e5e61dd6d34ce6151e # Parent c4de704b6cea21c69b3c767d1c47bed727f94d82 build: autotoolize diff -r c4de704b6cea -r 9888c2e4353b COPYING --- /dev/null Thu Jan 01 00:00:00 1970 + +++ b/COPYING Mon May 28 20:49:01 2012 +0200 @@ -0,0 +1,339 @@ +GNU GENERAL PUBLIC LICENSE + Version 2, June 1991 + + Copyright (C) 1989, 1991 Free Software Foundation, Inc., + 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + Everyone is permitted to copy and distribute verbatim copies + of this license document, but changing it is not allowed. + +Preamble + + The licenses for most software are designed to take away your +freedom to share and change it. By contrast, the GNU General Public +License is intended to guarantee your freedom to share and change free +software--to make sure the software is free for all its users. This +General Public License applies to most of the Free Software +Foundation's software and to any other program whose authors commit to +using it. (Some other Free Software Foundation software is covered by +the GNU Lesser General Public License instead.) You can apply it to +your programs, too. + + When we speak of free software, we are referring to freedom, not +price. Our General Public Licenses are designed to make sure that you +have the freedom to distribute copies of free software (and charge for +this service if you wish), that you receive source code or can get it +if you want it, that you can change the software or use pieces of it +in new free programs; and that you know you can do these things. + + To protect your rights, we need to make restrictions that forbid +anyone to deny you these rights or to ask you to surrender the rights. +These restrictions translate to certain responsibilities for you if you +distribute copies of the software, or if you modify it. 
+ + For example, if you distribute copies of such a program, whether +gratis or for a fee, you must give the recipients all the rights that +you have. You must make sure that they, too, receive or can get the +source code. And you must show them these terms so they know their +rights. + + We protect your rights with two steps: (1) copyright the software, and +(2) offer you this license which gives you legal permission to copy, +distribute and/or modify the software. + + Also, for each author's protection and ours, we want to make certain +that everyone understands that there is no warranty for this free +software. If the software is modified by someone else and passed on, we +want its recipients to know that what they have is not the original, so +that any problems introduced by others will not reflect on the original +authors' reputations. + + Finally, any free program is threatened constantly by software +patents. We wish to avoid the danger that redistributors of a free +program will individually obtain patent licenses, in effect making the +program proprietary. To prevent this, we have made it clear that any +patent must be licensed for everyone's free use or not licensed at all. + + The precise terms and conditions for copying, distribution and +modification follow. + +GNU GENERAL PUBLIC LICENSE + TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION + + 0. This License applies to any program or other work which contains +a notice placed by the copyright holder saying it may be distributed +under the terms of this General Public License. The "Program", below, +refers to any such program or work, and a "work based on the Program" +means either the Program or any derivative work under copyright law: +that is to say, a work containing the Program or a portion of it, +either verbatim or with modifications and/or translated into another +language. (Hereinafter, translation is included without limitation in +the term "modification".) 
Each licensee is addressed as "you". + +Activities other than copying, distribution and modification are not +covered by this License; they are outside its scope. The act of +running the Program is not restricted, and the output from the Program +is covered only if its contents constitute a work based on the +Program (independent of having been made by running the Program). +Whether that is true depends on what the Program does. + + 1. You may copy and distribute verbatim copies of the Program's +source code as you receive it, in any medium, provided that you +conspicuously and appropriately publish on each copy an appropriate +copyright notice and disclaimer of warranty; keep intact all the +notices that refer to this License and to the absence of any warranty; +and give any other recipients of the Program a copy of this License +along with the Program. + +You may charge a fee for the physi
[Linux-ha-dev] [PATCH 0 of 2] Autotoolize build
Lars,

I did this as an exercise of sorts to understand how this compiles and what its dependencies are. Considering the code base is quite small it may seem like a pointless stunt to jump through all the autofoo hoops, but it makes life that much easier for distro packagers.

Feel free to apply this as you see fit.

Cheers,
Florian
Re: [Linux-ha-dev] [rfc] SBD with Pacemaker/Quorum integration
On Fri, May 25, 2012 at 9:56 PM, Lars Marowsky-Bree wrote:
> Should be packageable on every platform, though I admit that I've not
> tried building the pacemaker module against anything but the
> corosync+pacemaker+openais stuff we ship on SLE HA 11 so far.

Are you expecting this to build without "-I/usr/include/libxml2"? It didn't for me, before I added that.

Florian
Re: [Linux-ha-dev] [rfc] SBD with Pacemaker/Quorum integration
On Fri, May 25, 2012 at 5:41 PM, Lars Marowsky-Bree wrote:
> On 2012-05-25T17:31:52, Florian Haas wrote:
>
>> > That aside, what do you think of the idea/approach?
>> Um, right now I have no opinion. Your commit messages are pretty
>> terse, and there's no README in the repo. Mind adding one?
>
> Good point. I wasn't aware the commit messages were terse ;-)
>
> To sketch this out:
>
> Basically though SBD continues as it always did.
>
> If you specify "-P" to the daemon start-up (usually via
> /etc/sysconfig/sbd SBD_OPTS), the following will happen:
>
> sbd will start (in addition to the worker processes that monitor the
> disks) a process that signs in with pacemaker (and corosync). This
> process monitors that the partition the local node is part of is
> quorate, and that the local node (according to the CIB as run through
> pengine) is "healthy".
>
> If so, the master thread will not self-fence even if the majority of
> devices is currently unavailable.
>
> That's it, nothing more. Does that help?

It does. One naive question: what's the rationale of tying in with Pacemaker's view of things? Couldn't you just consume the quorum and membership information from Corosync alone?

> It became needed because customers had scenarios with just one device
> (which experienced intermittent problems), where MPIO acted up (I've
> seen IO stuck for minutes), or even three devices where failures were
> correlated. Then, SBD would self-fence, and the customer be unhappy.
>
>
> (I have opinions on particularly the last failure mode. This seems to
> arise specifically when customers have build setups with two HBAs, two
> SANs, two storages, but then cross-linked the SANs, connected the HBAs
> to each, and the storages too. That seems to frequently lead to
> hiccups where the *entire* fabric is affected. I'm thinking this
> cross-linking is a case of sham redundancy; it *looks* as if makes
> things more redundant, but in reality reduces it since faults are no
> longer independent. Alas, they've not wanted to change that.)

Henceforth, I'm going to dangle this thread in front of everyone who believes their SAN can never fail. Thanks. :)

Are there any SUSEisms in SBD or would you expect it to be packageable on any platform?

Cheers,
Florian
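
The "-P" behaviour Lars describes in this message can be summarised as a small decision function. What follows is only a sketch of that description, not code from sbd: the `NodeView` type, its field names and the `should_self_fence` helper are all hypothetical, and "majority of devices unavailable" is read here as strictly more than half of the configured devices being unreachable.

```python
from dataclasses import dataclass


@dataclass
class NodeView:
    """Hypothetical snapshot of what the sbd master process can observe."""
    devices_total: int        # number of configured sbd devices
    devices_ok: int           # devices currently reachable
    pacemaker_enabled: bool   # daemon was started with -P
    quorate: bool             # local partition has quorum
    healthy: bool             # local node is "healthy" per the CIB/pengine


def should_self_fence(view: NodeView) -> bool:
    """Return True if the node should self-fence under the rules sketched above."""
    unavailable = view.devices_total - view.devices_ok
    # Assumption: "majority unavailable" means strictly more than half.
    majority_lost = unavailable * 2 > view.devices_total
    if not majority_lost:
        return False
    # Without -P, losing the majority of devices leads to a self-fence.
    if not view.pacemaker_enabled:
        return True
    # With -P, the node stays up only while it is both quorate and healthy.
    return not (view.quorate and view.healthy)
```

With a single device that has gone away, `should_self_fence(NodeView(1, 0, False, True, True))` is true, while the same situation with `pacemaker_enabled=True` on a quorate, healthy node is not.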
Re: [Linux-ha-dev] [rfc] SBD with Pacemaker/Quorum integration
On Thu, May 24, 2012 at 3:10 PM, Lars Marowsky-Bree wrote:
> On 2012-05-24T14:34:59, Florian Haas wrote:
>
>> > To give you a glance of the extended sbd code, you can check out
>> > http://hg.linux-ha.org/sbd - the new Pacemaker integration is activated
>> > using the "-P" option in /etc/sysconfig/sbd, otherwise sbd remains a
>> > drop-in replacement for the previous versions.
>> Just as a suggestion: since you're already taking this out of glue,
>> would you mind also moving the repo to GitHub? It's just orders of
>> magnitude more straightforward to review and comment on code that way.
>
> I'll probably do that, but since I stripped it out of glue to start
> with, sticking with hg was easier for the time being.
>
> But yes, I am contemplating to get over my git aversion ;-)
>
> That aside, what do you think of the idea/approach?

Um, right now I have no opinion. Your commit messages are pretty terse, and there's no README in the repo. Mind adding one?

Cheers,
Florian
Re: [Linux-ha-dev] [rfc] SBD with Pacemaker/Quorum integration
On Thu, May 24, 2012 at 11:10 AM, Lars Marowsky-Bree wrote:
> Hi all,
>
> I had to repeatedly deal with customer/partner scenarios where the SAN
> was unreliable, and outages were correlated across fabrics. The desire
> was to avoid the self-fence in such cases if the cluster is quorate and
> the node is not unhealthy.
>
> This required SBD to link against pacemaker's CIB and PE libraries, and
> all that that implies. Which meant sbd had to move out of cluster-glue,
> or else we'd face a build loop.
>
> To give you a glance of the extended sbd code, you can check out
> http://hg.linux-ha.org/sbd - the new Pacemaker integration is activated
> using the "-P" option in /etc/sysconfig/sbd, otherwise sbd remains a
> drop-in replacement for the previous versions.

Just as a suggestion: since you're already taking this out of glue, would you mind also moving the repo to GitHub? It's just orders of magnitude more straightforward to review and comment on code that way.

Cheers,
Florian
Re: [Linux-ha-dev] [Linux-HA] Bug in iSCSILogicalUnit
Hi Vadym,

moving the discussion to the -dev list, which is the more appropriate forum for this. Please reply to -dev; more comments inline.

On Sun, May 20, 2012 at 9:52 PM, Vadym Chepkov wrote:
> Hi,
>
>
> The monitor operation of iSCSILogicalUnit is not specific enough in the
> regular expression and I got very "nice" fencing going because it was falsely
> reporting "failed to stop" resource.
>
> I happened to add primitive lun-build10, while already having lun-build1.

It would be bad if _that_ caused problems, but I'm unsure how that would be related to your patch.

> For myself I have narrowed it down to the following fix, but probably a more
> appropriate regex has to be applied to these and other commands serving the
> same purpose for other iSCSI implementations.
>
> diff --git a/heartbeat/iSCSILogicalUnit b/heartbeat/iSCSILogicalUnit
> index 25ee32e..2cee970 100755
> --- a/heartbeat/iSCSILogicalUnit
> +++ b/heartbeat/iSCSILogicalUnit
> @@ -328,7 +328,7 @@ iSCSILogicalUnit_monitor() {
> tgt)
> # Figure out and set the target ID
> TID=`tgtadm --lld iscsi --op show --mode target \
> - | sed -ne "s/^Target \([[:digit:]]\+\):
> ${OCF_RESKEY_target_iqn}/\1/p"`
> + | sed -ne "s/^Target \([[:digit:]]\+\):
> ${OCF_RESKEY_target_iqn}$/\1/p"`

Adding the end-of-line anchor there does make good sense, but it would only fix the case of there being two _targets_ sharing part of their IQN, not two LUs with primitives similarly named. Do you have more than one target, where the full IQN of one target is a substring of the IQN of another?

> if [ -z "$TID" ]; then
> # Our target is not configured, thus we're not
> # running.
> @@ -337,7 +337,7 @@ iSCSILogicalUnit_monitor() {
> # This only looks for the backing store, but does not test
> # for the correct target ID and LUN.
> tgtadm --lld iscsi --op show --mode target \
> - | grep -E -q "[[:space:]]+Backing store.*:
> ${OCF_RESKEY_path}" && return $OCF_SUCCESS
> + | grep -E -q "[[:space:]]+Backing store.*:
> ${OCF_RESKEY_path}$" && return $OCF_SUCCESS
> ;;
> lio)

Here the "$" looks OK too, but here it would apply to two backing devices with overlapping paths. I presume you named your LVs "lun-build1" and "lun-build10" also, and they're in the same VG?

Cheers,
Florian
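
To see why the missing end-of-line anchor matters for overlapping backing-device paths, here is a small Python sketch that mirrors the grep expression from the patch. It is an illustration only: the `backing_store_matches` helper, the sample output line and the `/dev/vg0/...` paths are made up for this example, and unlike the shell original the path is passed through `re.escape`.

```python
import re


def backing_store_matches(tgtadm_output: str, path: str, anchored: bool) -> bool:
    """Search target listing text for a backing store line naming `path`.

    With anchored=False this mimics the original grep (prefix match);
    with anchored=True it mimics the patched grep with a trailing "$".
    """
    pattern = r"\s+Backing store.*: " + re.escape(path)
    if anchored:
        pattern += "$"
    # MULTILINE makes "$" match at the end of each line, as grep does.
    return re.search(pattern, tgtadm_output, flags=re.MULTILINE) is not None


# Hypothetical listing in which only lun-build10 is exported.
SAMPLE = "        Backing store path: /dev/vg0/lun-build10\n"

# The unanchored pattern wrongly reports lun-build1 as present.
print(backing_store_matches(SAMPLE, "/dev/vg0/lun-build1", anchored=False))
# The anchored pattern does not.
print(backing_store_matches(SAMPLE, "/dev/vg0/lun-build1", anchored=True))
```

The unanchored search succeeds because "/dev/vg0/lun-build1" is a prefix of "/dev/vg0/lun-build10", which is the false positive Vadym ran into.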
Re: [Linux-ha-dev] [PATCH v2] resource-agents: add Linux proxy arp resource agent
On Wed, Apr 4, 2012 at 1:52 AM, Christian Franke wrote:
> Hello Florian,
>
> Your question is fully justified - I sincerely apologize for ignoring
> that comprehensive documentation.
>
> I rewrote the patch trying to adhere to the requirements given in the
> documentation.

Wow, that is a response we infrequently get so promptly from new RA authors. I'm impressed. Thanks! And no-one's expecting you to apologize for just missing a piece of documentation.

Cheers,
Florian
Re: [Linux-ha-dev] [PATCH] resource-agents: add GNU/Linux proxy arp resource agent
On Tue, Apr 3, 2012 at 1:07 PM, Christian Franke wrote:
> This patch adds an OCF resource agent which maintains proxy arp entries
> in a GNU/Linux arp table.
>
> This is especially useful when a high-availability routing setup is built
> and it is required to perform proxy arp.

Thanks for the contribution. May I ask whether you've taken a look at the OCF RA dev guide?

Florian
Re: [Linux-ha-dev] some iSCSITarget meta data issues
On 02/27/12 07:38, Rasto Levrinc wrote:
> I am talking about long shortdescs. E.g.
>
> Specifies the iSCSI target implementation ("iet",
> "tgt" or "lio").
>
> is way too long and it abbreviates to something like "Specifies the
> iSCSI..." in the GUI. You may not care about this, but many people do.
> So I am not nitpicking or anything, this meta-data happens to be my
> interface to the resource agents and it used to work quite well in the
> past. So generally I think that short-description shouldn't encode that
> they specify something, the name of the resource agent and possible
> values.
>
> Implementation
>
> or
>
> iSCSI target implementation
>
> would be enough in my opinion and it's nothing shameful to have short
> short-descriptions.

So you're proposing either a shortdesc that's _identical_ to the parameter name -- doesn't do a fat lot of good -- or one that contains the parameter name plus the string "iSCSI target" which you've been complaining about upthread? Does not compute.

Let me suggest that this discussion is getting us nowhere. The shortdescs stay as they are until someone comes up with a real improvement.

Thanks for reporting the other issues, though.

Cheers,
Florian
Re: [Linux-ha-dev] some iSCSITarget meta data issues
On 02/26/12 15:31, Rasto Levrinc wrote:
> I think, I'm never bikeshedding. :) It is a mirror issue but the "iSCSI
> target" in every short description is redundant,

Which RA are you looking at? iSCSITarget supports 8 different parameters; 4 of them have "iSCSI target" in their shortdesc. That's hardly "every short description".

> since they all belong
> to the iSCSI target RA, and because the long descriptions are cut in the

Now what, are you talking about longdesc or shortdesc?

> display, then they all look like "iSCSI target..." and you must mouse
> over to actually see what they are. I am exaggerating a bit here. So I
> propose as general style rule, don't include RA name to the short
> descriptions.
>
>>
>>> 3. defaults are computed in this way that they may be different in
>>> different cluster nodes and may change after the cluster is configured,
>>> which is not very useful in my opinion.
>>
>> That was my way of trying to provide a "reasonable" default across
>> distros. The alternative would be that every distro packager would
>> have to patch the RA to provide the proper default for their platform
>> -- which would be tgt for RHEL/CentOS, iet for SLES 11 and then lio
>> for SLES 11 SP2+ (I think), undefined for Debian. You get the picture.
>> I think the existing way of figuring out the defaults is saner, if not
>> perfect. Feel free to convince me otherwise, though.
>
>
> You are right about that. The problem I am having is, that they are two
> types of defaults, that you can't distinguish just by looking in the
> meta-data.
>
> The first is that are used by RA, so you don't have to define 20
> parameters, only if you want use something other than the default. This
> is the most common use.
>
> The second type is a suggested value, that is advertised as a default,
> but unless it is stored like normal value in the cib, it is not used by
> the RA.
>
> The third type is a combination from the two above, like iSCSITarget.
>
> I am solving it by keeping track of the RAs, what kind of defaults they
> are using for now, but I'd preferred that there was some consistency in
> it.

Patches welcome.

Florian
Re: [Linux-ha-dev] some iSCSITarget meta data issues
On Sat, Feb 25, 2012 at 10:02 AM, Rasto Levrinc wrote:
> Hi,
>
> There are couple of issues in iSCSITarget meta-data.
>
> 1. there is a extra space in name attribute in status action and it
> causes all sort of problems:
>
>
> what it does, is that crm shell doesn't understand "status " action only
> "status", but cib doesn't accept "status". So I can't simply fix it in
> the GUI.

Fixed in the metadata: https://github.com/ClusterLabs/resource-agents/commit/ebb5e5d103066cb19b46427ef0f28937f4943dbe

However, iirc RAs expose "status" operation only for age-old compatibility reasons, and Pacemaker only ever uses "monitor". Which is why, I guess, no-one has run into this problem before. Any reason for LCMC to use "status" at all?

> And some not very critical...
>
> 2. short descriptions are not really short, I think it's not necessary
> prepend every one of them with "iSCSI target"
>
> The worst offender
> Manages an iSCSI target export
>
> could be changed to "Implementation", or "Daemon Implementation" to be
> short and descriptive.

Yeah, that one was just a copy & paste error. Fixed too, thanks. About the others, I can only surmise you're bikeshedding -- those look fine to me. However, I'll be happy to take a patch if you have better suggestions.

> 3. defaults are computed in this way that they may be different in
> different cluster nodes and may change after the cluster is configured,
> which is not very useful in my opinion.

That was my way of trying to provide a "reasonable" default across distros. The alternative would be that every distro packager would have to patch the RA to provide the proper default for their platform -- which would be tgt for RHEL/CentOS, iet for SLES 11 and then lio for SLES 11 SP2+ (I think), undefined for Debian. You get the picture. I think the existing way of figuring out the defaults is saner, if not perfect. Feel free to convince me otherwise, though.

Cheers,
Florian
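
The thread does not spell out how the agent computes its default implementation, so the following is only a guess at the general idea, written in Python rather than the agent's shell: probe for each implementation's admin tool and take the first one found. The tool names, the probe order and the `default_implementation` helper are assumptions made for illustration.

```python
import shutil

# Assumed (implementation, admin tool) pairs in an assumed probe order.
CANDIDATES = (
    ("iet", "ietadm"),
    ("tgt", "tgtadm"),
    ("lio", "lio_node"),
)


def default_implementation(which=shutil.which):
    """Return the first implementation whose admin tool is found, else None.

    `which` is injectable so the lookup can be exercised without touching
    the real PATH.
    """
    for implementation, binary in CANDIDATES:
        if which(binary):
            return implementation
    return None
```

A probe of this kind also shows where Rasto's concern comes from: two nodes with different packages installed would compute different defaults, and installing another target implementation later could change the result on the same node.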
Re: [Linux-ha-dev] [RfC] [Patch] Filesystem
On Mon, Feb 20, 2012 at 9:40 PM, Lars Ellenberg wrote:
> What do you say?

+1

Florian
Re: [Linux-ha-dev] [PATCH] Medium: Use the resource timeout as an override to the default dbus timeout for upstart RA
On Mon, Feb 20, 2012 at 11:57 AM, Andrew Beekhof wrote:
>> It does, but the exit status is always '0', which makes 'service' binary
>> unusable for monitoring the status of the service without parsing the
>> command output.
>
> 10 head
> 20 desk
> 30 add
> 40 goto 10

I believe you went through that same loop months ago when you found out about this on IRC. Looks like premature cache invalidation of the result.

Florian
Re: [Linux-ha-dev] Additional changes made via DHCPD review process
On Fri, Dec 9, 2011 at 6:30 AM, Dejan Muhamedagic wrote:
> Hi,
>
> On Tue, Dec 06, 2011 at 01:39:04PM -0400, Chris Bowlby wrote:
>> Hi All,
>>
>> Ok, I'll look into csync, and will concede the point on the RA syncing
>> the out of chrooted configuration file.
>>
>> I still need to find a means to monitor the DHCP responses however, as
>> that will just improve the reliability of the cluster itself, as well as
>> the service.
>
> I'm really not sure how to do that.
>
> Didn't review the agent, but on a cursory look, perhaps you could
> provide the default for chrooted_path (/var/lib/dhcp).
>
> BTW, did you think of adding an ocft test case?

Please, cut the new guy some slack. :)

Evidently this is Chris' first contributed RA, and he has been enormously responsive to our suggestions, and has drastically improved his agent from his first submission. I'm sure he'll get to ocft in due course.

Cheers,
Florian
Re: [Linux-ha-dev] Additional changes made via DHCPD review process
On Tue, Dec 6, 2011 at 4:44 PM, Dejan Muhamedagic wrote:
> Hi,
>
> On Tue, Dec 06, 2011 at 10:59:20AM -0400, Chris Bowlby wrote:
>> Hi Everyone,
>>
>> I would like to thank Florian, Andreas and Dejan for making
>> suggestions and pointing out some additional changed I should make. At
>> this point the following additional changes have been made:
>>
>> - A test case in the validation function for ocf_is_probe has been
>> reversed tp ! ocf_is_probe, and the "test"/"[ ]" wrappers removed to
>> ensure the validation is not occuring if the partition is not mounted or
>> under a probe.
>> - An extraneous return code has been removed from the "else" clause of
>> the probe test, to ensure the rest of the validation can finish.
>> - The call to the DHCPD daemon itself during the start phase has been
>> wrapped with the ocf_run helper function, to ensure that is somewhat
>> standardized.
>>
>> The first two changes corrected the "Failed Action... Not installed"
>> issue on the secondary node, as well as the fail-over itself. I've been
>> able to fail over to secondary and primary nodes multiple times and the
>> service follows the rest of the grouped services.
>>
>> There are a few things I'd like to add to the script, now that the main
>> issues/code changes have been addressed, and they are as follows:
>>
>> - Add a means of copying /etc/dhcpd.conf from node1 to node2...nodeX
>> from within the script. The logic behind this is as follows:
>
> I'd say that this is admin's responsibility. There are tools such
> as csync2 which can deal with that. Doing it from the RA is
> possible, but definitely very error prone and I'd be very
> reluctant to do that. Note that we have many RAs which keep
> additional configuration in a file and none if them tries to keep
> the copies of that configuration in sync itself.

Seconded. Whatever configuration doesn't live _in_ the CIB proper, is not Pacemaker's job to replicate. The admin gets to either sync files manually across the nodes (csync2 greatly simplifies this; no need to reinvent the wheel), or put the config files on storage that's available to all cluster nodes.

Cheers,
Florian
Re: [Linux-ha-dev] pgsql and streaming replcation
On Sun, Dec 4, 2011 at 11:11 PM, Serge Dubrouski wrote:
> Florian, Dejan how would you like to merge a patch when we are ready? The
> patch will be rather big one and AFAIK you have some policy on the amount of
> changes for one patch.

If it's a big addition of functionality, then a big patch is expected. However please make sure that you do one patch per functional change. Also, don't mix functional changes with "cleanup" work like fixing whitespace, correcting incorrectly advertised resource parameters, etc. It's acceptable to mix those in with the same pull request, but they should be in separate changesets so we can easily bisect any arising issues.

Other than that, I guess Dejan will agree with me that your PostgreSQL expertise is way better than his and mine. So if you greenlight the feature addition functionally we're unlikely to second-guess you on that.

Does this help?

Cheers,
Florian
Re: [Linux-ha-dev] [Linux-HA] Generic Python framework for OCF Resource Agents released
Hi Volker,

welcome. First off I would like to say that what you present is pretty impressive and could turn out to be something extraordinarily helpful. I agree that being able to easily create robust resource agents in Python would be a huge plus. I also agree with the pain points you mentioned about resource agents in shell.

I'm impressed with what's on bitbucket; it's fairly easy to look around and get a good grasp of what you're doing. In addition, I'm an admitted Python fanboy and have been wishing for a Python-based VirtualDomain agent (one of the RAs I wrote and co-maintain, based on libvirt), rather than the current shell/virsh based one, for quite a while.

But as it turns out what I, putting myself into the shoes of a resource agent author, would expect from a resource agent seems to only partially agree with your expectations. :) Let me give you my idea of an ideal Python resource agent (all pseudocode, just hacked up, totally untested, errors and typos mine, don't run this at home):

    class MyAgent(ResourceAgent):

        VERSION = "0.1"

        # declare parameter types. Yeah, of course we can use something
        # more elegant than a nested dictionary
        PARAMS = { "foobar": { "type" : INTEGER,
                               "required" : True,
                               "unique" : True,
                               "default" : "blah" }}

        def start(self, timeout, **params):
            # Do stuff to start the RA
            ...

        def stop(self, timeout, **params):
            # Do stuff to stop the RA
            ...

        def migrate_to(self, timeout, target, **params):
            # Do stuff to migrate the RA to target
            ...

        def migrate_from(self, timeout, source, **params):
            # Do stuff to migrate the RA from source
            ...

        def notify_post_start(self, timeout, *nodes, **params):
            # Do stuff to handle a post notification for
            # resources started on nodes
            ...

And now, I'd like that all the scaffolding is done by the abstract base class.
Such as:

* parse command line options (you got that right)
* create resource agent metadata (that too, however I'd much prefer if rather than registering handlers you would just be able to introspect the public methods on the RA, plus the params attribute, and build it that way.)
* create the usage message (idem)
* translate all the OCF_RESKEY_* envars into simple method parameters that then get passed into the methods
* insert defaults for parameter values, where they exist and haven't been overridden
* same for the various OCF_RESKEY_CRM_meta* envars
* handle the command line invocation
* set up logging handlers in a sane way so the RA author can just use logging.info() and friends, and the log output ends up where the cluster admin decides

I wouldn't want to muck around with registering handlers and parameters. I as a resource agent author would like to not have to worry about the innards at all, unlike with the shell RAs where there's no real way around that. Just give me a method signature I need to implement, and a few public attributes I need to fill, and then I want to be able to focus on function, not form.

Just so that's clear: that is just my idea; I'm not saying your approach is in any way inferior. We're just getting this discussion started. Maybe I'm totally off my rocker (it happens. :) ).

As this is a discussion that's really for the -dev list, I've added that list to the recipients and would encourage people to continue the discussion there.

Cheers,
Florian
Re: [Linux-ha-dev] [Linux-HA] new RA: varnish
On Wed, Nov 23, 2011 at 10:40 AM, Léon Keijser wrote: > Hi, > > I've created a new RA to manage Varnish instances. I've forked > resource-agents and added it here: > > https://github.com/lkeijser/resource-agents We haven't seen much review from others here on the list, but Léon has been enormously responsive on GitHub and has polished and improved this RA considerably. I have just merged this into the upstream repo, the merge commit is here: https://github.com/ClusterLabs/resource-agents/commit/1e70eea45251b5375ea3314b63491e940cb055cd If you would like to test this RA (and you're much encouraged to do so), it's available from here: https://raw.github.com/ClusterLabs/resource-agents/master/heartbeat/varnish Thanks Léon, and enjoy your vacation! Cheers, Florian -- Need help with High Availability? http://www.hastexo.com/now ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [PATCH 0/2] LVM: Fix activation for clustered VGs
On Fri, Nov 25, 2011 at 6:38 PM, Florian Haas wrote: > Lars (both), Dejan, Nils, > > could you take a quick peek at whether the following (untested) patches > look like they're making sense? The more important one is obviously > the second one. > > Nils, could you apply those patches on the system where you ran into > the issue? If it's more convenient, you can also fetch the patched RA > from my Github repo: > > https://github.com/fghaas/resource-agents/blob/lvm/heartbeat/LVM > > Thanks! > > Cheers, > Florian > > [PATCH 1/2] Low: LVM: add local convenience variables in LVM_start > [PATCH 2/2] Medium: LVM: force dmevent monitoring for clones Merged, plus a couple of trivial janitor patches. Recent commit history is here: https://github.com/ClusterLabs/resource-agents/commits/master/heartbeat/LVM Cheers, Florian ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [PATCH 2/2] Medium: LVM: force dmevent monitoring for clones
On Sat, Nov 26, 2011 at 11:58 AM, Lars Marowsky-Bree wrote: > On 2011-11-25T18:38:06, Florian Haas wrote: > >> Starting a clustered volume with monitoring disabled is not allowed: >> >> http://www.redhat.com/archives/lvm-devel/2010-March/msg00289.html >> >> Which would be fine, as activation/monitoring = 1 ships as the default >> in lvm.conf. However, at least some versions of LVM seem to ignore this, >> throwing an error on vgchange unless "--monitor y" is explicitly set >> on the command line: >> >> https://bugs.launchpad.net/ubuntu/+source/lvm2/+bug/833368 >> >> Thus, for cloned instances, always invoke vgchange with "--monitor y". >> >> Thanks to Nils Meyer for pointing out this issue. >> --- >> heartbeat/LVM | 6 ++ >> 1 files changed, 6 insertions(+), 0 deletions(-) > > Seems to make sense. of course, an alternative would be to add a > "Conflicts: lvm2 < x.y.z" to the package on the respective versions to > make sure it's only installed with a fixed lvm2 package ...? Surely you're joking. resource-agents does not enforce any packaging dependencies for the stuff it's capable of managing, so why throw in a random conflict here? Of course, we probably wouldn't have this version issue if the LVM RA was packaged with LVM, but someone shot down that suggestion. Who was that, I wonder? I'm thinking, I'm thinking... nevermind, it'll come to me. Cheers, Florian ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
[Linux-ha-dev] [PATCH 2/2] Medium: LVM: force dmevent monitoring for clones
Starting a clustered volume with monitoring disabled is not allowed:

http://www.redhat.com/archives/lvm-devel/2010-March/msg00289.html

Which would be fine, as activation/monitoring = 1 ships as the default in lvm.conf. However, at least some versions of LVM seem to ignore this, throwing an error on vgchange unless "--monitor y" is explicitly set on the command line:

https://bugs.launchpad.net/ubuntu/+source/lvm2/+bug/833368

Thus, for cloned instances, always invoke vgchange with "--monitor y".

Thanks to Nils Meyer for pointing out this issue.
---
 heartbeat/LVM |    6 ++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/heartbeat/LVM b/heartbeat/LVM
index d8ad3ca..05eefe7 100755
--- a/heartbeat/LVM
+++ b/heartbeat/LVM
@@ -224,6 +224,12 @@ LVM_start() {
     vgchange_options="$vgchange_options --partial"
   fi
 
+  # for clones (clustered volume groups), we'll also have to force
+  # monitoring, even if disabled in lvm.conf.
+  if ocf_is_clone; then
+    vgchange_options="$vgchange_options --monitor y"
+  fi
+
   ocf_run vgchange $vgchange_options $1 || return $OCF_ERR_GENERIC
 
   if LVM_status $1; then
-- 
1.7.5.4
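Editor's note: the way the two LVM patches accumulate vgchange options can be tried outside a cluster with the minimal sketch below. It is not the RA itself. The ocf_is_true and ocf_is_clone definitions are simplified stand-ins for the real ocf-shellfuncs helpers, and build_vgchange_options is a hypothetical wrapper introduced only for this illustration.

```shell
#!/bin/sh
# Simplified stand-ins for the ocf-shellfuncs helpers (assumptions for
# this sketch; the real helpers accept more spellings and checks).
ocf_is_true()  { case "$1" in yes|true|1|on) true ;; *) false ;; esac; }
ocf_is_clone() { [ -n "$OCF_RESKEY_CRM_meta_clone_max" ]; }

# Hypothetical wrapper: prints the options LVM_start would hand to vgchange.
build_vgchange_options() {
    active_mode="ly"
    if ocf_is_true "$OCF_RESKEY_exclusive"; then
        active_mode="ey"
    fi
    vgchange_options="-a $active_mode"

    if ocf_is_true "$OCF_RESKEY_partial_activation"; then
        vgchange_options="$vgchange_options --partial"
    fi

    # Clones (clustered volume groups) always get explicit monitoring.
    if ocf_is_clone; then
        vgchange_options="$vgchange_options --monitor y"
    fi

    echo "$vgchange_options"
}

# Example: a cloned, non-exclusive volume group without partial activation.
OCF_RESKEY_exclusive=false
OCF_RESKEY_partial_activation=false
OCF_RESKEY_CRM_meta_clone_max=2
build_vgchange_options
```

Collecting everything in one variable means a later condition, like the clone check, only has to append to it and the single vgchange call stays unchanged.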
[Linux-ha-dev] [PATCH 0/2] LVM: Fix activation for clustered VGs
Lars (both), Dejan, Nils, could you take a quick peek at whether the following (untested) patches look like they're making sense? The more important one is obviously the second one. Nils, could you apply those patches on the system where you ran into the issue? If it's more convenient, you can also fetch the patched RA from my Github repo: https://github.com/fghaas/resource-agents/blob/lvm/heartbeat/LVM Thanks! Cheers, Florian [PATCH 1/2] Low: LVM: add local convenience variables in LVM_start [PATCH 2/2] Medium: LVM: force dmevent monitoring for clones ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
[Linux-ha-dev] [PATCH 1/2] Low: LVM: add local convenience variables in LVM_start
--- heartbeat/LVM | 11 +++ 1 files changed, 7 insertions(+), 4 deletions(-) diff --git a/heartbeat/LVM b/heartbeat/LVM index 683d4d5..d8ad3ca 100755 --- a/heartbeat/LVM +++ b/heartbeat/LVM @@ -201,6 +201,8 @@ LVM_monitor() { # Enable LVM volume # LVM_start() { + local vgchange_options + local active_mode # TODO: This MUST run vgimport as well @@ -215,13 +217,14 @@ LVM_start() { active_mode="ly" if ocf_is_true "$OCF_RESKEY_exclusive" ; then active_mode="ey" - fi - partial_active="" + fi + vgchange_options="-a $active_mode" + if ocf_is_true "$OCF_RESKEY_partial_activation" ; then - partial_active="--partial" + vgchange_options="$vgchange_options --partial" fi - ocf_run vgchange -a $active_mode $partial_active $1 || return $OCF_ERR_GENERIC + ocf_run vgchange $vgchange_options $1 || return $OCF_ERR_GENERIC if LVM_status $1; then : OK Volume $1 activated just fine! -- 1.7.5.4 ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [PATCH] ldirectord: Remove dependence on shellfuncs from /etc/init.d/ldirectord
On 11/23/11 08:52, Simon Horman wrote: > On Wed, Nov 23, 2011 at 04:10:46PM +0900, TATEISHI Katsuyuki wrote: >> Hi All, >> >> Please consider applying the attached patch to remove dependence on >> shellfuncs from /etc/init.d/ldirectord. >> >> It allows ldirectord-RPM users to run /etc/init.d/ldirectord without >> resource-agents RPM installed. >> >> It is quite harmless, because /etc/init.d/ldirectord does not use any >> functions in /etc/ha.d/shellfuncs. >> >> Thank you, > > Hi Tateishi-san, > > this looks good to me. > > Dejan, could you apply it? I had just merged Mori-san's Makefile.am patch, so I picked up this one as well. Merged and pushed, with a slightly modified commit message: https://github.com/ClusterLabs/resource-agents/commit/6c2e8146c757da47b7ff926edb6895f7b1832e55 Thanks! Cheers, Florian ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] Review request: fixes to IPaddr2
On 11/21/11 16:51, Dejan Muhamedagic wrote: > Hi Florian, > > On Fri, Nov 11, 2011 at 10:28:00AM +0100, Florian Haas wrote: >> Dejan/Lars, >> >> I noticed I've had a bunch of minor changes to IPaddr2 sitting in a >> branch since July, and never got around to asking for a review or >> merging them. I've just rebased them to the current state of master. If >> one of you could take a look, I'd much appreciate that. Thanks! >> >> https://github.com/fghaas/resource-agents/compare/master...ipaddr2-fixes > > > Reviewed all the changes and didn't find any problem. Good to > put a bit more order in IPaddr2! Thanks! Merged & pushed. https://github.com/ClusterLabs/resource-agents/commit/881f539aafa1e0198144137bae9e3491c45f7cd1 Cheers, Florian -- Need help with High Availability? http://www.hastexo.com/now ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
[Linux-ha-dev] Updated OCF RA dev guide (was Re: [GIT PULL] Moving the OCF RA dev guide to the resource-agents repo)
On 11/18/11 10:50, Florian Haas wrote: > Done. Merged and pushed. I'll now add a few updates and then rebuild the > content hosted at > http://www.linux-ha.org/doc/dev-guides/ra-dev-guide.html. I guess I > should be able to upload the new stuff some time between now and Monday > morning. An updated version of the OCF RA developer's guide is now available here: A related blog post of mine is at: http://wp.me/p4XzQ-bo Cheers, Florian -- Need help with High Availability? http://www.hastexo.com/now ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [GIT PULL] Moving the OCF RA dev guide to the resource-agents repo
On 11/16/11 10:37, Florian Haas wrote: > Hi everyone, > > this is something I've been meaning to do for a long time, and I've > finally had the time to do so. Now that the ClusterLabs repo on Github > has been established as the central source of OCF resource agents, there > is really no reason why the RA dev guide should live in the linux-ha.org > Mercurial repo any longer. I've informally proposed to move it to Github > on IRC and received very positive feedback on that. > > I've done some mangling with git and hg to preserve the guide's entire > history, so this pull request is a bit bloated because of that. > > So as not to clutter the doc/ directory, I've created two subdirs, > "doc/dev-guide" (for the dev guide sources) and "doc/man" (for the magic > that does man page autogeneration for RAs). I've updated configure.ac > and the Automake files accordingly. > > If I don't hear from anyone with a valid objection to moving this over > to Github, I'll proceed with the merge about 48 hours from now. Done. Merged and pushed. I'll now add a few updates and then rebuild the content hosted at http://www.linux-ha.org/doc/dev-guides/ra-dev-guide.html. I guess I should be able to upload the new stuff some time between now and Monday morning. For those building resource agent man pages in a git checkout, please remember to rerun ./autogen.sh and ./configure (or run autoreconf) as the generated man pages (and their Automake file) have moved from doc/ to doc/man/. Cheers, Florian -- Need help with High Availability? http://www.hastexo.com/now ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [GIT PULL] Re: [Linux-HA] [RfC] Review request for ocf:heartbeat:asterisk (Asterisk OCF RA)
On 11/16/11 10:28, Florian Haas wrote: > Hi everybody, > > barring any last-minute vetoes, I intend to pull the following changes > since commit 020c8f7b08e232aef05e277b09632171a7561744: Heard no vetoes. Merged and pushed. https://github.com/ClusterLabs/resource-agents/commit/ff3aff0006368dcb5cf7da226ee69a8c53b4ef62 Cheers, Florian -- Need help with High Availability? http://www.hastexo.com/now ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
[Linux-ha-dev] [GIT PULL] Moving the OCF RA dev guide to the resource-agents repo
Hi everyone, this is something I've been meaning to do for a long time, and I've finally had the time to do so. Now that the ClusterLabs repo on Github has been established as the central source of OCF resource agents, there is really no reason why the RA dev guide should live in the linux-ha.org Mercurial repo any longer. I've informally proposed to move it to Github on IRC and received very positive feedback on that. I've done some mangling with git and hg to preserve the guide's entire history, so this pull request is a bit bloated because of that. So as not to clutter the doc/ directory, I've created two subdirs, "doc/dev-guide" (for the dev guide sources) and "doc/man" (for the magic that does man page autogeneration for RAs). I've updated configure.ac and the Automake files accordingly. If I don't hear from anyone with a valid objection to moving this over to Github, I'll proceed with the merge about 48 hours from now. Cheers, Florian The following changes since commit 020c8f7b08e232aef05e277b09632171a7561744: Medium: jboss: add the java_opts parameter for java options (2011-11-15 16:27:26 +0100) are available in the git repository at: git://github.com/fghaas/resource-agents dev-guide Dejan Muhamedagic (1): RA dev guide: edit ocft description Florian Haas (68): doc: move man page generation to doc/man Add RA developer's guide RA dev guide: explain API RA dev guide: add info about expected behavior for actions RA dev guide: add information about pseudo resources RA dev guide: add information about logging, ocf_run, and locks RA dev guide: add info about script interpreters RA dev guide: more information about pseudo resources RA dev guide: add info about testing for booleans and numbers RA dev guide: add separate section on convenience functions RA dev guide: explain have_binary and check_binary RA dev guide: explain validate-all and meta-data RA dev guide: do not use full paths when referring to executables RA dev guide: update example metadata RA dev guide: 
add line breaks in example metadata RA dev guide: add stubs for remaining actions RA dev guide: add section on variables RA dev guide: add stubs for initialization and locale considerations RA dev guide: add info about promote action RA dev guide: explain demote action RA dev guide: explain notify actions RA dev guide: explain migrate_to operation RA dev guide: explain migrate_from actions Add information about licensing, initialization, and locale settings RA dev guide: add hints for syntax highlighting RA dev guide: fix typo RA dev guide: fix incorrect debug message in example code RA dev guide: fix a misleading comment RA dev guide: fix example code for stop action RA dev guide: fix incorrect function name RA dev guide: fix trivial typo RA dev guide: add information about crm_master RA dev guide: rewrite monitor example RA dev guide: add convenience function names to the corresponding section headers RA dev guide: add legal notice for CC-BY-SA license RA dev guide: add note about ocf_is_true RA dev guide: add new section on resource agent structure RA dev guide: fix typos (migrate-*/migrate_*) RA dev guide: fix typo (missing "it") RA dev guide: improve stop example RA dev guide: fix comments in demote example RA dev guide: mention the checkbashisms script for /bin/sh RAs RA dev guide: fix misleading comment RA dev guide: add section on testing for running processes RA dev guide: put return codes in a separate section RA dev guide: add warning about implications of failed stop actions RA dev guide: clarify information on timeouts RA dev guide: add note about HA_RSCTMP being cleaned on startup RA dev guide: rename "Resource agent behavior" to "Resource agent actions" RA dev guide: add a section about ocf-tester RA dev guide: move testing to a different section RA dev guide: add section on installing resource agents RA dev guide: add section on RPM packaging RA dev guide: expand license information RA dev guide: add section on Debian packaging RA dev guide: add 
note about submitting RAs RA dev guide: fix names of notify variables RA dev guide: add another note about crm_master RA dev guide: put author information in docinfo file RA dev guide: add revision information to docinfo file RA dev guide: remove information about superfluous tests for zombies RA dev guide: add revision 1.0.1, update copyright years RA dev guide: add "Conventions" section RA dev guide: put testing into its own top-level section RA dev guide: add section on ocft
[Linux-ha-dev] [GIT PULL] Re: [Linux-HA] [RfC] Review request for ocf:heartbeat:asterisk (Asterisk OCF RA)
Hi everybody, barring any last-minute vetoes, I intend to pull the following changes since commit 020c8f7b08e232aef05e277b09632171a7561744: Medium: jboss: add the java_opts parameter for java options (2011-11-15 16:27:26 +0100) into the ClusterLabs resource-agents repo from the git repository at: git://github.com/fghaas/resource-agents asterisk Thanks a lot to Dejan, Lars and Russell for their extensive and valuable feedback. Cheers, Florian Andreas Kurz (2): Medium: asterisk: remove -x option from pgrep Low: asterisk: refine sipsak exit code interpretation Florian Haas (24): Low: asterisk: use ocf_run where appropriate, invoke kill with "-s" Low: asterisk: honor $PATH in binary defaults Low: asterisk: don't advertise monitor depth Low: asterisk: remove LSB boilerplate Low: asterisk: don't redirect ocf_run invocations to /dev/null Low: asterisk: remove superfluous line-end semicolons Low: asterisk: simplify equality tests Low: asterisk: remove boilerplate copied from mysql Low: asterisk: downgrade logging severity for successful monitor Low: asterisk: improve directory creation on start Low: asterisk: improve/simplify start Low: asterisk: improve stop Low: asterisk: declare remaining "pid" variables local Low: asterisk: fix typo in log message Low: asterisk: remove useless return statement Low: asterisk: religiously use $rc, not $? 
Low: asterisk: exit, don't return, in the face of uncaught errors Low: asterisk: remove unused variable Low: asterisk: initialize convenience variables after validate Low: asterisk: add optional SIP monitoring with sipsak Low: asterisk: do "core show channels count " during monitor Low: asterisk: whitespace cleanup High: asterisk: fix typo (missing "$") Low: asterisk: remove -v flag from sipsak invocation Martin Loschwitz (6): High: asterisk: new resource agent Low: asterisk: set suggested timeouts to 20s Low: asterisk: add convenience function for connecting to Asterisk console Low: asterisk: rewrite stop operation Low: asterisk: cleanup Medium: asterisk: properly handle astcanary doc/Makefile.am |1 + heartbeat/Makefile.am |1 + heartbeat/asterisk| 485 + 3 files changed, 487 insertions(+), 0 deletions(-) create mode 100755 heartbeat/asterisk ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] ocf_run: sanitize output before logging?
On 2011-11-15 16:21, Dejan Muhamedagic wrote:
> Hi,
>
> On Mon, Nov 14, 2011 at 09:53:12PM +0100, Florian Haas wrote:
>> Dejan, Lars, and other shell gurus in attendance,
>>
>> maybe I'm totally off my rocker, and one of you guys can set me
>> straight. But to me this part of the ocf_run function seems a bit fishy:
>>
>> output=`"$@" 2>&1`
>> rc=$?
>> output=`echo $output`
>
>> Am I gravely mistaken, or would any funny control characters produced by
>> the wrapped command line totally mess up the content of "output" here as
>> it is mangled by the backticks?
>
> I think you're not :) The last line was most probably put there
> to convert CR to spaces.

>> $ output=`sipsak -v -s sip:somenotexistantextens...@ekiga.net 2>&1`
>> $ echo $output
>> Content-Length: 0(1.5.3-notls
>> (i386/linux))tag=c64e1f832a41ec1c1f4e5673ac5b80f6.8ff585.127.155.32
>
> Seems like part of the output goes to stdout and another part to
> stderr. The two are interspersed in an unpredictable manner.

Unlikely. If I do

$ output=`sipsak -v -s sip:somenotexistantextens...@ekiga.net 2>&1`
$ xxd <<< $output

...then the output is in the hexdump in exactly the right order. It's just delimited by CR (0x0d), not LF (0x0a). Which is mighty odd for a utility running on any *nix platform, but still shouldn't be transformed to something thus garbled, simply by being stuffed into a variable.

For now, I guess we'll wimp out and simply remove "-v" from the sipsak invocation so it just doesn't produce any output, in which case ocf_run falls back to logging just the command and its exit code.

Cheers,
Florian
[Linux-ha-dev] ocf_run: sanitize output before logging?
Dejan, Lars, and other shell gurus in attendance,

maybe I'm totally off my rocker, and one of you guys can set me straight. But to me this part of the ocf_run function seems a bit fishy:

    output=`"$@" 2>&1`
    rc=$?
    output=`echo $output`

Am I gravely mistaken, or would any funny control characters produced by the wrapped command line totally mess up the content of "output" here as it is mangled by the backticks?

What I'm noticing is the invocation of "ocf_run sipsak -v -s ", which we put into the asterisk RA as per Russell Bryant's suggestion, seems to totally garble the output. Compare this:

$ sipsak -v -s sip:somenotexistantextens...@ekiga.net 2>&1
SIP/2.0 200 OK
Via: SIP/2.0/UDP 127.0.0.1:43665;branch=z9hG4bK.539207ad;rport=53485;alias;received=85.127.155.32
From: sip:sipsak@127.0.0.1:43665;tag=6dafacb9
To: sip:somenotexistantextens...@ekiga.net;tag=c64e1f832a41ec1c1f4e5673ac5b80f6.3109
Call-ID: 1840229561@127.0.0.1
CSeq: 1 OPTIONS
Server: Kamailio (1.5.3-notls (i386/linux))
Content-Length: 0

To this:

$ output=`sipsak -v -s sip:somenotexistantextens...@ekiga.net 2>&1`
$ echo $output
Content-Length: 0(1.5.3-notls (i386/linux))tag=c64e1f832a41ec1c1f4e5673ac5b80f6.8ff585.127.155.32

In this case it appears to be due to carriage-return (0x0d, ^M) characters that sipsak injects into its output, which is annoying but relatively benign. But maybe we want to sanitize the ocf_run output before we hand it off to be written to the logs?

Cheers,
Florian
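Editor's note: one possible shape for such sanitizing is sketched below. This is an assumption-laden illustration, not what ocf_run does; sanitize_output is a hypothetical helper name. The background is that the default IFS contains only space, tab and newline, so the unquoted "echo $output" folds newlines into spaces but leaves every carriage return in place, and a terminal then overprints the line.

```shell
#!/bin/sh
# Hypothetical helper, not part of ocf-shellfuncs.
sanitize_output() {
    # 1. turn every carriage return into a newline
    # 2. turn newlines into spaces, squeezing runs of spaces into one
    # 3. drop the trailing space left over from the final newline
    printf '%s\n' "$1" | tr '\r' '\n' | tr -s '\n' ' ' | sed 's/ $//'
}

# CRLF-delimited sample resembling the sipsak response in this thread.
raw=$(printf 'SIP/2.0 200 OK\r\nCSeq: 1 OPTIONS\r\nContent-Length: 0\r\n')
clean=$(sanitize_output "$raw")
echo "$clean"
```

The result is a single line with no carriage returns left in it, so nothing can be overprinted when it is logged.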
Re: [Linux-ha-dev] [PATCH] prevent Slave promotion in mysql RA
On 2011-11-14 20:44, Marek Marczykowski wrote: > On 14.11.2011 09:55, Raoul Bhatia [IPAX] wrote: >> On 2011-11-11 09:22, Junko IKEDA wrote: >>> Hi, >>> >>> I am running MySQL replication setting with 2 nodes Master/Slave >>> configuration. >>> If Slave status(secs_behind) is lager than Master's >>> parameter(max_slave_lag), >>> Slave data is outdated, right? >>> check_slave() in mysql RA would run "crm_master -v 0" in this >>> situation to mark Slave as "outdated", >>> but if Master is shut down in this status, >>> Slave will be able to promote instead of its old data. >>> (is this correct?) >>> It seems that "crm_master -v -INFINITY" is effectual to prevent Slave >>> promotion. >> >> forwarding this to marek and fghaas as i'm not familiar with >> multi-state handling inside resource agents. > > Did you set "evict_outdated_slaves"? In a master/slave set, evict_outdated_slaves will actually kick out (by failing with $OCF_ERR_INSTALLED) any slave that has fallen behind. If set to false (the default), then the slave will be allowed to stay in the cluster, but its master preference will be pushed down so it's not promoted, and this seems to be Ikeda-san's preferred behavior. The caveat which I mentioned in my other email in this thread applies here, though. For those pulling this thread from the archives: this information is in the resource agent man page, and in "crm ra info ocf:heartbeat:mysql". Cheers, Florian -- Need help with High Availability? http://www.hastexo.com/now ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [PATCH] prevent Slave promotion in mysql RA
Hello Ikeda-san!

On 2011-11-11 09:22, Junko IKEDA wrote:
> Hi,
>
> I am running MySQL replication setting with 2 nodes Master/Slave
> configuration.
> If Slave status(secs_behind) is lager than Master's parameter(max_slave_lag),
> Slave data is outdated, right?

Yes.

> check_slave() in mysql RA would run "crm_master -v 0" in this
> situation to mark Slave as "outdated",
> but if Master is shut down in this status,
> Slave will be able to promote instead of its old data.
> (is this correct?)

Actually, not quite. :) Andrew will correct me if I'm wrong on this. But as I understand it,

- while a _placement score_ of 0 makes a node eligible for _running_ a resource (including an instance of a master/slave set),
- only a _promotion score_ of greater than 0 (i.e. a minimum of 1) makes the node eligible for promoting a resource to the Master role.

So, if a node has a promotion score of 0, then it will not be promoted.

However, your point is entirely valid if you also set a master preference via a location constraint on the master role. Consider this:

node alice attributes standby="on"
node bob attributes standby="off"
primitive p_mysql ocf:heartbeat:mysql
ms ms_mysql p_mysql
location l_master_prefers_bob ms_mysql \
    rule 200: $role=Master #uname eq bob

In that case, if bob has fallen too far behind (automatic master score: 0), then the location rule still increases that score by 200, so the total promotion score for bob is 0 + 200 == 200, and bob will be promoted.

> It seems that "crm_master -v -INFINITY" is effectual to prevent Slave
> promotion.

Yes, that is entirely correct. In the example above, if the outdated slave sets a promotion preference of -INFINITY, then the total promotion score would be -INFINITY + 200 == -INFINITY. So the outdated slave, bob, would never be promoted to master.
But:

> if [ $master_pref -lt 0 ]; then
>     # Sanitize a below-zero preference to just zero
>     master_pref=0
>
> fi
> $CRM_MASTER -v $master_pref

This if block is unfortunately there for a good reason, namely that (at least some versions back) the Policy Engine really did not like negative promotion scores at all. I forget the exact details, but maybe Lars (Ellenberg) will remember -- I seem to recall him telling me very firmly something to the effect of "whatever you do, don't use crm_master with a negative score anywhere".

Now, it may be that said issues with the pengine have since been fixed. If that is the case, I'll be happy to modify the mysql RA as you suggest. Surely you have patched your local version of the RA to set a -INFINITY master preference. If so, does it behave as you expect it? If yes, could you test it on both a 1.1 and a 1.0 cluster?

Cheers,
Florian
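Editor's note: the score arithmetic in this message can be reproduced with the small sketch below. It assumes Pacemaker's documented rules that INFINITY equals 1000000 and that anything combined with -INFINITY stays -INFINITY. The helper names to_num and add_scores are hypothetical and exist only for this illustration; they are not crm_master or Policy Engine code.

```shell
#!/bin/sh
INF=1000000   # Pacemaker's numeric value for INFINITY

# Hypothetical helper: map the INFINITY keywords to numbers.
to_num() {
    case "$1" in
        INFINITY|+INFINITY) echo "$INF" ;;
        -INFINITY)          echo "-$INF" ;;
        *)                  echo "$1" ;;
    esac
}

# Hypothetical helper: combine a promotion preference with a rule score.
add_scores() {
    a=$(to_num "$1")
    b=$(to_num "$2")
    # -INFINITY always wins, then +INFINITY, otherwise plain addition.
    if [ "$a" -le "-$INF" ] || [ "$b" -le "-$INF" ]; then
        echo "-INFINITY"
    elif [ "$a" -ge "$INF" ] || [ "$b" -ge "$INF" ]; then
        echo "INFINITY"
    else
        echo $((a + b))
    fi
}

add_scores 0 200           # outdated slave after "crm_master -v 0"
add_scores -INFINITY 200   # outdated slave after "crm_master -v -INFINITY"
```

The first call shows why a preference of 0 can still be lifted above zero by a location rule, while the second shows that -INFINITY cannot be outvoted by any finite rule score.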
Re: [Linux-ha-dev] Patches for VirtualDomain RA
On 2011-11-11 11:42, Michael Schwartzkopff wrote: >>> 2) The next problem is that a graceful shutdown sometimes does not work >>> when the machine just booted. This patch makes the RA send a shutdown >>> command every 10 seconds while shutting down the machine. This catches >>> the boot problem. >>> >>> @@ -234,6 +240,9 @@ >>> >>> shutdown_timeout=$((($OCF_RESKEY_CRM_meta_timeout/1000)-5 >>> )) # Loop on status for $shutdown_timeout seconds >>> for i in `seq $shutdown_timeout`; do >>> >>> + if [ $((i%10)) -eq 0 ]; then >>> + virsh $VIRSH_OPTIONS shutdown ${DOMAIN_NAME} >>> + fi >>> >>> VirtualDomain_Status >>> status=$? >>> case $status in >> >> I see the point -- if you're issuing a KVM shutdown while the machine is >> still booting and the guest's acpid is not started, then the shutdown >> effectively doesn't happen. And issuing a shutdown request for a domain >> that's already got one should do no harm. >> >> Question is, why only do this every 10 seconds then? Might as well do it >> on every iteration. So we could just roll the invocation of "virsh >> $VIRSH_OPTIONS shutdown ${DOMAIN_NAME}" into the existing "while [ $NOW >> -lt $shutdown_timeout ]; do" loop. >> >> What do others think? > > Perhaps the shutdown might cause a considerably load on the system. Why? Florian -- Need help with High Availability? http://www.hastexo.com/now ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] Patches for VirtualDomain RA
On 2011-07-29 10:22, Michael Schwartzkopff wrote: > Hi, > > I hope I found the correct list. Playing with the VirtualDomain RA I found > two > problems. Please find the description and patches below. Sorry for not tending to this for a while, and thanks to Dejan for the reminder. > 1) During stop operation libvirt occasionally returns an error because the > state cannot be determined just the moment the machine is shut down. This > patch makes the RA try to get the state again one time. If the machine is > down > then everything is OK. > > --- /root/VirtualDomain 2011-07-29 08:39:30.652675972 +0200 > +++ /usr/lib/ocf/resource.d/heartbeat/VirtualDomain 2011-07-29 > 10:08:24.712790703 +0200 > @@ -149,6 +149,7 @@ > VirtualDomain_Status() { > rc=$OCF_ERR_GENERIC > status="no state" > +bail_wait="yes"; > while [ "$status" = "no state" ]; do > status="`virsh $VIRSH_OPTIONS domstate $DOMAIN_NAME`" > case "$status" in > @@ -177,8 +178,13 @@ > # During the stop operation, we want to bail out > # quickly, so as to be able to force-stop (destroy) > # the domain if necessary. > - ocf_log error "Virtual domain $DOMAIN_NAME has no state > during > stop operation, bailing out." > - return $OCF_ERR_GENERIC; > + ocf_log info "Virtual domain $DOMAIN_NAME has no state > during > stop operation." > + if [ "$bail_wait" = "no" ]; then > + ocf_log error "Virtual domain $DOMAIN_NAME has no > state > during stop operation, bailing out." > + return $OCF_ERR_GENERIC; > + fi > + bail_wait="no" > + sleep 1 > else > # During all other actions, we just wait and try > # again, relying on the CRM/LRM to time us out if Can you please configure your mail agent to not insert line breaks when you send patches? Better still, use git send-email. 
At any rate, I consider the patch obsolete (and actually, it was already when it was submitted), as Lars Ellenberg implemented a "try this three times" logic in commit ffc83235, on July 1, 2010: https://github.com/ClusterLabs/resource-agents/commit/ffc8323515c19bc51fe0801fc3d2610878699ce3 > 2) The next problem is that a graceful shutdown sometimes does not work when > the machine just booted. This patch makes the RA send a shutdown command > every > 10 seconds while shutting down the machine. This catches the boot problem. > > @@ -234,6 +240,9 @@ > shutdown_timeout=$((($OCF_RESKEY_CRM_meta_timeout/1000)-5)) > # Loop on status for $shutdown_timeout seconds > for i in `seq $shutdown_timeout`; do > + if [ $((i%10)) -eq 0 ]; then > + virsh $VIRSH_OPTIONS shutdown ${DOMAIN_NAME} > + fi > VirtualDomain_Status > status=$? > case $status in I see the point -- if you're issuing a KVM shutdown while the machine is still booting and the guest's acpid is not started, then the shutdown effectively doesn't happen. And issuing a shutdown request for a domain that's already got one should do no harm. Question is, why only do this every 10 seconds then? Might as well do it on every iteration. So we could just roll the invocation of "virsh $VIRSH_OPTIONS shutdown ${DOMAIN_NAME}" into the existing "while [ $NOW -lt $shutdown_timeout ]; do" loop. What do others think? Cheers, Florian -- Need help with High Availability? http://www.hastexo.com/now ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
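The idea of re-issuing the shutdown request on every pass of the wait loop can be sketched as follows. This is a minimal sketch, not the VirtualDomain RA's actual code: `virsh` and `VirtualDomain_Status` are replaced by stubs so the loop runs without libvirt, and `graceful_stop`, `POLL_INTERVAL` and the two counters are hypothetical names.

```sh
#!/bin/sh
# Hypothetical values and stubs; a real RA would talk to libvirt instead.
OCF_NOT_RUNNING=7
DOMAIN_NAME="demo"
VIRSH_OPTIONS="--quiet"
POLL_INTERVAL=${POLL_INTERVAL:-1}
shutdown_requests=0
polls=0

virsh() {
    # Stub: count shutdown requests instead of calling libvirt.
    if [ "$2" = "shutdown" ]; then
        shutdown_requests=$((shutdown_requests + 1))
    fi
    return 0
}

VirtualDomain_Status() {
    # Stub: the guest only goes down on the third poll, standing in
    # for a guest whose acpid was not up when the first request came.
    polls=$((polls + 1))
    if [ "$polls" -ge 3 ]; then
        return "$OCF_NOT_RUNNING"
    fi
    return 0
}

graceful_stop() {
    shutdown_timeout=$1
    elapsed=0
    while [ "$elapsed" -lt "$shutdown_timeout" ]; do
        # Re-send the request on each pass; a repeat is assumed harmless.
        virsh $VIRSH_OPTIONS shutdown "$DOMAIN_NAME"
        status=0
        VirtualDomain_Status || status=$?
        if [ "$status" -eq "$OCF_NOT_RUNNING" ]; then
            return 0
        fi
        sleep "$POLL_INTERVAL"
        elapsed=$((elapsed + 1))
    done
    return 1    # the caller would fall back to a forced stop (destroy)
}
```

Calling `graceful_stop 10` with these stubs sends one request per pass until the status check reports the domain as not running.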
[Linux-ha-dev] Review request: fixes to IPaddr2
Dejan/Lars, I noticed I've had a bunch of minor changes to IPaddr2 sitting in a branch since July, and never got around to asking for a review or merging them. I've just rebased them to the current state of master. If one of you could take a look, I'd much appreciate that. Thanks! https://github.com/fghaas/resource-agents/compare/master...ipaddr2-fixes Cheers, Florian
Re: [Linux-ha-dev] [Linux-HA] [RfC] Review request for ocf:heartbeat:asterisk (Asterisk OCF RA)
Just FYI, I noticed I erroneously put the asterisk changes in the master branch on my github repo; I've now moved them to a separate "asterisk" branch. The direct links to commits, which I posted earlier, should still work as the SHA IDs are unchanged. They just point to commits in a different branch now. For those just tuning in, the current state of the RA is here: https://github.com/fghaas/resource-agents/blob/asterisk/heartbeat/asterisk Cheers, Florian
Re: [Linux-ha-dev] [Linux-HA] [RfC] Review request for ocf:heartbeat:asterisk (Asterisk OCF RA)
On 2011-11-10 17:08, Dejan Muhamedagic wrote: > Hi Lars, > > Pity I didn't see this earlier, could've saved meself some time :) > > On Thu, Nov 10, 2011 at 04:33:02PM +0100, Lars Ellenberg wrote: >> On Thu, Nov 10, 2011 at 04:11:16PM +0100, Florian Haas wrote: >>> Hi Dejan, >>> >>> thanks for the feedback! We've worked in most of your suggested changes, >>> see below: >> >>>> More direct would be: >>>> >>>> if [ $? -ne 0 ]; then >> >> $? in a test is almost always an error. > > Unless you don't need the outcome later. I'm with you, however Lars did evidently spot the one occasion where we used $? twice trying to get the same return code. So he wins. :) https://github.com/fghaas/resource-agents/commit/2cbb26648c133ce04b0d51e439c41541dac039e1 I left one invocation in there: "asterisk_validate || exit $?". I hope that one is acceptable. :) >> Btw, >> "User $OCF_RESKEY_user doesn't exit" >> there is an s missing. > > "user doesn't exit" sounds good too ;-) https://github.com/fghaas/resource-agents/commit/1568a990dfcac03ddfe5785c5d65940ed230068c We also tossed in a few more changes: Properly handle multiple instances of the "astcanary" watchdog daemon: https://github.com/fghaas/resource-agents/commit/c257d4a57f9131a4353143991fe101f02b51d790 Remove a pointless "return $?" https://github.com/fghaas/resource-agents/commit/812f2b55b9a7a9f8bdd270cca8d037f07f0e980a Fix a couple of log messages, and exit where we shouldn't return: https://github.com/fghaas/resource-agents/commit/58b5f55da4ac5537acf9a36b56f8b35f2b96da56 Cheers, Florian -- Need help with High Availability? http://www.hastexo.com/now ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
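The point about `$?` can be illustrated with a small sketch: the exit status is saved once, right after the command, and every later use refers to the saved copy. The wrapper name `run_and_report` is hypothetical and does not appear in the asterisk RA.

```sh
#!/bin/sh
# Hypothetical wrapper: run a command, remember its exit status once,
# and reuse the saved value for both the log message and the return.
run_and_report() {
    "$@"
    rc=$?   # capture immediately; any later command overwrites $?
    if [ "$rc" -ne 0 ]; then
        echo "command failed with rc=$rc"
    fi
    return "$rc"
}
```

Reading `$?` a second time after the `if` test would yield the status of the test, not of the original command, which is the kind of mistake the review caught.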
Re: [Linux-ha-dev] [Linux-HA] [RfC] Review request for ocf:heartbeat:asterisk (Asterisk OCF RA)
Hi Dejan, thanks for the feedback! We've worked in most of your suggested changes, see below: On 2011-11-10 13:14, Dejan Muhamedagic wrote: > Hi, > > On Thu, Nov 10, 2011 at 10:27:36AM +0100, Florian Haas wrote: >> On 2011-11-09 12:02, Martin Gerhard Loschwitz wrote: >>> Hello everybody, >>> >>> I wrote an asterisk OCF resource agent which I am hereby putting up >>> for discussion. Any feedback is welcome. >>> >>> It's available from >>> https://github.com/fghaas/resource-agents/blob/master/heartbeat/asterisk >> >> Let's move this thread to the -dev list where it really belongs. >> >> FWIW, I consider this RA in pretty good shape -- I did review it rather >> thoroughly and sent a few patches, for which Martin was kind enough to >> include me, undeservingly, in the authors list. Feedback from others >> would still be very much appreciated (even if it's just a "+1 for >> merge"). Thanks! > > Just a few remarks: > > Ending commands with ';' is not necessary: > > return $OCF_ERR_INSTALLED; > > i.e. ';' serves as a command separator. (:%s/;$//) https://github.com/fghaas/resource-agents/commit/088ba39b855d4ca6375a17500aa0c0e1a2578db8#heartbeat/asterisk > This construct looks a bit unusual: > > if [ ! $? -eq 0 ]; then > > More direct would be: > > if [ $? 
-ne 0 ]; then https://github.com/fghaas/resource-agents/commit/bbe7a0ba38d366b25067b09141208087a9e44850#heartbeat/asterisk > Is this necessary (in asterisk_status): > > if [ -d /proc -a -d /proc/1 ]; then > [ "u$pid" != "u" -a -d /proc/$pid ] > else > ocf_run kill -s 0 $pid > fi > > Why not just: > > kill -s 0 $pid https://github.com/fghaas/resource-agents/commit/d77afe185d9cec53388c2248ce3b290f95e4cad5 > Line 273 in monitor is going to produce a lot of logging, better > reduce severity to debug: > > ocf_log info "Asterisk PBX monitor succeeded"; https://github.com/fghaas/resource-agents/commit/cf130ce1a3d1b9502ad07df6361f611a256ee560#heartbeat/asterisk > In asterisk_start() $ASTRUNDIR is first created using install(8), > then again checked in lines 292-296 and created using mkdir, > chown, etc. Superfluous. > > Start may exit with some arbitrary error code (line 324 in > asterisk_start()). > > Perhaps to move all local statements to the top of the function > in asterisk_start(). > > [nitpicking] start_wait is not needed, why not just > > [ $rc -eq $OCF_SUCCESS ] && break https://github.com/fghaas/resource-agents/commit/80ea432336a56cf2680f30c389b06c20f43eef79#heartbeat/asterisk > Should check content of $pid before line 377 in stop. I'm unsure what you're suggesting here. - Just check that the pid file is non-empty? - Check whether its contents are numeric? - Read the pid and do a kill -0 before kill -TERM? Cheers, Florian -- Need help with High Availability? http://www.hastexo.com/now ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
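The three readings of "check content of $pid" listed at the end of this message can be combined in one helper. This is a minimal sketch under the assumption that all three checks are wanted; `pidfile_process_alive` is a hypothetical name and is not part of the asterisk RA.

```sh
#!/bin/sh
# Hypothetical helper: returns 0 only if the pid file holds a numeric
# pid that belongs to a live process.
pidfile_process_alive() {
    pidfile=$1
    # 1. The pid file must exist and be non-empty.
    [ -s "$pidfile" ] || return 1
    pid=$(cat "$pidfile")
    # 2. Its contents must be purely numeric.
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;
    esac
    # 3. Signal 0 tests for existence without delivering a signal.
    kill -s 0 "$pid" 2>/dev/null
}
```

A stop action could call this before sending TERM, and treat a non-zero result as "nothing to stop".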
Re: [Linux-ha-dev] [Linux-HA] [RfC] Review request for ocf:heartbeat:asterisk (Asterisk OCF RA)
On 2011-11-09 12:02, Martin Gerhard Loschwitz wrote: > Hello everybody, > > I wrote an asterisk OCF resource agent which I am hereby putting up > for discussion. Any feedback is welcome. > > It's available from > https://github.com/fghaas/resource-agents/blob/master/heartbeat/asterisk Let's move this thread to the -dev list where it really belongs. FWIW, I consider this RA in pretty good shape -- I did review it rather thoroughly and sent a few patches, for which Martin was kind enough to include me, undeservingly, in the authors list. Feedback from others would still be very much appreciated (even if it's just a "+1 for merge"). Thanks! Cheers, Florian
Re: [Linux-ha-dev] Stonith turns node names to lowercase
On 2011-10-18 12:12, Alberic de Pertat wrote: > Hi, > > I am currently in the process of writing a fencing agent for VMware > vCenter. After some tests, I noticed that the stonith command is turning > the nodename to lowercase. > > The problem is that almost every VM in my inventory is uppercase with > some mixed case too. VMware allows you to have two VM with the same name > but different cases. Gah. That sounds like a great way to go insane, or drive someone else nuts. I really wonder why on earth anyone would want to do this. > I cannot find a proper way to deal with this as a > case insensitive search through the inventory could yield more than one > result. > > Looking through the stonith command source (Mercurial HEAD), I found the > following in stonith.c (l. 456) : > > g_strdown(nodecopy); > > Is there a reason for this ? I suppose Dejan will accept a patch making this configurable. Cheers, Florian -- Need help with fencing? http://www.hastexo.com/now ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
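To see why lowercasing is lossy for such an inventory, here is a short sketch that lists the names which become identical once case is folded. `lowercase_collisions` and the sample VM names are hypothetical; the pipeline only mimics the effect of `g_strdown` and is not the stonith code.

```sh
#!/bin/sh
# Hypothetical helper: print every name that occurs more than once
# after all arguments have been folded to lowercase.
lowercase_collisions() {
    printf '%s\n' "$@" | tr '[:upper:]' '[:lower:]' | sort | uniq -d
}

# Two distinct VMs that differ only in case collapse into one name.
lowercase_collisions WEB01 web01 Db01
```

Any name printed here is one for which a lowercased node name can no longer identify a single VM.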
Re: [Linux-ha-dev] [Linux-HA] [ha-wg] CFP: HA Mini-Conference in Prague on Oct 25th
Hi everyone, I can finally respond to this and do want to take the opportunity to apologize for my silence-for-obvious-reasons over the past month. On 2011-10-07 00:05, Andrew Beekhof wrote: > On Thu, Oct 6, 2011 at 1:53 AM, Lars Marowsky-Bree wrote: >> On 2011-10-03T11:10:13, Andrew Beekhof wrote: >> >>> Based on Boston last year, I imagine the conversations will last right >>> up until Lars starts presenting his talk on Friday afternoon. >>> People came and went at random, and if someone essential was missing >>> for a conversation we deferred it until later. >> >> Oh, then we're going to not stop, ever - because I don't have a talk at >> the main conference this time ;-) > > The schedule has you in a friday afternoon slot iirc. That one is actually the talk by Madison and yours truly. Regarding my own plans: when the originally planned miniconf was canceled, I committed to speaking at Percona Live in London that same week, so I will be late for Linuxcon and arrive in the late evening of the 26th. I'm completely open for Boston-style round table sessions all day on the 27th, and also in the morning and evening of the 28th. Minus talk prep with Madison, of course. I am also not heading back down to Vienna before the early afternoon of Saturday the 29th, so if anyone has plans to do something interesting that Saturday morning I'd be more than happy to join. Cheers, Florian
Re: [Linux-ha-dev] Postfix status
On 09/08/11 10:34, Raoul Bhatia [IPAX] wrote: > On 09/08/2011 04:49 AM, renayama19661...@ybb.ne.jp wrote: >> do not apply a patch even if you apply this patch, there is not the big >> problem. >> I am lacking in my explanation, and I'm sorry. > > ok. i just updated my pull request. > > https://github.com/ClusterLabs/resource-agents/pull/20 > > dejan, can you please review and apply our patches? Taking the liberty to step in for Dejan, I've merged and pushed your changes. Thanks for your contribution! Cheers, Florian
Re: [Linux-ha-dev] pacemaker - migrate RA, based on the state of other RA, w/o clone?
On 2011-07-14 12:55, RNZ wrote:
> On Thu, Jul 14, 2011 at 2:02 PM, Florian Haas <florian.h...@linbit.com> wrote:
>
> > On 2011-07-14 08:46, RNZ wrote:
> > > No, I want and I need - multi-master scheme (more than two nodes)...
> >
> > There is nothing in Pacemaker's master/slave scheme that restricts you
> > to a single master. The ocf:linbit:drbd resource agent, for example, is
> > configurable in dual-Master mode.
> >
> > Once the resource agent properly implements the functionality (the hard
> > part), configuring a multi-master master/slave set is simply a question
> > of setting the master-max meta parameter to a value greater than 1 (the
> > easy part).
>
> I don't think so... Couchdb RESTful API very easily allows running
> replication with the following scheme:

It's entirely possible that the couchdb native API may be more powerful in specific regards, but if you want to put it into a Pacemaker cluster you may have to occasionally accept some minor limitations. That's a tradeoff which is present for all Pacemaker managed applications.

> primitive cdb0
> hostA: hostB:dbB > localhost:dbB
> hostA: hostC:dbC > localhost:dbC
> hostA: hostD:dbD > localhost:dbD
> primitive cdb1
> hostB: hostA:dbB > localhost:dbB
> primitive cdb2
> hostC: hostA:dbC > localhost:dbC
>
> In this scheme hostA is used as master for hostB and hostC (master-master)
> and as slave for hostD (slave-master). Both schemes (master-master and
> slave-master, for different servers/databases) in one instance.

So you mean there would be a cascading replication, like so:

      hostD
        |
      hostA
      /   \
  hostB   hostC

Such a thing is not something Pacemaker caters for specifically, but I dare say it doesn't need to, either. You would simply create one master/slave set where D is master and A is slave, and another where A is master and B and C are slaves.

> > By the way, is there any specific reason you are contributing under a
> > pseudonym? It's highly unusual in this community to do so.
>
> Sorry, habit...
My real name Alibek.Amaev, alibek.am...@gmail.com > <mailto:alibek.am...@gmail.com> or alibe...@gmail.com > <mailto:alibe...@gmail.com> Pleased to meet you Alibek, welcome to the tribe. :) Cheers, Florian signature.asc Description: OpenPGP digital signature ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] pacemaker - migrate RA, based on the state of other RA, w/o clone?
On 2011-07-14 08:46, RNZ wrote: > No, I want and I need - multi-master scheme (more than two nodes)... There is nothing in Pacemaker's master/slave scheme that restricts you to a single master. The ocf:linbit:drbd resource agent, for example, is configurable in dual-Master mode. Once the resource agent properly implements the functionality (the hard part), configuring a multi-master master/slave set is simply a question of setting the master-max meta parameter to a value greater than 1 (the easy part). By the way, is there any specific reason you are contributing under a pseudonym? It's highly unusual in this community to do so. Florian
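The "easy part" mentioned above might look roughly like this in crm shell syntax. This is a sketch under assumptions: the resource names `ms_couchdb` and `p_couchdb` are hypothetical, no master/slave-capable CouchDB agent is implied to exist, and the helper `build_ms_command` only prints the command so it can be inspected without a cluster.

```sh
#!/bin/sh
# Hypothetical names throughout; assumes the primitive already exists
# and that its RA implements promote/demote for more than one master.
build_ms_command() {
    ms_id=$1        # name of the master/slave set
    primitive=$2    # underlying primitive resource
    masters=$3      # how many instances may be promoted at once
    clones=$4       # how many instances run in total
    echo "crm configure ms $ms_id $primitive meta master-max=$masters clone-max=$clones notify=true"
}

# Three instances, all of which may be promoted to master.
build_ms_command ms_couchdb p_couchdb 3 3
```

The only value that differs from a conventional single-master set is `master-max`; everything else is the usual master/slave boilerplate.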
[Linux-ha-dev] [GIT PULL] IPaddr2 cleanup
Hi everyone,

please review the following changes since commit dca8808361fc2d130e44d3e1ebc1c5ff38fbf9ac:

  Low: Route: insert paragraph breaks in longdesc (2011-07-04 14:32:17 +0200)

in the git repository at:

  git://github.com/fghaas/resource-agents ipaddr2-fixes

These should not introduce any functional changes, just a general cleanup and streamlining along current best practices, hopefully making the RA easier to maintain. The updated RA passes all ocft tests.

Everyone's input is much appreciated. Thanks!

Cheers,
Florian

Florian Haas (7):
  Low: IPaddr2: improve add_interface function
  Low: IPaddr2: sanitize defaults initialization
  Low: IPaddr2: open-code references to resource parameters
  Low: IPaddr2: use ocf_is_true when evaluating lvs_support parameter
  Low: IPaddr2: use ocf_is_true when evaluating arp_bg parameter
  Low: IPaddr2: remove falsely advertised default
  Low: ocft: remove "InstallPackage" line for IPaddr2

 heartbeat/IPaddr2  | 146 ++--
 tools/ocft/IPaddr2 |   1 -
 2 files changed, 73 insertions(+), 74 deletions(-)
Re: [Linux-ha-dev] regressions in resource-agents 3.9.1
On 2011-06-22 11:48, Dejan Muhamedagic wrote: > Hello all, > > Unfortunately, it turned out that there were two regressions in > the 3.9.1 release: > > - iscsi on platforms which run open-iscsi 2.0-872 (see > http://developerbugs.linux-foundation.org/show_bug.cgi?id=2562) > > - pgsql probes with shared storage (iirc), see > http://marc.info/?l=linux-ha&m=130858569405820&w=2 > > Thanks to Vadym Chepkov for finding and reporting them. > > I'd suggest to make a quick fix release 3.9.2. > > Opinions? Agree. Cheers, Florian
Re: [Linux-ha-dev] Uniquness OCF Parameters
On 2011-06-16 10:51, Lars Ellenberg wrote: > On Thu, Jun 16, 2011 at 09:48:20AM +0200, Florian Haas wrote: >> On 2011-06-16 09:03, Lars Ellenberg wrote: >>> With the current "unique=true/false", you cannot express that. >> >> Thanks. You learn something every day. :) > > Sorry that I left off the "As you are well aware of," > introductory phrase. ;-) In case that wasn't clear earlier, I was very much not aware of this. I wasn't being ironic, for a change. :) >>> Question is: do we really want or need that. >> >> That is a discussion for the updated OCF RA spec discussion, really. And >> the driver of that discussion is currently submerged. :) > > I guess this was @LMB? > Hey there ... do you read? :) He is on a diving vacation in Croatia. Not only was I not being ironic; I referred to his literal submersion. Cheers, Florian
Re: [Linux-ha-dev] Uniquness OCF Parameters
On 2011-06-16 09:03, Lars Ellenberg wrote: > With the current "unique=true/false", you cannot express that. Thanks. You learn something every day. :) > Depending on what we chose the meaning to be, > parameters marked "unique=true" would be required to > either be all _independently_ unique, > or be unique as a tuple. > > If we want to be able to express both, we need a different markup. > > Of course, we can move the markup out of the parameter description, > into an additional markup, that spells them out, > like . > > But using unique=0 as the current non-unique meaning, then > unique=, would > name the scope for this uniqueness requirement, > where parameters marked with the same such label > would form a unique tuple. > Enables us to mark multiple tuples, and individual parameters, > at the same time. > > Question is: do we really want or need that. That is a discussion for the updated OCF RA spec discussion, really. And the driver of that discussion is currently submerged. :) Florian signature.asc Description: OpenPGP digital signature ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] Uniquness OCF Parameters
On 2011-06-15 15:50, Alan Robertson wrote: > On 06/14/2011 06:03 AM, Florian Haas wrote: >> On 2011-06-14 13:08, Dejan Muhamedagic wrote: >>> Hi Alan, >>> >>> On Mon, Jun 13, 2011 at 10:32:02AM -0600, Alan Robertson wrote: >>>> On 06/13/2011 04:12 AM, Simon Talbot wrote: >>>>> A couple of observations (I am sure there are more) on the uniqueness >>>>> flag for OCF script parameters: >>>>> >>>>> Would it be wise for the for the index parameter of the SFEX ocf script >>>>> to have its unique flag set to 1 so that the crm tool (and others) would >>>>> warn if one inadvertantly tried to create two SFEX resource primitives >>>>> with the same index? >>>>> >>>>> Also, an example of the opposite, the Stonith/IPMI script, has parameters >>>>> such as interface, username and password with their unique flags set to >>>>> 1, causing erroneous warnings if you use the same interface, username or >>>>> password for multiple IPMI stonith primitives, which of course if often >>>>> the case in large clusters? >>>>> >>>> When we designed it, we intended that Unique applies to the complete set >>>> of parameters - not to individual parameters. It's like a multi-part >>>> unique key. It takes all 3 to create a unique instance (for the example >>>> you gave). >>> That makes sense. >> Does it really? Then what would be the point of having some params that >> are unique, and some that are not? Or would the tuple of _all_ >> parameters marked as unique be considered unique? >> > I don't know what you think I said, but A multi-part key to a database > is a tuple which consists of all marked parameters. You just said what > I said in a different way. > > So we agree. Jfyi, I was asking a question, not stating an opinion. Hence the use of a question mark. So then, if the uniqueness should be enforced for a "unique key" that is comprised of _all_ the parameters marked unique in a parameter set, then what would be the correct way to express required uniqueness of _individual_ parameters? 
In other words, if I have foo and bar marked unique, then one resource with foo=1 and bar=2, and another with foo=1, bar=3 does not violate the uniqueness constraint. What if I want both foo and bar to be unique in and of themselves, so any duplicate use of foo=1 should be treated as a uniqueness violation? Florian signature.asc Description: OpenPGP digital signature ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
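The two readings of "unique" discussed in this message can be made concrete with the foo/bar example. The sketch below is only an illustration of the question, not how crm or any OCF tool checks uniqueness; the function names and the one-resource-per-line data layout are assumptions.

```sh
#!/bin/sh
# Each line describes one resource as "<foo> <bar>".
resources='1 2
1 3'

# Reading 1: the (foo, bar) tuple as a whole must be unique.
# Prints every tuple that occurs more than once.
tuple_duplicates() {
    printf '%s\n' "$1" | sort | uniq -d
}

# Reading 2: a parameter must be unique in and of itself.
# Prints every value of column $2 that occurs more than once.
column_duplicates() {
    printf '%s\n' "$1" | awk -v col="$2" '{ print $col }' | sort | uniq -d
}

echo "tuple violations: $(tuple_duplicates "$resources")"
echo "foo violations:   $(column_duplicates "$resources" 1)"
```

With this data the tuple check finds nothing, while the per-parameter check flags `foo=1`, which is exactly the case the current unique=true/false markup cannot distinguish.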
Re: [Linux-ha-dev] Patch for pgsql
On 2011-06-15 14:26, Serge Dubrouski wrote: > I screwed up with git so here is the patch attached. Merged. I took the liberty to split this into two patches, and drop the spelling fix because the original spelling is actually correct. :) https://github.com/ClusterLabs/resource-agents/commit/f64c77a61ca4794ee636801b2447a2c1a6c531ce https://github.com/ClusterLabs/resource-agents/commit/2dd56104687b38006ae41d3aa033cc6f1cc41509 Thanks for the fixes. Cheers, Florian
Re: [Linux-ha-dev] Patch for pgsql
On 2011-06-15 14:26, Serge Dubrouski wrote: > I screwed up with git so here is the patch attached. Nice, thanks. Is the pgsql ocft test case OK as it is in the repo? Cheers, Florian
Re: [Linux-ha-dev] Uniquness OCF Parameters
On 2011-06-14 13:08, Dejan Muhamedagic wrote: > Hi Alan, > > On Mon, Jun 13, 2011 at 10:32:02AM -0600, Alan Robertson wrote: >> On 06/13/2011 04:12 AM, Simon Talbot wrote: >>> A couple of observations (I am sure there are more) on the uniqueness flag >>> for OCF script parameters: >>> >>> Would it be wise for the for the index parameter of the SFEX ocf script to >>> have its unique flag set to 1 so that the crm tool (and others) would warn >>> if one inadvertantly tried to create two SFEX resource primitives with the >>> same index? >>> >>> Also, an example of the opposite, the Stonith/IPMI script, has parameters >>> such as interface, username and password with their unique flags set to 1, >>> causing erroneous warnings if you use the same interface, username or >>> password for multiple IPMI stonith primitives, which of course if often the >>> case in large clusters? >>> >> >> When we designed it, we intended that Unique applies to the complete set >> of parameters - not to individual parameters. It's like a multi-part >> unique key. It takes all 3 to create a unique instance (for the example >> you gave). > > That makes sense. Does it really? Then what would be the point of having some params that are unique, and some that are not? Or would the tuple of _all_ parameters marked as unique be considered unique? Florian signature.asc Description: OpenPGP digital signature ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [ha-wg-technical] resource agents 3.9.1rc1 release
On 06/08/2011 03:06 AM, Dejan Muhamedagic wrote: > Hi, > > On Wed, Jun 08, 2011 at 10:50:17AM +0200, Fabio M. Di Nitto wrote: >> Hi, >> >> On 6/8/2011 10:16 AM, Keisuke MORI wrote: >>> Hi, >>> >>> Thank you for all your efforts for the new release. >>> >>> >>> 2011/6/7 Fabio M. Di Nitto : Several changes have been made to the build system and the spec file to accommodate both projects´ needs. The most noticeable change is the option to select "all", "linux-ha" or "rgmanager" resource agents at configuration time, which will also set the default for the spec file. >>> >>> Why is the ldirectord package disabled on RHEL environment? >>> I would expect that it would be built as same as (linux-ha) >>> resource-agents-1.0.4 >>> so that we can use the upcoming 3.9.1 as the upgrade version. >> >> Because ldirectord requires libnet to build and libnet is not available >> on default RHEL (unless you explicitly enable EPEL). >> >> Florian, last time we spoke, we were trying to avoid adding BR on >> packages that are not part of RHEL, but then to build linux-ha agents we >> need cluster-glue* that are not part of RHEL anyway. >> >> We should be consistent here. >> >> I am ok to allow people to build ldirectord. >> >>> >>> We still use the resource-agents/ldirectord on many RHEL systems and >>> if it was missing >>> we can not upgrade them anymore. >> >> Understood, we are still smoothing a few corners after the merge. It´s >> good people are spotting those bits. >> >>> >>> NOTE: About the 3.9.x version (particularly for linux-ha folks): This version was chosen simply because the rgmanager set was already at 3.1.x. In order to make it easier for distribution, and to keep package upgrades linear, we decided to bump the number higher than both projects. There is no other special meaning associated with it. The final 3.9.1 release will take place soon. >>> >>> BTW why not 4.0? :) >>> just curious though. 
>> >> There is really nothing major in this release vs 1.0.4 for linux-ha and >> 3.1.x for rgmanager agents, other than co-exist in the same tree. > > Actually, while looking at it, I'd also like something else > rather than 3.9.x. Can't put my finger on what's exactly the > issue, but something like 4.0 would somehow look better. Is it > only me? > >> We will probably use 4.0 to introduce the new OCF standard and the new >> common clusterlabs/ provider and mark effectively the introduction of >> new features. > > 4.1? I realize I'm bikeshedding, but my preference would be for 3.9 for this one, and 4.0 to implement the new standard. Like Fabio originally suggested. Cheers, Florian signature.asc Description: OpenPGP digital signature ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [ha-wg-technical] resource agents 3.9.1rc1 release
On 06/08/2011 02:50 AM, Fabio M. Di Nitto wrote: > Hi, > > On 6/8/2011 10:16 AM, Keisuke MORI wrote: >> Hi, >> >> Thank you for all your efforts for the new release. >> >> >> 2011/6/7 Fabio M. Di Nitto : >>> Several changes have been made to the build system and the spec file to >>> accommodate both projects´ needs. The most noticeable change is the >>> option to select "all", "linux-ha" or "rgmanager" resource agents at >>> configuration time, which will also set the default for the >>> spec file. >> >> Why is the ldirectord package disabled on RHEL environment? >> I would expect that it would be built as same as (linux-ha) >> resource-agents-1.0.4 >> so that we can use the upcoming 3.9.1 as the upgrade version. > > Because ldirectord requires libnet to build and libnet is not available > on default RHEL (unless you explicitly enable EPEL). > > Florian, last time we spoke, we were trying to avoid adding BR on > packages that are not part of RHEL, but then to build linux-ha agents we > need cluster-glue* that are not part of RHEL anyway. > > We should be consistent here. > > I am ok to allow people to build ldirectord. No objection. >> We still use the resource-agents/ldirectord on many RHEL systems and >> if it was missing >> we can not upgrade them anymore. > > Understood, we are still smoothing a few corners after the merge. It´s > good people are spotting those bits. > >> >> >>> NOTE: About the 3.9.x version (particularly for linux-ha folks): This >>> version was chosen simply because the rgmanager set was already at >>> 3.1.x. In order to make it easier for distribution, and to keep package >>> upgrades linear, we decided to bump the number higher than both >>> projects. There is no other special meaning associated with it. >>> >>> The final 3.9.1 release will take place soon. >> >> BTW why not 4.0? :) >> just curious though. 
> > There is really nothing major in this release vs 1.0.4 for linux-ha and > 3.1.x for rgmanager agents, other than co-exist in the same tree. > > We will probably use 4.0 to introduce the new OCF standard and the new > common clusterlabs/ provider and mark effectively the introduction of > new features. I'd agree. Lars, it's now June, we have our final resource-agents release before we start actually merging functional code (as opposed to build systems), and we still don't even have coverage for deprecation in the OCF RA spec. Can I ask you to please either start working on that spec update or give up this task so someone else can pick it up? Florian signature.asc Description: OpenPGP digital signature ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] MySQL 5.5 no longer supports CHANGE MASTER TO MASTER_HOST=''
On 2011-06-03 12:51, Ben Mildren wrote: > Hi all > > I've submitted a patch for the mysql resource agent. It currently will > error on MySQL 5.5+ as CHANGE MASTER TO MASTER_HOST='' has been deprecated. > > Within the unset_master function, I've supplied a dummy value as the > host name to ensure replication would fail to start if restarted > erroneously before the mysql service is restarted, and issued a RESET > SLAVE to remove the replication metadata after a restart. > > I've altered the is_slave function to add an extra check for the dummy > host name within the SHOW SLAVE STATUS output, as there is no longer a > way to stop SHOW SLAVE STATUS returning a resultset before the mysql > instance is restarted. Thanks Ben. Marek, thoughts on this? The pull request diff is here: https://github.com/ClusterLabs/resource-agents/pull/9/files Cheers, Florian signature.asc Description: OpenPGP digital signature ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
[Linux-ha-dev] Postfix status (was Re: state of heartbeat resource agents)
> Hi All,
>
> We found a problem in the resource agent of postfix.

*Please* don't reply to an old thread if you mean to start a new one; hijacking threads just confuses everyone.

> The resource agent of postfix carries out /usr/sbin/postfix in status
> parameter, but this is not available in old postfix.

I believe this has been addressed in the latest patch set that was merged a couple of days ago; please try to reproduce the problem with the postfix RA from upstream git before you start working on your own patch. Thanks.

Cheers,
Florian
Re: [Linux-ha-dev] [PATCH] Low: fio: add missing log level
On 2011-06-01 16:26, r.bha...@ipax.at wrote: > From: Raoul Bhatia > > --- > heartbeat/fio |6 +++--- > 1 files changed, 3 insertions(+), 3 deletions(-) Merged, thanks. Cheers, Florian signature.asc Description: OpenPGP digital signature ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] changelog for resource agents 3.9.x
On 2011-06-01 15:33, Raoul Bhatia [IPAX] wrote:
> hello dejan!
>
> On 06/01/2011 02:19 PM, Dejan Muhamedagic wrote:
>> Is it this pull request:
>>
>> https://github.com/ClusterLabs/resource-agents/pull/6
> yes
>
>> What about Florian's comment?
> have been addressed.
>
>> Also, I cannot merge anything from there because commit lines
>> don't specify the agent, just the level. For instance:
>>
>> Low: inform user that postfix stopped
>> ...
>> Low: at present, OCF_CHECK_LEVEL is not evaluated
>>
>> Can you please address this too.
>
> sure. how do i best update the changelog in this regard?

git rebase -i

The "reword" option in the interactive editor allows you to rephrase your commit messages.

Cheers,
Florian
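A minimal sketch of the subject format Dejan asks for above, assuming the "Level: agent: summary" convention seen in patches such as "Low: fio: add missing log level". The helper name prefix_subject is hypothetical and not part of resource-agents; it only illustrates what a subject might look like once a commit has been marked "reword" in git rebase -i and its message edited.

```shell
# Hypothetical helper (not part of resource-agents): insert the agent name
# after the severity level of a commit subject, e.g.
#   "Low: inform user that postfix stopped"
# becomes
#   "Low: postfix: inform user that postfix stopped"
prefix_subject() {
    agent="$1"
    subject="$2"
    level="${subject%%:*}"      # text before the first colon, e.g. "Low"
    summary="${subject#*: }"    # text after the first ": "
    printf '%s: %s: %s\n' "$level" "$agent" "$summary"
}

prefix_subject postfix "Low: inform user that postfix stopped"
```

The rewording itself still happens in the editor that git rebase -i opens; the function only shows the target shape of the subject line.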
[Linux-ha-dev] mysql RA fixes merged
Hello,

I've merged and pushed a number of fixes to master/slave replication in the mysql RA, contributed by Marek Marczykowski. I've deliberately left Raoul Bhatia's retab patch out, though; those "janitor" patches usually make debugging harder if we do run into issues. We can always merge that patch later.

I've also fixed the commit messages to be prefixed with "mysql". Marek, could you please rebase your github repo against current upstream? Thanks for the contribution!

Recent commit history is here:
https://github.com/ClusterLabs/resource-agents/commits/master/heartbeat/mysql

Testing credit goes to Raoul and Dejan. Thanks to you too.

Cheers,
Florian
[Linux-ha-dev] lxc RA merged
Hello, after much useful testing from Christoph Mitasch and a number of necessary changes highlighted by ocf-tester, I've now merged and pushed the lxc resource agent that was originally contributed by Darren Thompson. The resource agent is here: https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/lxc Its commit history up to this point can be reviewed here: https://github.com/ClusterLabs/resource-agents/commits/master/heartbeat/lxc Hope this is useful. Cheers, Florian signature.asc Description: OpenPGP digital signature ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] state of heartbeat resource agents
On 2011-05-24 13:38, Raoul Bhatia [IPAX] wrote: > postfix: > fghaas reviewed my code. i tried to catch him on irc on how to > progress with his comments. I tried to catch you that same day; you weren't there. How about falling back to the mailing list, or using github's line note interface to discuss? Florian signature.asc Description: OpenPGP digital signature ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [Openais] An OCF agent for LXC (Linux Containers) - Linux-HA-Dev Digest, Vol 90, Issue 8
Darren, On 2011-05-05 15:07, Florian Haas wrote: > On 2011-05-05 14:26, Darren Thompson wrote: >>> Can you confirm that the current version is working for you and passes >>> ocf-tester on your system? >> >> What is an ocf-tester??? > > http://www.linux-ha.org/doc/dev-guides/_testing_installing_and_packaging_resource_agents.html > >> I have been testing this "the hard way" by actually creating and running >> the agents against actual LXC containers in a running cluster... If >> there is a simple way of streamlining this testing I'd love to hear more >> about it. (Did I mention that I'm not normally a "coder/developer"? - >> Yes I know that's getting repetitive ;-) ) >> >> But, back on topic... I can confirm that this agent is working correctly >> in a "live fire" environment. > > That's good to know. ocf-tester doesn't shoot blanks either (it operates > on an actual incarnation of the resource), but it might run some tests > that you manually do not, so it's always a wise idea to use it. Any news regarding running ocf-tester on your lxc agent? Cheers, Florian signature.asc Description: OpenPGP digital signature ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] New OCF RA: symlink
On 2011-05-04 15:19, Florian Haas wrote: > On 2011-04-20 14:37, Florian Haas wrote: >> Dominik doesn't have a github repo yet, so I added this to a separate >> branch in mine. The current revision is here: >> >> https://github.com/fghaas/resource-agents/blob/symlink/heartbeat/symlink >> >> Please comment freely. Thanks! > > I've just responded to two comments from Lars and Alan and I'd > appreciate more. As of right now I don't see any show stoppers and I > wouldn't like to hold up Dominik's contribution much longer. > > Unless I hear any valid objections, I intend to merge this new RA next > Monday. Thanks! OK. I believe we did stir up some valuable discussion here, but I haven't seen anyone identify a real show stopper. Merged. Thanks Dominik! https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/symlink Cheers, Florian signature.asc Description: OpenPGP digital signature ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] Filesystem ocf file
On 2011-05-06 09:58, Darren Thompson wrote: > Florian > > Ok then... I agree it does seem to be poorly designed and It's far from > intuitive... > > But If it's actually "correct" who am I to argue... "Correct" in the sense of "in line with the rules", not in the sense that it's actually smart. But yeah, it's a bullet we'll have to bite. Cheers, Florian signature.asc Description: OpenPGP digital signature ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] Filesystem ocf file
On 2011-05-06 09:26, Darren Thompson wrote:
> Team
>
> I was reviewing some errors on a cluster mounted file-system that caused
> me to review the Filesystem ocf file.
>
> I notice that it uses an "undeclared" parameter of "OCF_CHECK_LEVEL" to
> determine what degree of testing of the filesystem is required in "monitor"
>
> I have now updated it to more formally work with a "check_level" value
> with the more obvious values of "mounted, read & write" ( my updated
> version attached )
>
> Could someone (Florian is this something you can do?) please review this
> with a view to patching the upstream Filesystem ocf file.

NACK, sorry. The OCF_CHECK_LEVEL is specific to the monitor action and described as such in the OCF spec; this will not be changed without a change to the spec.

To use it, set "op monitor interval=X OCF_CHECK_LEVEL=Y"

Yes, it's poorly designed, it makes no sense why this is pretty much the only sensible time to set a parameter specifically for an operation (as opposed to on a resource), it's inexplicable why it's all caps, etc., but that's the way it is.

Cheers,
Florian
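To make the "op monitor interval=X OCF_CHECK_LEVEL=Y" usage above more concrete, here is a minimal sketch of how an agent's monitor action might branch on the level. The function name monitor_depth is hypothetical, and the mapping of 0, 10 and 20 to "mounted", "read" and "write" is an assumption modelled on the three depths Darren lists, not a copy of the Filesystem agent's code.

```shell
# Sketch only: translate OCF_CHECK_LEVEL (set per monitor operation) into the
# list of checks a Filesystem-like agent might run. An unset level is
# treated as 0, i.e. the cheapest check.
monitor_depth() {
    case "${OCF_CHECK_LEVEL:-0}" in
        0)  echo "mounted" ;;
        10) echo "mounted read" ;;
        20) echo "mounted read write" ;;
        *)  echo "unsupported"; return 1 ;;
    esac
}

OCF_CHECK_LEVEL=10
monitor_depth
```

Under these assumptions, a monitor operation configured with OCF_CHECK_LEVEL=10 would run the mount check plus a read test, while leaving the variable unset keeps the plain mount check.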
Re: [Linux-ha-dev] [Openais] An OCF agent for LXC (Linux Containers) - Linux-HA-Dev Digest, Vol 90, Issue 8
On 2011-05-05 14:26, Darren Thompson wrote: >> Can you confirm that the current version is working for you and passes >> ocf-tester on your system? > > What is an ocf-tester??? http://www.linux-ha.org/doc/dev-guides/_testing_installing_and_packaging_resource_agents.html > I have been testing this "the hard way" by actually creating and running > the agents against actual LXC containers in a running cluster... If > there is a simple way of streamlining this testing I'd love to hear more > about it. (Did I mention that I'm not normally a "coder/developer"? - > Yes I know that's getting repetitive ;-) ) > > But, back on topic... I can confirm that this agent is working correctly > in a "live fire" environment. That's good to know. ocf-tester doesn't shoot blanks either (it operates on an actual incarnation of the resource), but it might run some tests that you manually do not, so it's always a wise idea to use it. Cheers, Florian signature.asc Description: OpenPGP digital signature ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [ha-wg] Cluster Stack - Ubuntu Developer Summit
On 2011-04-26 19:33, Andres Rodriguez wrote: > UDS' are open-to-public events, and I believe it would be great if > upstream could participate and maybe even further the discussion about > the Cluster Stack. For more information about UDS, please visit [1]. The > specific date/time for the Cluster Stack session is not yet available. > > If you require any further information please don't hesitate to contact me. Andres already knows this, but FWIW I'll repost here that I'll be at UDS in time for the cluster stack session at 12 noon on 5/12. I'll stay in Budapest that evening and will probably join the Budapest sightseeing tour that the Hungarian Ubuntu team is organizing, so if anyone wants to link up with Andres and me for a few beverages please let us know. Andrew, interested in making a day trip to Budapest while you're still on this continent? Cheers, Florian signature.asc Description: OpenPGP digital signature ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [Openais] An OCF agent for LXC (Linux Containers) - Linux-HA-Dev Digest, Vol 90, Issue 6
Darren, can you please subscribe to the list as a normal subscriber rather than to just the digest, so we can keep this discussion in one thread?

On 2011-05-05 04:47, Darren Thompson wrote:
> Florian/Team
>
> There was an error in the GIT-Hub version that was causing my re-base
> attempts to fail, so I was forced to try to bring my "last known good"
> version to the same configuration (mostly successful).
>
> I have since found the error in the GIT-Hub version (the initialisation
> section was wrong, the meta-data error was a 'red herring') so have been
> found and resolved so I have done an actual re-base now based on the
> GIT-Hub version.
>
> Changes:
>
> 1. Corrected error in utilisation causing ocf to fail in HB_GUI.

That is not an error; the Github version is correct. The path to the ocf-shellfuncs library was recently changed upstream; your installed version is apparently still using the old path. For the Github version to work on your system, you will have to apply the attached patch after you check out.

Note that normally people would be building the whole resource-agents package from a git checkout and use _that_ on their test system, but you're not using git, so that option is out for you. Have I mentioned that starting to use git would be a good option?

> 2. Added "information" to stop section, to provide more feedback on
> container shutdown/stop (and to assist with future development of
> "containers using alternate 'init' systems").

Applied and pushed to my lxc branch.

Can you confirm that the current version is working for you and passes ocf-tester on your system?

Cheers,
Florian

diff --git a/heartbeat/lxc b/heartbeat/lxc
index 3b0df91..07e0026 100755
--- a/heartbeat/lxc
+++ b/heartbeat/lxc
@@ -34,8 +34,8 @@
 # OCF_RESKEY_config
 
 # Initialization:
-: ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/lib/heartbeat}
-. ${OCF_FUNCTIONS_DIR}/ocf-shellfuncs
+: ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/resource.d/heartbeat}
+. ${OCF_FUNCTIONS_DIR}/.ocf-shellfuncs
 
 # Set default TRANS_RES_STATE (temporary file to "flag" if resource was stated but not stopped)
 TRANS_RES_STATE="${HA_RSCTMP}/${OCF_RESOURCE_INSTANCE}.state"
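The two locations in the patch above can also be probed at run time. The following is a rough sketch, not something the lxc agent does: the function name find_shellfuncs is hypothetical, and the /usr/lib/ocf fallback for OCF_ROOT is an assumption about a typical installation.

```shell
# Sketch: print the first readable ocf-shellfuncs location under an OCF root.
# The new upstream path is tried first, then the older hidden-file path.
find_shellfuncs() {
    root="${1:-/usr/lib/ocf}"   # assumed default; pass OCF_ROOT explicitly if set
    for candidate in \
        "$root/lib/heartbeat/ocf-shellfuncs" \
        "$root/resource.d/heartbeat/.ocf-shellfuncs"
    do
        if [ -r "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Report what would be sourced on this system, if anything.
find_shellfuncs "${OCF_ROOT:-/usr/lib/ocf}" || echo "ocf-shellfuncs not found"
```

An agent could then source the printed path; whether such a fallback is desirable upstream is a separate question, since the message above treats the new path as the correct one.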
[Linux-ha-dev] ACLs and privilege escalation (was Re: New OCF RA: symlink)
Rather than going into ACLs in more detail, I wanted to highlight that however we limit access to the CIB, the resource agents still _execute_ as root, so we will always have what would normally be considered a privilege escalation issue.

Now, we could agree on security guidelines for RAs, and some of those would certainly be no-brainers to define (such as, don't ever "eval" unsanitized user input), but I refuse to even suggest tackling any such guidelines before the OCF spec update has gotten off the ground.

One such thing that could be added to the spec would be optional meta variables named "user" and "group", directing the LRM (or any successor) to execute the RA as that user rather than root. Just an idea.

Cheers,
Florian
Re: [Linux-ha-dev] New OCF RA: symlink
On 2011-04-20 14:37, Florian Haas wrote: > On 2011-04-20 11:41, Dominik Klein wrote: >> Hi >> >> I wrote a new RA that can manage a symlink. >> >> Configuration: >> >> primitive mylink ocf:heartbeat:symlink \ >> params link="/tmp/link" target="/tmp/target" \ >> op monitor interval="15" timeout="15" >> >> This will basically >> ln -s /tmp/target /tmp/link >> >> hth >> Dominik > > Dominik doesn't have a github repo yet, so I added this to a separate > branch in mine. The current revision is here: > > https://github.com/fghaas/resource-agents/blob/symlink/heartbeat/symlink > > Please comment freely. Thanks! I've just responded to two comments from Lars and Alan and I'd appreciate more. As of right now I don't see any show stoppers and I wouldn't like to hold up Dominik's contribution much longer. Unless I hear any valid objections, I intend to merge this new RA next Monday. Thanks! Cheers, Florian signature.asc Description: OpenPGP digital signature ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] New OCF RA: symlink
On 2011-04-22 14:25, Alan Robertson wrote: > Drbdlinks was never converted to an OCF RA, that I recall. It handles > cases of needing to restart the logging system when you changed symlnks > around - mainly for chroot services. I've used it for many years. You > can find the source for it here: > http://www.tummy.com/Community/software/drbdlinks/ > > It's pretty well thought out, and works quite well. I'd certainly look > it over before reinventing the wheel. AFAICS drbdlinks does some things on its own which in a Pacemaker cluster would be under Pacemaker control (restarting daemons, for example). The symlink RA does none of this, it's simple and effective and ties in quite well with Pacemaker management. Cheers, Florian signature.asc Description: OpenPGP digital signature ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] New OCF RA: symlink
Coming back to this one, as the discussion seems to have died down. On 2011-04-20 19:00, Lars Ellenberg wrote: > Oh, well, thinking about non-roots that may have cibadmin karma, > they now can configure a resource that will remove /etc/passwd. > I'm not sure if I like that. > > How about a staged system? Double symlinks? > Similar to the alternatives system in Debian or others. > > The RA will force a single directory that will contain the indirection > symlinks, and will sanitize (or force) link names to not contain slashes. > > The real symlinks will point to that indirection symlink, which will > point to the end location. > > /etc/postfix/main.cf >-> /var/lib/wherever-indirection-dir/postfix_main.cf <<<=== > -> /mnt/somewhere/you/want/to/point/to/main.cf > > And <<<=== will be managed by the resource agent. Considering we have an "anything" resource agent which, well, lets us do anything, I consider this pointless cluttering of the resource agent which creates a false sense of security. Thoughts? Cheers, Florian signature.asc Description: OpenPGP digital signature ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
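For readers trying to picture the indirection scheme Lars describes in the quoted text, here is a rough sketch of the managed middle link only. It is not code from the symlink RA; the function name indirection_link and its arguments are hypothetical, and it assumes an ln that supports -n (as GNU and BSD ln do).

```shell
# Sketch of the proposed indirection: the agent only ever writes links inside
# one fixed directory, and refuses link names that could escape it.
#   indirection_link DIR NAME TARGET
indirection_link() {
    dir="$1"
    name="$2"
    target="$3"
    case "$name" in
        ""|.|..|*/*)
            echo "invalid link name: $name" >&2
            return 1
            ;;
    esac
    # -n: replace an existing link instead of descending into its target.
    ln -sfn "$target" "$dir/$name"
}

demo_dir=$(mktemp -d)
indirection_link "$demo_dir" postfix_main.cf /mnt/somewhere/you/want/to/point/to/main.cf
readlink "$demo_dir/postfix_main.cf"
rm -rf "$demo_dir"
```

The real /etc/postfix/main.cf link would be created once by an administrator and point at the managed link; as the reply above argues, this does not limit what other agents running as root can do.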
Re: [Linux-ha-dev] [Openais] An OCF agent for LXC (Linux Containers) - Linux-HA-Dev Digest, Vol 90, Issue 3
On 05/04/2011 10:52 AM, Darren Thompson wrote: > Florian/Team > > I have now updated my "re-based" ocf file to include the "experimental" > support for upstart and systemd using containers. > > I can confirm that this is still working correctly for containers > running 'sysv init' and "in theory" should now also work for containers > using 'upstart' and 'systemd'. > > I'm currently doing a "crash course' in installing containers to use > these 'init replacments' but have not yet succedded in testing either > 'upstart' or 'systemd' containers yet. > > If there is anyone with a better understanding of LXC containers and > one/both of these other 'init systems', please contact me as your > information/assistance would be invaluable. OK, updated my git branch. You really want to double check your "rebasing" method; you're constantly re-introducing things that I've removed or fixed in earlier commits. Florian signature.asc Description: OpenPGP digital signature ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [Openais] An OCF agent for LXC (Linux Containers) - Linux-HA-Dev Digest, Vol 90, Issue 2
On 05/04/2011 09:09 AM, Florian Haas wrote:
> On 05/04/2011 08:44 AM, Darren Thompson wrote:
>> Florian
>>
>> I have tried to re-base on your version but it just will not run for me.
>>
>> I keep getting "Failed to parse the metadata of LXC" syntax error line
>> 1, column 1"
>>
>> I've no idea where this error is as it all looks fine...
>>
>> I'll attach my copy and a screen-shot of the error, HELP!!!
>
> It's probably not a wise approach to "test" with the GUI while you're
> not even sure the resource agent will run. You will have to start the
> script from the command line and see where your error is coming from.

Btw, your patch contains this:

diff --git a/heartbeat/lxc b/heartbeat/lxc
index 25ef6e3..9819a47 100755
--- a/heartbeat/lxc
+++ b/heartbeat/lxc
@@ -123,7 +123,6 @@ LXC_start() {
     if ! LXC_monitor ; then
         touch $TRANS_RES_STATE
         ocf_log info "Starting" ${OCF_RESKEY_container}
-        cd "`dirname ${OCF_RESKEY_config`"
         ocf_run ${STARTCMD} || exit $OCF_ERR_GENERIC
     else
         # If already running, consider start successful

Sorry, that was a typo on my part, should have been

cd "`dirname ${OCF_RESKEY_config}`"

(with closing brace) of course. Should I correct the typo, or can we drop that line altogether?

Cheers,
Florian
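As a side note on the corrected line, here is a small sketch of the same directory lookup written with $( ) and full quoting, which makes an unbalanced brace easier to spot and copes with spaces in the path. The function name config_dir and the sample path are hypothetical; this is not the code that ended up in the lxc agent.

```shell
# Sketch: print the directory containing the container's config file.
# Quoting the expansion keeps paths with spaces intact.
config_dir() {
    dirname "${OCF_RESKEY_config}"
}

OCF_RESKEY_config="/var/lib/lxc/web1/config"   # hypothetical example path
config_dir
```

Inside a start action this could be used as cd "$(config_dir)" || exit $OCF_ERR_GENERIC, assuming the agent needs to change directory at all.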
Re: [Linux-ha-dev] [Openais] An OCF agent for LXC (Linux Containers) - Linux-HA-Dev Digest, Vol 90, Issue 2
On 05/04/2011 08:44 AM, Darren Thompson wrote: > Florian > > I have tried to re-base on your version but it just will not run for me. > > I keep getting "Failed to parse the metadata of LXC" syntax error line > 1, column 1" > > I've no idea where this error is as it all looks fine... > > I'll attach my copy and a screen-shot of the error, HELP!!! It's probably not a wise approach to "test" with the GUI while you're not even sure the resource agent will run. You will have to start the script from the command line and see where your error is coming from. Florian signature.asc Description: OpenPGP digital signature ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] An OCF agent for LXC (Linux Containers) - Linux-HA-Dev Digest, Vol 89, Issue 32
On 2011-05-03 09:38, Darren Thompson wrote: > Florian/Team > > How do I see any latter commits to the GIT repository??? See my other message -- https://github.com/fghaas/resource-agents/commits/lxc/heartbeat/lxc > Is there a way I can confirm that you have committed my latest > version/changes? Yes, review the history. I've added a bunch of small commits to bring the resource agent in line with accepted precedent. Please test the current version and highlight any issues that may arise. You will need to tweak your configuration as I have renamed your parameters, and changed the semantics of some. One thing you might want to revisit is your use of the "${TRANS_RES_STATE}" state file. AIUI you're using this in order to tell a gracefully stopped container from one that has crashed -- are you sure LXC doesn't provide a built-in way to do that? Cheers, Florian signature.asc Description: OpenPGP digital signature ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [Openais] An OCF agent for LXC (Linux Containers) - Linux-HA-Dev Digest, Vol 89, Issue 32
Hello Darren,

Please get the current version from https://github.com/fghaas/resource-agents/blob/lxc/heartbeat/lxc, and also review the commit history at https://github.com/fghaas/resource-agents/commits/lxc/heartbeat/lxc.

When you send more updates, please do make sure they track the latest version in my repo. I am doing my best to split this up into patches and check them in individually, but the re-introduction of errors that have already been fixed is not something that gives me thrills. Thanks.

Cheers,
Florian
Re: [Linux-ha-dev] [Openais] An OCF agent for LXC (Linux Containers) - Linux-HA-Dev Digest, Vol 89, Issue 30
On 2011-04-29 08:04, Darren Thompson wrote: > You posted my "first attempt" and not the latest version, is it possible > to add that one as it addresses some( most hopefully) of the issues you > identified. Already there. Been there since yesterday. https://github.com/fghaas/resource-agents/commit/07827c42494dbec2c011133d9f82e831bc8b2eb6 > There are still some valid points you have raised however, So I'm going > to try to incorporate them into a "third version". See how much easier this would be if you actually did this in your own github repo that we could just pull from? Cheers, Florian signature.asc Description: OpenPGP digital signature ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [Openais] An OCF agent for LXC (Linux Containers)
On 2011-04-28 10:21, Darren Thompson wrote:
> Florina/TEAM
>
> Thanks for your input and the link to the guidelines
>
> I have updated my original ocf file in line with the guidlines, it even
> gave me a few tips on how to do things "better" so was well worth the
> time spent.
>
> Please find the updated ocf file for LXC contianers as a cluster
> resource attached.
>
> Since I'm not an actual developer (or even a career coder)

Do you think I am?

> I do not have
> the facility to host my own github fork so would appreciate "someone"
> adopting this and integrating it into their git repository.

OK, I have added this to a separate "lxc" branch in my own github fork. I'd appreciate it if you could at least get yourself an account on github so you can comment on commit line notes.

I have added my comments to this page:
https://github.com/fghaas/resource-agents/commit/73f80b31f1cee5eff1c2fe2b968f4ea593e8f405

Some of those may have already been addressed in your updated version, but to keep things simple I've kept my comments to one commit for the time being.

Florian

PS: We can stop CC'ing the openais list, this is in no way Corosync/OpenAIS related.
Re: [Linux-ha-dev] [Openais] An OCF agent for LXC (Linux Containers)
On 2011-04-27 00:29, Darren Thompson wrote:
> Florian
>
> All good points.
>
> Unfortunately I'm not a "programmer", so have no idea how to setup a
> 'git repo' and currently have no facility to host it even if I knew how.

That's the point of github; they provide all that infrastructure for you. It all boils down to

- go to https://github.com and set up an account
- go to https://github.com/ClusterLabs/resource-agents and click "Fork"
- open a command line and do "git clone" of the URL that the web page then shows (likely g...@github.com:/resource-agents.git)
- add your resource agent into the heartbeat/ directory
- do "git add heartbeat/lxc"
- do "git commit" with a meaningful commit message
- do "git push" to push the changes to your github repo.

> I will review the developers guide and as much as possible bring the OCF
> in-line with those recommendations

Yes, please do.

> Why I did not use libvirt-manager LXC containers:
> 1. Frankly I could not get the libvirt integration to work and wasted
> weeks worth of testing trying, if someone more experienced would like to
> get that working, more power to them.
> 2. The libvirt works and acts like a competing fork and does not use any
> of the "normal" lxc tools, I'm not sold that it's the best approach.

OK, fair enough. If we get your resource agent into mergeable shape then the fact that it may duplicate some VirtualDomain functionality is not a show stopper.

One other question: have you considered submitting this resource agent to the lxc folks?

Cheers,
Florian
Re: [Linux-ha-dev] [Openais] An OCF agent for LXC (Linux Containers)
Thanks Darren! Thanks for the contribution! Can I suggest

- we move this discussion to the linux-ha-dev list (where most OCF RA related discussions and reviews take place);
- you give the RA a makeover following the OCF RA developer's guide (http://www.linux-ha.org/doc/dev-guides/ra-dev-guide.html);
- you set up your own github fork off of https://github.com/ClusterLabs/resource-agents, and push your RA to that so we can eventually pull it into the mainline repo?

Also, can you explain what the advantages of your approach are, versus using libvirt-managed lxc containers which Pacemaker can tie into via the existing VirtualDomain agent? Thanks!

Cheers,
Florian
Re: [Linux-ha-dev] Crowd-sourceing request to review RAs for missing input sanitation/missing quotes/missing escapes etc. [Was: New OCF RA: symlink]
On 2011-04-22 10:57, Lars Ellenberg wrote: > On Thu, Apr 21, 2011 at 03:19:10PM +0200, Florian Haas wrote: >> On 2011-04-20 19:00, Lars Ellenberg wrote: >>> On Wed, Apr 20, 2011 at 06:49:48PM +0200, Lars Ellenberg wrote: >>> [a lot] >>> >>> I know I'm paranoid. >>> Am I too paranoid? >> >> Patches welcome. > > That phrase does work as reply to everything > you don't want to hear about ;-) > > Just because we probably have resource agents in tree > that don't do proper input sanitation, > and some of them may even do things like eval, > or forget to quote parameters that need to be quoted ... As for symlink, point taken. https://github.com/fghaas/resource-agents/commit/0fe17c1188e5228012427ccc17d7a79af40f8b31 I also rebased my branch against current upstream. Florian signature.asc Description: OpenPGP digital signature ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] New OCF RA: symlink
On 2011-04-20 19:00, Lars Ellenberg wrote: > On Wed, Apr 20, 2011 at 06:49:48PM +0200, Lars Ellenberg wrote: > [a lot] > > I know I'm paranoid. > Am I too paranoid? Patches welcome. Cheers, Florian signature.asc Description: OpenPGP digital signature ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] New OCF RA: symlink
On 2011-04-20 11:41, Dominik Klein wrote:
> Hi
>
> I wrote a new RA that can manage a symlink.
>
> Configuration:
>
> primitive mylink ocf:heartbeat:symlink \
> params link="/tmp/link" target="/tmp/target" \
> op monitor interval="15" timeout="15"
>
> This will basically
> ln -s /tmp/target /tmp/link
>
> hth
> Dominik

Dominik doesn't have a github repo yet, so I added this to a separate branch in mine. The current revision is here:

https://github.com/fghaas/resource-agents/blob/symlink/heartbeat/symlink

Please comment freely. Thanks!

Cheers,
Florian
Re: [Linux-ha-dev] [GIT PULL] ldirectord
On 2011-04-19 02:09, Simon Horman wrote:
> Hi Florian,
>
> please consider pulling
> git://github.com/horms/resource-agents.git master
> to get the following bug fix for ldirectord by Takeuchi-san.
>
> Sohgo Takeuchi (1):
> fix a bug that IPv6 does not work fine.
>
> ldirectord/ldirectord.in | 16 +---
> 1 files changed, 13 insertions(+), 3 deletions(-)

Applied with a slightly modified commit message. Please do not forget to rebase your tree on current upstream. Thanks!

Cheers,
Florian

Re: [Linux-ha-dev] Linux-HA Wiki
On 2011-04-14 10:41, Ulf wrote:
> Hi Florian,
>
> I've seen that you are editing some of the http://www.linux-ha.org/wiki/
> pages.
> In http://linux-ha.org/doc/man-pages/re-ra-sfex.html is a link to
> http://www.linux-ha.org/wiki/sfex_(resource_agent) , which doesn't exist.
>
> It would be very useful if you can migrate the very good documentation of
> sfex from http://linux-ha.org/sfex to the wiki.

It would be even more useful if *you* could do that, and also update the configuration snippet to Pacemaker crm shell syntax. :)

I've created an account for you on the Linux-HA wiki, and you should have received a password by email.

Thanks for picking this up!

Cheers,
Florian

Re: [Linux-ha-dev] mysql m/s rapid failover problem
On 2011-04-14 18:07, Raoul Bhatia [IPAX] wrote:
> On 04/13/2011 01:18 PM, Florian Haas wrote:
>> On 2011-04-13 12:54, Marek Marczykowski wrote:
>>> On 04/13/11 09:17, Florian Haas wrote:
>>>> Marek, have you considered setting up a personal fork of the
>>>> ClusterLabs/resource-agents repo where you could keep track of your
>>>> progress and which upstream could pull from?
>>>
>>> Good idea :)
>>>
>>> Pushed here:
>>> https://github.com/marmarek/resource-agents
>>
>> Raoul, inclined to give Marek's current version of mysql a spin?
>
> first try:
> https://github.com/marmarek/resource-agents/commit/ba7ab1d7012259be70c02cc26cbbc7313aa753d7#L0R857

See how easy that was? :)

Cheers,
Florian

Re: [Linux-ha-dev] mysql m/s rapid failover problem
On 2011-04-14 17:23, Raoul Bhatia [IPAX] wrote:
> On 04/14/2011 04:15 PM, Raoul Bhatia [IPAX] wrote:
>> On 04/13/2011 02:46 PM, Florian Haas wrote:
>>>> 2. shouldn't line 251 be removed. it reads:
>>>>> On M/S setup --skip-slave-start is needed (or in config file).
>>>> but --skip-slave-start is enforced on line 791.
>>>
>>> The "line notes" feature on github is actually remarkably useful for
>>> comments like this one. :)
>>
>> so you suggest i add a comment like this via github so a dev can fix
>> this, correct? (i just started to actually *work* with github yesterday)
>
> mhm - i can only line-comment on my own fork, right?
>

Don't think so ... I've been able to comment on Evgeny's commits (nif) without problems.

Cheers,
Florian

Re: [Linux-ha-dev] mysql m/s rapid failover problem
On 2011-04-13 13:56, Raoul Bhatia [IPAX] wrote:
> hi,
>
> i'll take a look in the next days.
>
> as i'm running on debian squeeze with glue 1.0.6-1 and
> cluster-agents 1:1.0.3-3.1, i will have to revert cs
> 322b7fc587ea722a25e099f7a62cfafa01851394 where
> OCF_FUNCTIONS_DIR changed.
>
>
> i currently run [1] and things seem to be stable.
>
>
> things i noticed:
>
> 1. the ra i'm running does not catch "connection refused" errors.
> c/p from the replication error:
>
>> Last_IO_Errno: 2013
>> Last_IO_Error: error connecting to master
>> 'mysql_rep@wdb01:3306' - retry-time: 60 retries: 86400
>
> i run into this error because of a firewall misconfiguration.

Well if the firewall just dropped the packets (as opposed to returning a RST TCP packet), then there wouldn't be any "connection refused" error to be expected.

> 2. shouldn't line 251 be removed. it reads:
>> On M/S setup --skip-slave-start is needed (or in config file).
> but --skip-slave-start is enforced on line 791.

The "line notes" feature on github is actually remarkably useful for comments like this one. :)

> 3. is merging possible? i thought that this mysql ra will
> always need more than one node to function properly?

It shouldn't and if it does, then that's a bug. The agent can always test whether it's running as a master/slave set with ocf_is_ms.

Cheers,
Florian

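The `ocf_is_ms` check mentioned in the reply above might be used along these lines. In a real agent the function comes from the OCF shell function library; the stand-in defined here is an assumption about its behaviour (testing whether the `master-max` meta attribute, passed to the agent as `OCF_RESKEY_CRM_meta_master_max`, is set and greater than zero) and is included only so that the sketch runs on its own. `describe_mode` is a hypothetical name.

```shell
#!/bin/sh
# Stand-in for the library function, so this sketch is self-contained.
# Assumption: the real ocf_is_ms performs an equivalent check; a real agent
# would source the OCF shell functions instead of defining this itself.
ocf_is_ms() {
    [ -n "${OCF_RESKEY_CRM_meta_master_max:-}" ] &&
        [ "${OCF_RESKEY_CRM_meta_master_max}" -gt 0 ]
}

# Hypothetical helper: choose a code path depending on how the resource
# was configured in the cluster.
describe_mode() {
    if ocf_is_ms; then
        echo "master/slave"
    else
        echo "standalone"
    fi
}
```

An agent structured this way can keep its replication-specific steps behind the `ocf_is_ms` branch and still run as a plain primitive on a single node.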
Re: [Linux-ha-dev] mysql m/s rapid failover problem
On 2011-04-13 12:54, Marek Marczykowski wrote:
> On 04/13/11 09:17, Florian Haas wrote:
>> Marek, have you considered setting up a personal fork of the
>> ClusterLabs/resource-agents repo where you could keep track of your
>> progress and which upstream could pull from?
>
> Good idea :)
>
> Pushed here:
> https://github.com/marmarek/resource-agents

Raoul, inclined to give Marek's current version of mysql a spin?

Florian