Re: [Linux-ha-dev] Announcing crmsh release 2.1.7
Thank you. Comprehensively answered.

On 1 Sep 2016 6:27 PM, "Kristoffer Grönlund" <kgronl...@suse.com> wrote:

> Darren Thompson <darr...@akurit.com.au> writes:
>
> > Just a quick question:
> >
> > If "scripts: no-quorum-policy=ignore" is becoming deprecated, how are we
> > to manage two node (e.g. test) clusters that require this workaround,
> > since quorum state on a single node is an odd state.
>
> Hi Darren,
>
> There are better mechanisms in corosync and Pacemaker for handling two
> node clusters now while still maintaining quorum.
>
> In corosync 2, we have the two_node: 1 setting for votequorum, which
> ensures that a two node cluster doesn't suffer split brain (fencing is
> required for this to work properly).
>
> There is an explanation for how this works here:
>
> http://people.redhat.com/ccaulfie/docs/Votequorum_Intro.pdf
>
> Somewhat related, there used to be the start-delay meta parameter which
> could be set for example for sbd stonith resources, to make a
> double-fencing scenario less likely. This has now been replaced by the
> pcmk_delay_max parameter. For an example of how to use this, see this
> pull request for sbd:
>
> https://github.com/ClusterLabs/sbd/pull/15/commits/ca2fba836eab169f0c8cacf7f3757c0485bcfef8
>
> Cheers,
> Kristoffer
>
> --
> // Kristoffer Grönlund
> // kgronl...@suse.com

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
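To make the two_node setting mentioned above concrete, here is a minimal sketch. The quorum stanza uses the votequorum options named in the reply; the has_two_node helper and the temporary file are illustrative assumptions only and are not part of corosync, Pacemaker or crmsh.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given corosync.conf enables two_node mode.
has_two_node() {
    # Drop comments, then look for a line that is exactly "two_node: 1".
    sed 's/#.*//' "$1" |
        grep -Eq '^[[:space:]]*two_node:[[:space:]]*1[[:space:]]*$'
}

# Example quorum stanza for a two node cluster (fencing is still required).
conf=$(mktemp)
cat > "$conf" <<'EOF'
quorum {
    provider: corosync_votequorum
    two_node: 1
}
EOF

if has_two_node "$conf"; then
    echo "two_node enabled"
else
    echo "two_node not enabled"
fi
rm -f "$conf"
```

On a real node the same check could be pointed at /etc/corosync/corosync.conf, assuming that is where the distribution keeps the file.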
Re: [Linux-ha-dev] Announcing crmsh release 2.1.7
Team, good work on the new version, appreciated.

Just a quick question:

If "scripts: no-quorum-policy=ignore" is becoming deprecated, how are we to manage two node (e.g. test) clusters that require this workaround, since quorum state on a single node is an odd state?

Regards
Darren Thompson
Professional Services Engineer / Consultant
Mail: darr...@akurit.com.au
Web: www.akurit.com.au

On 1 September 2016 at 17:01, Kristoffer Grönlund <kgronl...@suse.com> wrote:

> Hello everyone!
>
> Today I am proud to announce the release of `crmsh` version 2.1.7!
> The major new thing in this release is a backport of the events-based
> alerts support from the 2.3 branch.
>
> Big thanks to Hideo Yamauchi for his patience and testing of the
> alerts backport.
>
> This time, the list of changes is small enough that I can add it right
> here:
>
> - high: parse: Backport of event-driven alerts parser (#150)
> - high: hb_report: Don't collect logs from journalctl if -M is set (bsc#990025)
> - high: hb_report: Skip lines without timestamps in log correctly (bsc#989810)
> - high: constants: Add maintenance to set of known attributes (bsc#981659)
> - high: utils: Avoid deadlock if DC changes during idle wait (bsc#978480)
> - medium: scripts: no-quorum-policy=ignore is deprecated (bsc#981056)
> - low: cibconfig: Don't mix up CLI name with XML tag
>
> You can also get the list of changes from the changelog:
>
> * https://github.com/ClusterLabs/crmsh/blob/2.1.7/ChangeLog
>
> Right now, I don't have a set of pre-built rpm packages for Linux
> distributions ready, but I am going to make this available soon. This
> is in particular for CentOS 6.x, which still relies on Python 2.6
> support, which makes running the later releases there more
> difficult. These packages will most likely appear as a subrepository
> here (more details coming soon):
>
> * http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/
>
> Archives of the tagged release:
>
> * https://github.com/ClusterLabs/crmsh/archive/2.1.7.tar.gz
> * https://github.com/ClusterLabs/crmsh/archive/2.1.7.zip
>
> Thank you,
>
> Kristoffer
>
> --
> // Kristoffer Grönlund
> // kgronl...@suse.com
Re: [Linux-ha-dev] R: R: [PATCH] Filesystem RA:
Hi G.

I personally recommend, as a minimum, that you set up an SBD partition and use SBD STONITH. It protects against file/database corruption in the event of an issue on the underlying storage.

Hardware (power) STONITH is considered the best protection, but I have had clusters running for years using just SBD STONITH, and I would not deploy a cluster-managed file system without it.

You should also strongly consider setting on-fail to fence for stop failures, for the same reason.

The worst possible corruption can be caused by the cluster having a split brain due to a partially unmounted file system and another node mounting and writing to it at the same time.

Regards
D.

On 10/04/2013, at 5:30 PM, Guglielmo Abbruzzese g.abbruzz...@resi.it wrote:

Hi Darren,

I am aware STONITH could help, but unfortunately I cannot add such a device to the architecture at the moment. Furthermore, sybase seems to be stopped (the start/stop order should already be guaranteed by the Resource Group structure):

Resource Group: grp-sdg
    resource_vrt_ip   (ocf::heartbeat:IPaddr2):     Started NODE_A
    resource_lvm      (ocf::heartbeat:LVM):         Started NODE_A
    resource_lvmdir   (ocf::heartbeat:Filesystem):  failed (and so unmanaged)
    resource_sybase   (lsb:sybase):                 stopped
    resource_httpd    (lsb:httpd):                  stopped
    resource_tomcatd  (lsb:tomcatd):                stopped
    resource_sdgd     (lsb:sdgd):                   stopped
    resource_statd    (lsb:statistiched):           stopped

I'm just wondering: why did the same configuration swap fine with the previous storage? The only difference could be the changed multipath configuration.

Thanks a lot
G.
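The advice in this thread is that the stop operation of a file system resource should use on-fail=fence, while the configuration posted by Guglielmo uses on-fail=restart. As a rough sketch of how one might spot such operations, the following filter lists stop operations without an explicit on-fail="fence". The function name is hypothetical, and the line-based matching assumes one op element per line, which is how CIB XML is usually pretty-printed; it is not a real XML parser.

```shell
#!/bin/sh
# Hypothetical helper: read CIB XML on stdin and print the id of every stop
# operation that does not explicitly set on-fail="fence".
# Assumption: each <op .../> element sits on its own line.
# Stop ops with no on-fail attribute at all are listed too.
list_unfenced_stop_ops() {
    grep '<op ' | grep 'name="stop"' | grep -v 'on-fail="fence"' |
        sed -n 's/.* id="\([^"]*\)".*/\1/p'
}

# Sample input modelled on the operations posted in this thread.
list_unfenced_stop_ops <<'EOF'
<operations>
  <op enabled="true" id="resource_lvmdir-startup" interval="60s" name="monitor" on-fail="restart" requires="nothing" timeout="40s"/>
  <op id="resource_lvmdir-start-0" interval="0" name="start" on-fail="restart" requires="nothing" timeout="180s"/>
  <op id="resource_lvmdir-stop-0" interval="0" name="stop" on-fail="restart" requires="nothing" timeout="180s"/>
</operations>
EOF
```

On a live cluster the input would typically come from `cibadmin -Q | list_unfenced_stop_ops`.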
Re: [Linux-ha-dev] R: [PATCH] Filesystem RA:
Hi

The correct way for that to have been handled, given your additional detail, would have been for the node to have received a STONITH.

Things that you should check:

1. The STONITH device is configured correctly and operational.
2. The on-fail for the stop operation of any file system cluster resource should be fence.
3. Review your constraints to ensure that the order and relationship between SYBASE and the file system resource are correct, so that SYBASE is stopped first.

Hope this helps

Darren

On 09/04/2013, at 11:57 PM, Guglielmo Abbruzzese g.abbruzz...@resi.it wrote:

Hi everybody,

In my case (very similar to Junko's), when I disconnect the Fibre Channel links the try_umount procedure in the Filesystem RA script doesn't work. After the programmed attempts the active/passive cluster doesn't swap, and the lvmdir resource is flagged as failed rather than stopped.

I must say, even if I try to umount the /storage resource manually it doesn't work, because sybase is using some files stored on it (busy); this is why the RA cannot complete the operation cleanly. Is there a way to force the swap anyway?

Some notes on what I already tried:

1) This very test with a different optical SAN/storage in the past, and the RA could always umount the storage correctly;
2) I modified the RA to force the umount -l option even though I have an ext4 FS rather than NFS;
3) I killed the hung processes with the command fuser -km /storage, but the umount always failed, and after a while I got a kernel panic.

Is there a way to force the swap anyway, even if the umount is not clean? Any suggestion?

Thanks for your time,
Regards
Guglielmo

P.S. lvmdir resource configuration

<primitive class="ocf" id="resource_lvmdir" provider="heartbeat" type="Filesystem">
  <instance_attributes id="resource_lvmdir-instance_attributes">
    <nvpair id="resource_lvmdir-instance_attributes-device" name="device" value="/dev/VG_SDG_Cluster_RM/LV_SDG_Cluster_RM"/>
    <nvpair id="resource_lvmdir-instance_attributes-directory" name="directory" value="/storage"/>
    <nvpair id="resource_lvmdir-instance_attributes-fstype" name="fstype" value="ext4"/>
  </instance_attributes>
  <meta_attributes id="resource_lvmdir-meta_attributes">
    <nvpair id="resource_lvmdir-meta_attributes-multiple-active" name="multiple-active" value="stop_start"/>
    <nvpair id="resource_lvmdir-meta_attributes-migration-threshold" name="migration-threshold" value="1"/>
    <nvpair id="resource_lvmdir-meta_attributes-failure-timeout" name="failure-timeout" value="0"/>
  </meta_attributes>
  <operations>
    <op enabled="true" id="resource_lvmdir-startup" interval="60s" name="monitor" on-fail="restart" requires="nothing" timeout="40s"/>
    <op id="resource_lvmdir-start-0" interval="0" name="start" on-fail="restart" requires="nothing" timeout="180s"/>
    <op id="resource_lvmdir-stop-0" interval="0" name="stop" on-fail="restart" requires="nothing" timeout="180s"/>
  </operations>
</primitive>

2012/5/9 Junko IKEDA tsukishima...@gmail.com:

Hi,

In my case, the umount succeeded when the Fibre Channel was disconnected, so it seemed that handling the status file caused the longer failover, as Dejan said. If the umount fails, it will go into a timeout and might trigger a stonith action, and this case also makes sense (though I couldn't observe this).

I tried the following setups:

(1) timeout: multipath RA multipath timeout = 120s, Filesystem RA stop timeout = 60s
(2) timeout: multipath RA multipath timeout = 60s, Filesystem RA stop timeout = 120s

Case (1): Filesystem_stop() fails. The hanging FC causes the stop timeout.
Case (2): Filesystem_stop() succeeds. The filesystem is hanging, but lines 758 and 759 succeed (rc=0). The status file is no longer accessible, so in fact it remains on the filesystem.

758 if [ -f $STATUSFILE ]; then
759 rm -f ${STATUSFILE}
760 if [ $? -ne 0 ]; then

so line 761 might not be called as expected.

761 ocf_log warn "Failed to remove status file ${STATUSFILE}."

By the way, my concern is the unexpected stop timeout and the longer failover time. If OCF_CHECK_LEVEL is set to 20, it would be better to try to remove its status file just in case. That can handle case (2) if the user wants to recover from it with STONITH.

Thanks,
Junko

2012/5/8 Dejan Muhamedagic de...@suse.de:

Hi Lars,

On Tue, May 08, 2012 at 01:35:16PM +0200, Lars Marowsky-Bree wrote:
On 2012-05-08T12:08:27, Dejan Muhamedagic de...@suse.de wrote:

In the default (without OCF_CHECK_LEVEL), it's enough to try to unmount the file system, isn't it?
https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/Filesystem#L774

I don't see a need to remove the STATUSFILE at all, as that may (and as you observed it) prevent the filesystem from stopping. Perhaps to skip it altogether?

If nobody objects let's just remove this code:

758 if [ -f
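Junko's observation is that the exit code of rm -f alone may not say whether the status file is really gone. A minimal sketch of an alternative, which checks afterwards whether the file still exists, could look like the following. This is not the code in the Filesystem agent; the function name is hypothetical, and plain echo stands in for ocf_log so that the snippet runs on its own.

```shell
#!/bin/sh
# Hypothetical sketch: remove a status file and verify the result by testing
# for the file afterwards instead of trusting the exit code of rm -f.
remove_status_file() {
    statusfile=$1
    # Nothing to do if the file is not there.
    [ -e "$statusfile" ] || return 0
    rm -f "$statusfile" 2>/dev/null
    if [ -e "$statusfile" ]; then
        # A real agent would call ocf_log warn here.
        echo "WARN: Failed to remove status file $statusfile" >&2
        return 1
    fi
    return 0
}

# Demonstration on a throwaway file.
f=$(mktemp)
if remove_status_file "$f"; then
    echo "status file removed"
fi
```

Note that on a hung filesystem the existence test itself may block, so a sketch like this does not remove the need for a sensible stop timeout and fencing.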
Re: [Linux-ha-dev] lxc RA merged
Florian/Team

Please find an updated version of the 'lxc resource agent'.

As Florian pointed out, I had not properly initialised/set the new use_screen parameter. The attached file includes that correction.

PS (OK, I'm a complete newbie and I know this should be obvious, but) I had attempted to update the original on GitHub (by forking) but am now unable to edit it to add this missing attribute. How do I re-edit my change to include a second change?

Darren

On Mon, 2011-05-30 at 15:45 +0200, Florian Haas wrote:

> Hello,
>
> after much useful testing from Christoph Mitasch and a number of
> necessary changes highlighted by ocf-tester, I've now merged and pushed
> the lxc resource agent that was originally contributed by Darren
> Thompson.
>
> The resource agent is here:
> https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/lxc
>
> Its commit history up to this point can be reviewed here:
> https://github.com/ClusterLabs/resource-agents/commits/master/heartbeat/lxc
>
> Hope this is useful.
>
> Cheers,
> Florian

Attachment: lxc (application/shellscript)
Re: [Linux-ha-dev] lxc RA merged
Florian

I have done some live fire testing of the updated lxc resource.

I noted that screen has been deprecated in favour of running the lxc as a daemon, with output going to a new log file.

Unfortunately, when I used it in this configuration I could not connect to the running container, and all the log output shows is:

more processes left in this runlevel for 5 minutes
stty: standard input: Inappropriate ioctl for device
Master Resource Control: previous runlevel: N, switching to runlevel: 3
tcgetattr: Inappropriate ioctl for device
Master Resource Control: runlevel 3 has been reached
stty: standard input: Inappropriate ioctl for device

The cluster shows the container as running, but I cannot ping the IP address that the container should be using, so I cannot confirm that it is running correctly.

I suspect that the container is having trouble running as there is no root console device when it is run as a daemon. Without the root console available via screen it is very difficult to diagnose the issue with the container and be certain what is causing the problem.

I may create a modified version with screen re-added as an option, as that is my personal preference and it will also help diagnose the error I'm currently getting with the lxc resource. I'll send out the updated version as an attachment to this list (I still have no idea how to create patches/submissions on GitHub).

I'm also now getting errors on the original links to your fork on GitHub. I'm assuming it's because the agent has now been pulled into the core (or something), making your fork redundant.

Darren
Re: [Linux-ha-dev] lxc RA merged
Florian

Please find attached an updated version of the 'lxc resource agent'.

It is based on the upstream lxc resource agent, so these changes should be relatively easy to merge (one day I will work out how to create patches).

I have re-added screen support as an option (off by default). My reasoning for doing that is:

1. Quite frankly, I cannot get the containers to run correctly in my test environment without the 'root console' being redirected to a screen.
2. I personally like to see what the container's 'root console' is doing.
3. The option is off by default, so it should not upset anyone's sensibilities (I really don't understand what people have against using screen in this case).

Darren

Attachment: lxc (binary data)
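As an illustration of an option that is off by default, the sketch below picks between a detached screen session and daemon mode. It is not taken from the attached agent: the function names and the container name and path in the demo call are hypothetical, and the only things assumed from the thread are the use_screen parameter name and the usual OCF convention of passing parameters as OCF_RESKEY_* environment variables.

```shell
#!/bin/sh
# Hypothetical sketch: choose the container start command from use_screen.
is_true() {
    case "$1" in
        1|yes|true|on|Yes|True|On|YES|TRUE|ON) return 0 ;;
        *) return 1 ;;
    esac
}

start_command() {
    name=$1
    config=$2
    # Default to "false" so an unset parameter keeps the daemon behaviour.
    use_screen=${OCF_RESKEY_use_screen:-false}
    if is_true "$use_screen"; then
        # A detached screen session keeps the root console reachable later.
        echo "screen -dmS $name lxc-start -n $name -f $config"
    else
        # Daemon mode: no console is attached.
        echo "lxc-start -n $name -f $config -d"
    fi
}

# Dry run: only prints the command that would be used.
start_command web01 /etc/lxc/web01.conf
```

Initialising the default in one place is also what avoids the kind of unset-parameter problem mentioned elsewhere in this thread.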
Re: [Linux-ha-dev] lxc RA merged
Florian

I'll fire this up in my test lab and see if these changes break anything under live fire.

Definitely tidier code.

I'll update this list and post you anything that I feel needs changing.

Darren
Re: [Linux-ha-dev] [Openais] An OCF agent for LXC (Linux Containers) - Linux-HA-Dev Digest, Vol 90, Issue 8
Florian

Could you send me an actual file so I can use that as a template? I still do not have my head around what the actual requirements are.

Darren

On Wed, 2011-05-11 at 16:49 +0200, Florian Haas wrote:

> Darren,
>
> On 2011-05-05 15:07, Florian Haas wrote:
> > On 2011-05-05 14:26, Darren Thompson wrote:
> > > > Can you confirm that the current version is working for you and
> > > > passes ocf-tester on your system?
> > >
> > > What is an ocf-tester???
> >
> > http://www.linux-ha.org/doc/dev-guides/_testing_installing_and_packaging_resource_agents.html
> >
> > > I have been testing this the hard way by actually creating and
> > > running the agents against actual LXC containers in a running
> > > cluster... If there is a simple way of streamlining this testing I'd
> > > love to hear more about it. (Did I mention that I'm not normally a
> > > coder/developer? - Yes I know that's getting repetitive ;-) )
> > >
> > > But, back on topic... I can confirm that this agent is working
> > > correctly in a live fire environment.
> >
> > That's good to know. ocf-tester doesn't shoot blanks either (it
> > operates on an actual incarnation of the resource), but it might run
> > some tests that you manually do not, so it's always a wise idea to use
> > it.
>
> Any news regarding running ocf-tester on your lxc agent?
>
> Cheers,
> Florian
Re: [Linux-ha-dev] Filesystem ocf file
Florian

Ok then... I agree it does seem to be poorly designed, and it's far from intuitive... But if it's actually correct, who am I to argue...

Darren

On Fri, 2011-05-06 at 09:37 +0200, Florian Haas wrote:

> On 2011-05-06 09:26, Darren Thompson wrote:
> > Team
> >
> > I was reviewing some errors on a cluster-mounted file system that
> > caused me to review the Filesystem ocf file.
> >
> > I notice that it uses an undeclared parameter, OCF_CHECK_LEVEL, to
> > determine what degree of testing of the filesystem is required in
> > monitor.
> >
> > I have now updated it to work more formally with a check_level value,
> > with the more obvious values of mounted, read, write (my updated
> > version attached).
> >
> > Could someone (Florian, is this something you can do?) please review
> > this with a view to patching the upstream Filesystem ocf file.
>
> NACK, sorry. The OCF_CHECK_LEVEL is specific to the monitor action and
> described as such in the OCF spec; this will not be changed without a
> change to the spec.
>
> To use it, set
>
> op monitor interval=X OCF_CHECK_LEVEL=Y
>
> Yes, it's poorly designed, it makes no sense why this is pretty much
> the only sensible time to set a parameter specifically for an operation
> (as opposed to on a resource), it's inexplicable why it's all caps,
> etc., but that's the way it is.
>
> Cheers,
> Florian
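To show how a monitor action might branch on OCF_CHECK_LEVEL, here is a small sketch. It is not the Filesystem agent's code. The function name is hypothetical, and mapping the levels 0, 10 and 20 to the mounted/read/write wording used in this thread is an assumption made for illustration.

```shell
#!/bin/sh
# Hypothetical sketch: map OCF_CHECK_LEVEL to the depth of a monitor check.
# The variable arrives in the environment of the monitor operation only.
monitor_depth() {
    case "${OCF_CHECK_LEVEL:-0}" in
        0)  echo "mounted" ;;   # default: only check that it is mounted
        10) echo "read" ;;      # additionally try to read
        20) echo "write" ;;     # additionally try to write
        *)  echo "unsupported"; return 1 ;;
    esac
}

monitor_depth
```

On the cluster side the level is set per operation, as in Florian's example `op monitor interval=X OCF_CHECK_LEVEL=Y`, rather than as a resource parameter.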
Re: [Linux-ha-dev] [Openais] An OCF agent for LXC (Linux Containers) - Linux-HA-Dev Digest, Vol 90, Issue 8
Florian/Team

Comments in-line...

On Thu, 2011-05-05 at 05:47 -0600, linux-ha-dev-requ...@lists.linux-ha.org wrote:

> Darren, can you please subscribe to the list as a normal subscriber
> rather than to just the digest, so we can keep this discussion in one
> thread?

Ok, done... The digest mode was a good idea at the time...

> On 2011-05-05 04:47, Darren Thompson wrote:
> > Florian/Team
> >
> > There was an error in the GitHub version that was causing my re-base
> > attempts to fail, so I was forced to try to bring my last known good
> > version to the same configuration (mostly successful).
> >
> > I have since found the error in the GitHub version (the initialisation
> > section was wrong; the meta-data error was a 'red herring'). That has
> > been resolved, so I have now done an actual re-base on the GitHub
> > version.
> >
> > Changes:
> > 1. Corrected error in initialisation causing ocf to fail in HB_GUI.
>
> That is not an error; the GitHub version is correct. The path to the
> ocf-shellfuncs library was recently changed upstream; your installed
> version is apparently still using the old path. For the GitHub version
> to work on your system, you will have to apply the attached patch after
> you check out.

If I had any idea how to use git and apply patches, this whole conversation would never be happening ;-) Did I mention that I'm not normally a coder/developer?

> Note that normally people would be building the whole resource-agents
> package from a git checkout and use _that_ on their test system, but
> you're not using git, so that option is out for you. Have I mentioned
> that starting to use git would be a good option?

Did I ever claim to be normal... Mind you, if I told my partner "... normally people would be building the whole resource-agents package from a git checkout and use _that_ on their test system ..." she would laugh her head off... I suppose it depends on your definition of normality... ;-)

Using git would probably be a good option, but my requirement is for this to work under SLES11SP1 with the HA option pack, none of which is consistent with building the whole resource-agents package. I do intend to raise an SR with Attachmate/Novell/SuSE for them to come in line with these standards, as supporting their environment on client sites would be a PITA if they vary too much from the agreed standards.

> > 2. Added information to stop section, to provide more feedback on
> > container shutdown/stop (and to assist with future development of
> > containers using alternate 'init' systems).
>
> Applied and pushed to my lxc branch. Thank you.

Yes, I can see that in the on-line version.

> Can you confirm that the current version is working for you and passes
> ocf-tester on your system?

What is an ocf-tester???

I have been testing this the hard way by actually creating and running the agents against actual LXC containers in a running cluster... If there is a simple way of streamlining this testing I'd love to hear more about it. (Did I mention that I'm not normally a coder/developer? - Yes I know that's getting repetitive ;-) )

But, back on topic... I can confirm that this agent is working correctly in a live fire environment.

Darren
Re: [Linux-ha-dev] [Openais] An OCF agent for LXC (Linux Containers) - Linux-HA-Dev Digest, Vol 90, Issue 2
Florian

I have tried to re-base on your version but it just will not run for me. I keep getting:

Failed to parse the metadata of LXC
syntax error line 1, column 1

I've no idea where this error is, as it all looks fine...

I'll attach my copy and a screenshot of the error. HELP!!!

Darren

On Tue, 2011-05-03 at 07:59 -0600, linux-ha-dev-requ...@lists.linux-ha.org wrote:

> Date: Tue, 03 May 2011 08:20:56 +0200
> From: Florian Haas florian.h...@linbit.com
> Subject: Re: [Linux-ha-dev] [Openais] An OCF agent for LXC (Linux Containers) - Linux-HA-Dev Digest, Vol 89, Issue 32
>
> Hello Darren,
>
> Please get the current version from
> https://github.com/fghaas/resource-agents/blob/lxc/heartbeat/lxc, and
> also review the commit history at
> https://github.com/fghaas/resource-agents/commits/lxc/heartbeat/lxc.
>
> When you send more updates, please do make sure they track the latest
> version in my repo. I am doing my best splitting this up into patches
> as I can and check them in individually, but the re-introduction of
> errors that have already been fixed is not something that gives me
> thrills. Thanks.
>
> Cheers,
> Florian

Attachments: Screenshot-Message.png, lxc (application/shellscript)
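A parse error at line 1, column 1 suggests that whatever the agent printed for meta-data did not begin with XML. One way to narrow this down, sketched here under the assumption that the agent writes its metadata to standard output, is to look at the first line it emits. The helper name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read agent metadata on stdin and succeed only if the
# very first line starts with an XML declaration.
metadata_starts_with_xml() {
    IFS= read -r first_line || return 1
    case "$first_line" in
        '<?xml'*) return 0 ;;
        *)
            echo "unexpected first line: $first_line" >&2
            return 1
            ;;
    esac
}

# Demonstration with canned input instead of a real agent.
if printf '%s\n' '<?xml version="1.0"?>' '<resource-agent name="lxc"/>' |
        metadata_starts_with_xml; then
    echo "metadata begins with XML"
fi
```

Against a real agent the input would come from something like `/usr/lib/ocf/resource.d/heartbeat/lxc meta-data | metadata_starts_with_xml`, where the path is an assumption about the local installation. A shell error printed ahead of the XML, for instance from a failed include, would show up as the unexpected first line.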
Re: [Linux-ha-dev] Linux-HA-Dev Digest, Vol 90, Issue 4
Florian/Team

I have succeeded in re-basing my work on the version in the repository. That should make re-integrating my changes much more straightforward. It is much more succinct now.

Changes in this version:

1. Re-based on Florian's version in https://github.com/fghaas/resource-agents/blob/lxc/heartbeat/lxc
2. Removed the root variable requirement (it was only used once and did not add significant value to the configuration).
3. Removed mention of root from the verify and stop sections.
4. Cleaned up and expanded the meta-data section, added to descriptions etc.
5. Removed the superfluous cd in the start section, as it is no longer required with the full config path specified.

I really do not know what happened to my first few attempts at re-basing; I'm assuming an invalid character got into the file somewhere...

Darren

Attachment: lxc (application/shellscript)
Re: [Linux-ha-dev] [Openais] An OCF agent for LXC (Linux Containers) - Linux-HA-Dev Digest, Vol 90, Issue 6
Florian/Team

There was an error in the GitHub version that was causing my re-base attempts to fail, so I was forced to try to bring my last known good version to the same configuration (mostly successful).

I have since found the error in the GitHub version (the initialisation section was wrong; the meta-data error was a 'red herring'). That has been resolved, so I have now done an actual re-base on the GitHub version.

Changes:

1. Corrected error in initialisation causing ocf to fail in HB_GUI.
2. Added information to the stop section, to provide more feedback on container shutdown/stop (and to assist with future development of containers using alternate 'init' systems).

Regards
Darren

On Wed, 2011-05-04 at 12:00 -0600, linux-ha-dev-requ...@lists.linux-ha.org wrote:

> > Florian/Team
> >
> > I have now updated my re-based ocf file to include the experimental
> > support for upstart and systemd using containers.
> >
> > I can confirm that this is still working correctly for containers
> > running 'sysv init' and in theory should now also work for containers
> > using 'upstart' and 'systemd'.
> >
> > I'm currently doing a 'crash course' in installing containers to use
> > these 'init replacements' but have not yet succeeded in testing either
> > 'upstart' or 'systemd' containers.
> >
> > If there is anyone with a better understanding of LXC containers and
> > one/both of these other 'init systems', please contact me as your
> > information/assistance would be invaluable.
>
> OK, updated my git branch. You really want to double check your
> rebasing method; you're constantly re-introducing things that I've
> removed or fixed in earlier commits.
>
> Florian

Attachment: lxc (application/shellscript)
Re: [Linux-ha-dev] [Openais] An OCF agent for LXC (Linux Containers) - Linux-HA-Dev Digest, Vol 89, Issue 33
Florian/Team,

Another update to the lxc (Linux container) ocf file (attached).

Changes (summary):

Added very, very, very experimental support for alternate init systems inside containers (it should now support sysvinit, upstart and systemd). Adding this support did not break the default sysvinit, but since I do not know how to create an LXC container that uses 'upstart' or 'systemd', my testing is very rudimentary for those two systems.

I have made no progress whatsoever with removing the requirement for screen, as I still have not found a working alternative to provide the root console created by lxc-start (it takes over the default console it is run on, and that console is lost when run as a cluster service if screen is not used).

At this point I may have to confess that getting this working without screen may be beyond my abilities (for now; I'm stubborn so will keep plugging away at this, but don't hold your breath). I'm still not sure why the use of screen is so repellent to some, as it works well and is generally quite innocuous.

Regards
Darren

On Sat, 2011-04-30 at 12:00 -0600, linux-ha-dev-requ...@lists.linux-ha.org wrote:
> Date: Sat, 30 Apr 2011 16:10:52 +0930
> From: Darren Thompson darr...@akurit.com.au
> Subject: Re: [Linux-ha-dev] [Openais] An OCF agent for LXC (Linux Containers) - Linux-HA-Dev Digest, Vol 89, Issue 32
> To: linux-ha-dev@lists.linux-ha.org
> Message-ID: 1304145652.5625.50.ca...@darrenspc.akurit.com.au
> Content-Type: text/plain; charset=utf-8
>
> Florian/Team
>
> Please find the latest instalment of the LXC containers ocf.
>
> Changes (summary):
>
> Moved cgroup_mounted out of the default initialisation and made it a function (used by start/stop). Also cleaned up some other code sections, including expanding the verify_all section to test the configuration more fully. Also merged the validate and status sections.
>
> My next work will be determining the best way to make the containers init-type independent (due to the rise of init replacements like systemd and upstart), and also investigating the removal of the screen tool from the startup, as it has received negative feedback from a few sources.
>
> Darren

lxc Description: application/shellscript
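The change described above, turning the cgroup check into a function that start and stop can both call, might look roughly like the sketch below. This is an illustrative assumption and not the code from the attached agent: the test against /proc/mounts and the optional file argument are hypothetical choices, made so the function can be exercised in isolation.

```shell
# Hypothetical sketch; the attached agent's actual implementation is not
# shown in this thread and may differ.
# Succeeds (exit status 0) if a cgroup filesystem appears in the mount table.
# The optional first argument names an alternative mount table for testing.
cgroup_mounted() {
    mounts_file="${1:-/proc/mounts}"
    # Field 3 of each /proc/mounts line is the filesystem type.
    awk '$3 == "cgroup" || $3 == "cgroup2" { found = 1 }
         END { exit !found }' "$mounts_file"
}
```

A start or stop action could then guard its work with something like `cgroup_mounted || exit 5`, where 5 is the value OCF assigns to OCF_ERR_INSTALLED. Making the check a function means it runs only for the actions that need it, instead of on every invocation of the agent.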
Re: [Linux-ha-dev] [Openais] An OCF agent for LXC (Linux Containers) - Linux-HA-Dev Digest, Vol 90, Issue 2
Florian/Team,

Sorry I did not read this sooner; my last update will still have been messy for you (sorry).

I'll grab a copy of the current version and re-base my work on that.

I see that you have streamlined it quite a bit. I'll test it in my environment to ensure it's working as expected (I note that the parameters have changed names and some functionality, so I will re-create my cluster/lxc/containers using this and re-test).

Darren

On Tue, 2011-05-03 at 07:59 -0600, linux-ha-dev-requ...@lists.linux-ha.org wrote:
> Hello Darren,
>
> Please get the current version from https://github.com/fghaas/resource-agents/blob/lxc/heartbeat/lxc, and also review the commit history at https://github.com/fghaas/resource-agents/commits/lxc/heartbeat/lxc.
>
> When you send more updates, please do make sure they track the latest version in my repo. I am doing my best to split this up into patches where I can and check them in individually, but the re-introduction of errors that have already been fixed is not something that gives me thrills. Thanks.
>
> Cheers,
> Florian
Re: [Linux-ha-dev] [Openais] An OCF agent for LXC (Linux Containers) - Linux-HA-Dev Digest, Vol 89, Issue 32
Florian/Team,

Please find the latest instalment of the LXC containers ocf.

Changes (summary):

Moved cgroup_mounted out of the default initialisation and made it a function (used by start/stop). Also cleaned up some other code sections, including expanding the verify_all section to test the configuration more fully. Also merged the validate and status sections.

My next work will be determining the best way to make the containers init-type independent (due to the rise of init replacements like systemd and upstart), and also investigating the removal of the screen tool from the startup, as it has received negative feedback from a few sources.

Darren

On Fri, 2011-04-29 at 12:00 -0600, linux-ha-dev-requ...@lists.linux-ha.org wrote:
> Date: Fri, 29 Apr 2011 09:57:04 +0200
> From: Florian Haas florian.h...@linbit.com
> Subject: Re: [Linux-ha-dev] [Openais] An OCF agent for LXC (Linux Containers) - Linux-HA-Dev Digest, Vol 89, Issue 30
> To: High-Availability Linux Development List linux-ha-dev@lists.linux-ha.org
> Message-ID: 4dba6f50.7090...@linbit.com
> Content-Type: text/plain; charset=utf-8
>
> On 2011-04-29 08:04, Darren Thompson wrote:
> > You posted my first attempt and not the latest version. Is it possible to add that one, as it addresses some (most, hopefully) of the issues you identified?
>
> Already there. Been there since yesterday.
> https://github.com/fghaas/resource-agents/commit/07827c42494dbec2c011133d9f82e831bc8b2eb6
>
> > There are still some valid points you have raised, however, so I'm going to try to incorporate them into a third version.
>
> See how much easier this would be if you actually did this in your own github repo that we could just pull from?
>
> Cheers,
> Florian

lxc Description: application/shellscript
Re: [Linux-ha-dev] [Openais] An OCF agent for LXC (Linux Containers) - Linux-HA-Dev Digest, Vol 89, Issue 30
Florian/Team,

Thank you for the update. I'll thread my remaining replies into the message :-)

On Thu, 2011-04-28 at 07:46 -0600, linux-ha-dev-requ...@lists.linux-ha.org wrote:
> Date: Thu, 28 Apr 2011 11:56:30 +0200
> From: Florian Haas florian.h...@linbit.com
> Subject: Re: [Linux-ha-dev] [Openais] An OCF agent for LXC (Linux Containers)
> To: High-Availability Linux Development List linux-ha-dev@lists.linux-ha.org
> Message-ID: 4db939ce.5000...@linbit.com
> Content-Type: text/plain; charset=utf-8
>
> On 2011-04-28 10:21, Darren Thompson wrote:
> > Florian/Team
> >
> > Thanks for your input and the link to the guidelines.
> >
> > I have updated my original ocf file in line with the guidelines. It even gave me a few tips on how to do things better, so it was well worth the time spent.
> >
> > Please find the updated ocf file for LXC containers as a cluster resource attached.
> >
> > Since I'm not an actual developer (or even a career coder)
>
> Do you think I am?

Until today, I have had no experience whatsoever with github, so compared to me... yes...

> > I do not have the facility to host my own github fork so would appreciate someone adopting this and integrating it into their git repository.
>
> OK, I have added this to a separate lxc branch in my own github fork. I'd appreciate it if you could at least get yourself an account on github so you can comment on commit line notes. I have added my comments to this page:
> https://github.com/fghaas/resource-agents/commit/73f80b31f1cee5eff1c2fe2b968f4ea593e8f405

Yep, done. I responded to nearly all of your points (most of the time to say, yep... agree).

You posted my first attempt and not the latest version. Is it possible to add that one, as it addresses some (most, hopefully) of the issues you identified?

There are still some valid points you have raised, however, so I'm going to try to incorporate them into a third version.

Is there some clever way of re-integrating all of this? (Did I mention that I'm not normally a coder?)

> Some of those may have already been addressed in your updated version, but to keep things simple I've kept my comments to one commit for the time being.
>
> Florian
>
> PS: We can stop CC'ing the openais list, this is in no way Corosync/OpenAIS related.

Agreed, I will stop pestering that list now :-)

Darren
Re: [Linux-ha-dev] [Openais] An OCF agent for LXC (Linux Containers)
Florian/Team,

Thanks for your input and the link to the guidelines.

I have updated my original ocf file in line with the guidelines. It even gave me a few tips on how to do things better, so it was well worth the time spent.

Please find the updated ocf file for LXC containers as a cluster resource attached.

Since I'm not an actual developer (or even a career coder), I do not have the facility to host my own github fork, so I would appreciate someone adopting this and integrating it into their git repository.

I have since added myself to the developer mailing list, so I should be able to contribute to the refining of this.

Regards
Darren

On Tue, 2011-04-26 at 15:36 +0200, Florian Haas wrote:
> Thanks Darren! Thanks for the contribution! Can I suggest
>
> - we move this discussion to the linux-ha-dev list (where most OCF RA related discussions and reviews take place);
> - you give the RA a makeover following the OCF RA developer's guide (http://www.linux-ha.org/doc/dev-guides/ra-dev-guide.html);
> - you set up your own github fork off of https://github.com/ClusterLabs/resource-agents, and push your RA to that so we can eventually pull it into the mainline repo?
>
> Also, can you explain what the advantages of your approach are, versus using libvirt-managed lxc containers which Pacemaker can tie into via the existing VirtualDomain agent?
>
> Thanks!
>
> Cheers,
> Florian

lxc Description: application/shellscript