Re: [Linux-ha-dev] Announcing crmsh release 2.1.7

2016-09-01 Thread Darren Thompson
Thank you.

Comprehensively answered.

On 1 Sep 2016 6:27 PM, "Kristoffer Grönlund" <kgronl...@suse.com> wrote:

> Darren Thompson <darr...@akurit.com.au> writes:
>
> > Just a quick question:
> >
> > If "scripts: no-quorum-policy=ignore" is becoming deprecated, how are we
> > to manage two-node (e.g. test) clusters that require this workaround,
> > since quorum state on a single node is an odd state?
> >
>
> Hi Darren,
>
> There are better mechanisms in corosync and Pacemaker for handling two
> node clusters now while still maintaining quorum.
>
> In corosync 2, we have the two_node: 1 setting for votequorum, which
> ensures that a two node cluster doesn't suffer split brain (fencing is
> required for this to work properly).
>
> There is an explanation for how this works here:
>
> http://people.redhat.com/ccaulfie/docs/Votequorum_Intro.pdf
>
> Somewhat related, there used to be the start-delay meta parameter which
> could be set for example for sbd stonith resources, to make a
> double-fencing scenario less likely. This has now been replaced by the
> pcmk_delay_max parameter. For an example of how to use this, see this
> pull request for sbd:
>
> https://github.com/ClusterLabs/sbd/pull/15/commits/ca2fba836eab169f0c8cacf7f3757c0485bcfef8
>
> Cheers,
> Kristoffer
>
> --
> // Kristoffer Grönlund
> // kgronl...@suse.com
>
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
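Kristoffer's two suggestions can be sketched together. The script below only prints configuration and applies nothing. It produces:

- a corosync 2.x `quorum` section that uses `two_node: 1`, assuming the two nodes are already listed in a `nodelist` section (not shown);
- a crmsh command for an SBD fencing resource that sets `pcmk_delay_max`.

The resource name `stonith-sbd` and the 30s delay are hypothetical. Check votequorum(5) and your Pacemaker version before relying on either setting.

```shell
#!/bin/sh
# Sketch only: prints configuration for review; nothing is applied.

# Quorum block for /etc/corosync/corosync.conf (corosync 2.x votequorum).
# two_node: 1 lets the surviving node keep quorum when its peer is lost;
# per votequorum(5) it also enables wait_for_all by default, so both nodes
# must be seen once at cluster startup. Fencing is still required.
quorum_section() {
    cat <<'EOF'
quorum {
    provider: corosync_votequorum
    two_node: 1
}
EOF
}

# crmsh command for an SBD stonith resource. pcmk_delay_max adds a random
# delay (up to the given value) before fencing, making it less likely that
# both nodes shoot each other at the same moment.
# "stonith-sbd" is a hypothetical resource name.
sbd_stonith_cmd() {
    delay=${1:-30s}
    printf 'crm configure primitive stonith-sbd stonith:external/sbd params pcmk_delay_max=%s\n' "$delay"
}

quorum_section
sbd_stonith_cmd 30s
```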


Re: [Linux-ha-dev] Announcing crmsh release 2.1.7

2016-09-01 Thread Darren Thompson
Team

Good work on the new version, much appreciated.

Just a quick question:

If "scripts: no-quorum-policy=ignore" is becoming deprecated, how are we
to manage two-node (e.g. test) clusters that require this workaround, since
quorum state on a single node is an odd state?

Regards

Darren Thompson
Professional Services Engineer / Consultant
Level 3, 60 City Road
Southgate, VIC 3006
Mb: 0400 640 414
Mail: darr...@akurit.com.au <st...@akurit.com.au>
Web: www.akurit.com.au

On 1 September 2016 at 17:01, Kristoffer Grönlund <kgronl...@suse.com>
wrote:

> Hello everyone!
>
> Today I am proud to announce the release of `crmsh` version 2.1.7!
> The major new thing in this release is a backport of the event-based
> alerts support from the 2.3 branch.
>
> Big thanks to Hideo Yamauchi for his patience and testing of the
> alerts backport.
>
> This time, the list of changes is small enough that I can add it right
> here:
>
> - high: parse: Backport of event-driven alerts parser (#150)
> - high: hb_report: Don't collect logs from journalctl if -M is set
> (bsc#990025)
> - high: hb_report: Skip lines without timestamps in log correctly
> (bsc#989810)
> - high: constants: Add maintenance to set of known attributes (bsc#981659)
> - high: utils: Avoid deadlock if DC changes during idle wait (bsc#978480)
> - medium: scripts: no-quorum-policy=ignore is deprecated (bsc#981056)
> - low: cibconfig: Don't mix up CLI name with XML tag
>
> You can also get the list of changes from the changelog:
>
> * https://github.com/ClusterLabs/crmsh/blob/2.1.7/ChangeLog
>
> Right now, I don't have a set of pre-built rpm packages for Linux
> distributions ready, but I am going to make them available soon. This
> is in particular for CentOS 6.x, which still relies on Python 2.6,
> making the later releases more difficult to run there. These packages
> will most likely appear as a subrepository here (more details coming soon):
>
> * http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/
>
> Archives of the tagged release:
>
> * https://github.com/ClusterLabs/crmsh/archive/2.1.7.tar.gz
> * https://github.com/ClusterLabs/crmsh/archive/2.1.7.zip
>
>
> Thank you,
>
> Kristoffer
>
> --
> // Kristoffer Grönlund
> // kgronl...@suse.com


Re: [Linux-ha-dev] R: R: [PATCH] Filesystem RA:

2013-04-10 Thread Darren Thompson (AkurIT)
Hi G.

I personally recommend, as a minimum, that you set up an SBD partition and use
SBD STONITH. It protects against file/database corruption in the event of an
issue on the underlying storage.

Hardware (power) STONITH is considered the best protection, but I have had
clusters running for years using just SBD STONITH, and I would not deploy a
cluster-managed file system without it.

For the same reason, you should also strongly consider setting on-fail=fence
for the stop operation. The worst possible corruption can be caused by a split
brain: the cluster has a partially unmounted file system on one node while
another node mounts it and writes to it at the same time.

Regards
D.
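The stop-failure advice can be illustrated with a sketch that prints a crmsh version of the `resource_lvmdir` resource Guglielmo posted in this thread. The device, mount point, fstype and timeouts come from his XML. The stop operation gets `on-fail=fence`, and the script adds `stonith-enabled=true`.

This assumes a working fencing device, such as SBD, is already configured; without one, the fence action cannot succeed. The script prints rather than applies. A reviewed copy could be loaded with `crm configure load update <file>`.

```shell
#!/bin/sh
# Sketch only: print a crmsh definition for review; nothing is applied.
# Values are taken from the resource_lvmdir XML in this thread; the only
# change is on-fail=fence on the stop operation.
fs_primitive() {
    cat <<'EOF'
primitive resource_lvmdir ocf:heartbeat:Filesystem \
    params device=/dev/VG_SDG_Cluster_RM/LV_SDG_Cluster_RM \
        directory=/storage fstype=ext4 \
    op monitor interval=60s timeout=40s \
    op start interval=0 timeout=180s \
    op stop interval=0 timeout=180s on-fail=fence
property stonith-enabled=true
EOF
}

fs_primitive
```

If the stop fails (for example, because the file system cannot be unmounted), Pacemaker fences the node instead of leaving the resource failed. The other node can then take over safely.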


On 10/04/2013, at 5:30 PM, Guglielmo Abbruzzese g.abbruzz...@resi.it wrote:

 Hi Darren,
 I am aware STONITH could help, but unfortunately I cannot add such a device to
 the architecture at the moment.
 Furthermore, sybase seems to be stopped (the start/stop order should already
 be guaranteed by the Resource Group structure):

 Resource Group: grp-sdg
     resource_vrt_ip   (ocf::heartbeat:IPaddr2):    Started NODE_A
     resource_lvm      (ocf::heartbeat:LVM):        Started NODE_A
     resource_lvmdir   (ocf::heartbeat:Filesystem): failed (and so unmanaged)
     resource_sybase   (lsb:sybase):                stopped
     resource_httpd    (lsb:httpd):                 stopped
     resource_tomcatd  (lsb:tomcatd):               stopped
     resource_sdgd     (lsb:sdgd):                  stopped
     resource_statd    (lsb:statistiched):          stopped

 I'm just wondering: why did the same configuration fail over fine with the
 previous storage? The only difference could be the changed multipath
 configuration.

 Thanks a lot
 G.
 
 

Re: [Linux-ha-dev] R: [PATCH] Filesystem RA:

2013-04-09 Thread Darren Thompson (AkurIT)
Hi

The correct way for that to have been handled, given your additional detail,
would have been for the node to receive a STONITH.

Things that you should check:
1. The STONITH device is configured correctly and operational.
2. The on-fail for the stop operation of any file system cluster resource
should be fence.
3. Review your constraints to make sure the order and relationship between
SYBASE and the file system resource are correct, so that SYBASE is stopped
first.

Hope this helps

Darren 
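Point 3 can be expressed in crmsh as an order constraint plus a colocation constraint. This is a hedged sketch, and the constraint IDs are hypothetical.

In Guglielmo's configuration the resources already sit in the group `grp-sdg`. A group implies the same start order and the reverse stop order, so explicit constraints like these are only needed for resources outside a group.

```shell
#!/bin/sh
# Sketch only: print crmsh constraints for review.
# The order constraint starts the file system before sybase; order
# constraints are symmetrical by default, so sybase is stopped first.
# The colocation keeps sybase on the node where the file system runs.
# Constraint IDs are hypothetical.
sybase_constraints() {
    fs=${1:-resource_lvmdir}
    db=${2:-resource_sybase}
    echo "order o-${fs}-before-${db} Mandatory: ${fs} ${db}"
    echo "colocation c-${db}-with-${fs} inf: ${db} ${fs}"
}

sybase_constraints
```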



On 09/04/2013, at 11:57 PM, Guglielmo Abbruzzese g.abbruzz...@resi.it wrote:

 Hi everybody,
 In my case (very similar to Junko's), when I disconnect the Fibre Channel
 links the try_umount procedure in the Filesystem RA script doesn't work.

 After the programmed attempts the active/passive cluster doesn't fail over,
 and the lvmdir resource is flagged as failed rather than stopped.

 I must say, even if I try to unmount the /storage resource manually it
 doesn't work, because sybase is using some files stored on it (busy); this
 is why the RA cannot complete the operation cleanly. Is there a way to force
 the failover anyway?

 Some things I have already tried:
 1) This very test with a different optical SAN/storage in the past, and the
 RA could always unmount the storage correctly;
 2) I modified the RA to force the umount -l option even though I've got an
 ext4 FS rather than NFS;
 3) I killed the hung processes with the command fuser -km /storage, but
 the umount always failed, and after a while I got a kernel panic.

 Is there a way to force the failover anyway, even if the umount is not clean?
 Any suggestion?

 Thanks for your time,
 Regards
 Guglielmo
 
 P.S. lvmdir resource configuration
 
 <primitive class="ocf" id="resource_lvmdir" provider="heartbeat" type="Filesystem">
   <instance_attributes id="resource_lvmdir-instance_attributes">
     <nvpair id="resource_lvmdir-instance_attributes-device" name="device" value="/dev/VG_SDG_Cluster_RM/LV_SDG_Cluster_RM"/>
     <nvpair id="resource_lvmdir-instance_attributes-directory" name="directory" value="/storage"/>
     <nvpair id="resource_lvmdir-instance_attributes-fstype" name="fstype" value="ext4"/>
   </instance_attributes>
   <meta_attributes id="resource_lvmdir-meta_attributes">
     <nvpair id="resource_lvmdir-meta_attributes-multiple-active" name="multiple-active" value="stop_start"/>
     <nvpair id="resource_lvmdir-meta_attributes-migration-threshold" name="migration-threshold" value="1"/>
     <nvpair id="resource_lvmdir-meta_attributes-failure-timeout" name="failure-timeout" value="0"/>
   </meta_attributes>
   <operations>
     <op enabled="true" id="resource_lvmdir-startup" interval="60s" name="monitor" on-fail="restart" requires="nothing" timeout="40s"/>
     <op id="resource_lvmdir-start-0" interval="0" name="start" on-fail="restart" requires="nothing" timeout="180s"/>
     <op id="resource_lvmdir-stop-0" interval="0" name="stop" on-fail="restart" requires="nothing" timeout="180s"/>
   </operations>
 </primitive>
 
 2012/5/9 Junko IKEDA tsukishima...@gmail.com:
 Hi,

 In my case, the umount succeeded when the Fibre Channel was disconnected,
 so it seemed that handling the status file caused a longer failover, as
 Dejan said.
 If the umount fails, it will go into a timeout and might trigger a STONITH
 action; this case also makes sense (though I couldn't observe it).

 I tried the following setups:

 (1) timeout: multipath > RA
 multipath timeout = 120s
 Filesystem RA stop timeout = 60s

 (2) timeout: multipath < RA
 multipath timeout = 60s
 Filesystem RA stop timeout = 120s

 In case (1), Filesystem_stop() fails. The hanging FC causes the stop timeout.

 In case (2), Filesystem_stop() succeeds.
 The filesystem is hung, but lines 758 and 759 succeed (rc=0).
 The status file can no longer be accessed, so in fact it remains on the
 filesystem.

 758 if [ -f $STATUSFILE ]; then
 759 rm -f ${STATUSFILE}
 760 if [ $? -ne 0 ]; then

 so line 761 might not be reached as expected.

 761 ocf_log warn "Failed to remove status file ${STATUSFILE}."


 By the way, my concern is the unexpected stop timeout and the longer
 failover time. If OCF_CHECK_LEVEL is set to 20, it would be better to
 try to remove its status file just in case.
 That would handle case (2) if the user wants to recover from it with
 STONITH.
 
 
 Thanks,
 Junko
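Junko's two setups suggest a rule of thumb: the Filesystem RA stop timeout should outlast the multipath timeout. The sketch below encodes that rule as a check.

The rule is inferred from these two test cases only, not from any documented requirement. How the multipath timeout is derived depends on your multipath.conf, so the input value is up to you.

```shell
#!/bin/sh
# Sketch only: compare a multipath timeout with the Filesystem RA stop
# timeout (both in seconds). The rule encoded here, that stop should
# outlast multipath, is drawn from the two cases reported above and is
# not a documented requirement.
check_stop_timeout() {
    multipath_s=$1
    stop_s=$2
    if [ "$stop_s" -gt "$multipath_s" ]; then
        echo "ok: stop timeout ${stop_s}s > multipath timeout ${multipath_s}s"
        return 0
    fi
    echo "warning: stop timeout ${stop_s}s <= multipath timeout ${multipath_s}s"
    return 1
}

check_stop_timeout 120 60 || true   # case (1): prints a warning
check_stop_timeout 60 120           # case (2): prints ok
```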
 
 2012/5/8 Dejan Muhamedagic de...@suse.de:
 Hi Lars,
 
 On Tue, May 08, 2012 at 01:35:16PM +0200, Lars Marowsky-Bree wrote:
 On 2012-05-08T12:08:27, Dejan Muhamedagic de...@suse.de wrote:
 
 In the default case (without OCF_CHECK_LEVEL), it's enough to try to
 unmount the file system, isn't it?
 https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/Filesystem#L774

 I don't see a need to remove the STATUSFILE at all, as that may
 (and, as you observed, does) prevent the filesystem from stopping.
 Perhaps skip it altogether? If nobody objects, let's just remove
 this code:
 
  758 if [ -f 

Re: [Linux-ha-dev] lxc RA merged

2011-06-06 Thread Darren Thompson
Florian/Team

Please find attached an updated version of the 'lxc resource agent'.

As Florian pointed out, I had not properly initialised/set the new
use_screen parameter.

The attached file includes that correction.

PS: OK, I'm a complete newbie and I know this should be obvious, but I
had attempted to update the original on GitHub (by forking) and am now
unable to edit it to add this missing attribute. How do I re-edit my
change to include a second change?

Darren
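Initialising an optional parameter such as `use_screen` usually follows a common resource-agent idiom: define a `_default` variable, then assign it only if the cluster did not pass a value.

The sketch below shows that idiom with `false` as the assumed default, matching "off by default". It reimplements a small truth test so it runs without ocf-shellfuncs. In a real agent, `ocf_is_true` from that library would normally be used instead.

```shell
#!/bin/sh
# Sketch: give the optional use_screen parameter a default only if the
# cluster did not set OCF_RESKEY_use_screen.
OCF_RESKEY_use_screen_default=false
: "${OCF_RESKEY_use_screen=${OCF_RESKEY_use_screen_default}}"

# Minimal truth test, similar in spirit to ocf_is_true from ocf-shellfuncs
# (reimplemented here so the sketch runs without the OCF library).
is_true() {
    case "$1" in
        [Yy][Ee][Ss]|[Tt][Rr][Uu][Ee]|1|[Oo][Nn]) return 0 ;;
        *) return 1 ;;
    esac
}

if is_true "$OCF_RESKEY_use_screen"; then
    echo "use_screen enabled"
else
    echo "use_screen disabled"
fi
```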



lxc
Description: application/shellscript


Re: [Linux-ha-dev] lxc RA merged

2011-06-05 Thread Darren Thompson
Florian

I have done some live-fire testing of the updated lxc resource.

I noted that screen has been deprecated in favour of running the lxc
as a daemon, with output going to a new log file.

Unfortunately, when I use it in this configuration I cannot connect to
the running container, and all the log output shows is:
more processes left in this runlevel for 5 minutesstty: standard input:
Inappropriate ioctl for device
Master Resource Control: previous runlevel: N, switching to runlevel: 3
tcgetattr: Inappropriate ioctl for device
Master Resource Control: runlevel 3 has been reached
stty: standard input: Inappropriate ioctl for device.

The cluster shows the container as running, but I cannot ping the IP
address that the container should be using, so I cannot confirm that it is
running correctly.

I suspect that the container is having trouble running because there is no
root console device when it runs as a daemon.

Without the root console available via screen, it's very, very
difficult to diagnose the issue with the container and be certain
what is causing the problem.

I may create a modified version with screen re-added as an
option, as that is my personal preference and it will also help diagnose
the error I'm currently getting with the lxc resource.

I'll send out the updated version as an attachment to this list (I still
have no idea how to create patches/submissions on GitHub).

I'm also now getting errors on the original links to your fork on
GitHub; I'm assuming it's because the agent has now been pulled into
the core (or something), making your fork redundant.

Darren


On Mon, 2011-05-30 at 15:45 +0200, Florian Haas wrote:

 Hello,
 
 after much useful testing from Christoph Mitasch and a number of
 necessary changes highlighted by ocf-tester, I've now merged and pushed
 the lxc resource agent that was originally contributed by Darren Thompson.
 
 The resource agent is here:
 
 https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/lxc
 
 Its commit history up to this point can be reviewed here:
 
 https://github.com/ClusterLabs/resource-agents/commits/master/heartbeat/lxc
 
 Hope this is useful.
 
 Cheers,
 Florian
 
 


Re: [Linux-ha-dev] lxc RA merged

2011-06-05 Thread Darren Thompson
Florian

Please find attached an updated version of the 'lxc resource agent'.

It is based on the upstream lxc resource agent, so these changes should be
relatively easy to merge (one day I'll work out how to create patches).

I have re-added screen support as an option (off by default).

My reasons for doing that are:

1. Quite frankly, I cannot get the containers to run correctly in my
test environment without the 'root console' being redirected to a
screen session.
2. I personally like to see what the container's 'root console' is
doing.
3. The option is off by default, so it should not upset anyone's
sensibilities (I really don't understand what people have against using
screen in this case).

Darren
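An optional screen mode like this could choose between two `lxc-start` invocations. With screen, `lxc-start` runs in a detached session whose console can later be attached. Otherwise, it runs daemonised with a log file.

The sketch below only builds and prints the command line. It uses the `lxc-start` options `-n`, `-f`, `-d` and `-o` and `screen -dmS`. The container name and paths are hypothetical, and this is not the code of the attached agent.

```shell
#!/bin/sh
# Sketch only: build the lxc-start command line for either mode.
# Container name, config path and log path are hypothetical examples.
lxc_start_cmd() {
    name=$1 config=$2 log=$3 use_screen=$4
    if [ "$use_screen" = "true" ]; then
        # screen -dmS <session> starts the command in a new detached
        # session, so the container console can later be attached with
        # screen -r <session>.
        echo "screen -dmS $name lxc-start -n $name -f $config"
    else
        # -d daemonises lxc-start; -o writes its log output to a file.
        echo "lxc-start -n $name -f $config -d -o $log"
    fi
}

lxc_start_cmd test1 /var/lib/lxc/test1/config /var/log/lxc-test1.log false
lxc_start_cmd test1 /var/lib/lxc/test1/config /var/log/lxc-test1.log true
```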



lxc
Description: Binary data


Re: [Linux-ha-dev] lxc RA merged

2011-05-30 Thread Darren Thompson
Florian

I'll fire this up in my test lab and see if these changes break
anything under live fire.

Definitely tidier code.

I'll update this list and post anything that I feel needs changing.

Darren



Re: [Linux-ha-dev] [Openais] An OCF agent for LXC (Linux Containers) - Linux-HA-Dev Digest, Vol 90, Issue 8

2011-05-11 Thread Darren Thompson
Florian

Could you send me an actual file so I can use that as a template.

I still do not have my head around what the actual requirements are.

Darren

On Wed, 2011-05-11 at 16:49 +0200, Florian Haas wrote:

 Darren,
 
 On 2011-05-05 15:07, Florian Haas wrote:
  On 2011-05-05 14:26, Darren Thompson wrote:
  Can you confirm that the current version is working for you and passes
  ocf-tester on your system?
 
  What is an ocf-tester???
  
  http://www.linux-ha.org/doc/dev-guides/_testing_installing_and_packaging_resource_agents.html
  
  I have been testing this the hard way by actually creating and running
  the agents against actual LXC containers in a running cluster... If
  there is a simple way of streamlining this testing I'd love to hear more
  about it. (Did I mention that I'm not normally a coder/developer? -
  Yes I know that's getting repetitive ;-) )
 
  But, back on topic... I can confirm that this agent is working correctly
  in a live fire environment.
  
  That's good to know. ocf-tester doesn't shoot blanks either (it operates
  on an actual incarnation of the resource), but it might run some tests
  that you manually do not, so it's always a wise idea to use it.
 
 Any news regarding running ocf-tester on your lxc agent?
 
 Cheers,
 Florian
 


Re: [Linux-ha-dev] Filesystem ocf file

2011-05-06 Thread Darren Thompson
Florian

OK then... I agree it does seem to be poorly designed, and it's far from
intuitive...

But if it's actually correct, who am I to argue...

Darren

On Fri, 2011-05-06 at 09:37 +0200, Florian Haas wrote:

 On 2011-05-06 09:26, Darren Thompson wrote:
  Team
  
  I was reviewing some errors on a cluster mounted file-system that caused
  me to review the Filesystem ocf file.
  
  I notice that it uses an undeclared parameter, OCF_CHECK_LEVEL, to
  determine what degree of testing of the filesystem the monitor action
  performs.

  I have now updated it to work more formally with a check_level value,
  with the more obvious values of mounted, read and write (my updated
  version attached).

  Could someone (Florian, is this something you can do?) please review this
  with a view to patching the upstream Filesystem ocf file?
 
 NACK, sorry. The OCF_CHECK_LEVEL is specific to the monitor action and
 described as such in the OCF spec; this will not be changed without a
 change to the spec.
 
 To use it, set op monitor interval=X OCF_CHECK_LEVEL=Y
 
 Yes, it's poorly designed, it makes no sense why this is pretty much the
 only sensible time to set a parameter specifically for an operation (as
 opposed to on a resource), it's inexplicable why it's all caps, etc.,
 but that's the way it is.
 
 Cheers,
 Florian
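Florian's point can be illustrated with a sketch of the crmsh form. `OCF_CHECK_LEVEL` is attached to the monitor operation, not to the resource's `params`.

For the Filesystem agent, my understanding is that level 0 checks that the file system is mounted, 10 adds a read test, and 20 adds a read/write test using the status file discussed elsewhere in this archive. The resource name, device and mount point below are hypothetical.

```shell
#!/bin/sh
# Sketch only: print a crmsh primitive with OCF_CHECK_LEVEL set on the
# monitor operation (not as a resource parameter).
# Resource name, device and mount point are hypothetical.
fs_with_check_level() {
    level=${1:-20}
    cat <<EOF
primitive fs-data ocf:heartbeat:Filesystem \\
    params device=/dev/vg0/data directory=/data fstype=ext4 \\
    op monitor interval=60s timeout=40s OCF_CHECK_LEVEL=$level
EOF
}

fs_with_check_level 20
```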
 


Re: [Linux-ha-dev] [Openais] An OCF agent for LXC (Linux Containers) - Linux-HA-Dev Digest, Vol 90, Issue 8

2011-05-05 Thread Darren Thompson
Florian/Team

Comments in-line...



On Thu, 2011-05-05 at 05:47 -0600,
linux-ha-dev-requ...@lists.linux-ha.org wrote:

 Darren,
 
 can you please subscribe to the list as a normal subscriber rather than
 just to the digest, so we can keep this discussion in one thread?


Ok, done... The digest mode was a good idea at the time... 


 
 On 2011-05-05 04:47, Darren Thompson wrote:
  Florian/Team
  
  There was an error in the GitHub version that was causing my re-base
  attempts to fail, so I was forced to try to bring my last known good
  version to the same configuration (mostly successful).

  I have since found and resolved the error in the GitHub version (the
  initialisation section was wrong; the meta-data error was a 'red herring'),
  so I have now done an actual re-base on the GitHub version.

  Changes:

  1. Corrected error in initialisation causing ocf to fail in HB_GUI.
 
 That is not an error; the Github version is correct. The path to the
 ocf-shellfuncs library was recently changed upstream; your installed
 version is apparently still using the old path. For the Github version
 to work on your system, you will have to apply the attached patch
 after you check out.


If I had any idea how to use Git and apply patches, this whole
conversation would never be happening ;-)

Did I mention that I'm not normally a coder/developer?
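Florian's explanation of the changed ocf-shellfuncs path can be made concrete. The sketch below is an assumption based on my recollection of resource-agents, not the attached patch. It assumes the newer location is `${OCF_FUNCTIONS_DIR}/ocf-shellfuncs`, with `OCF_FUNCTIONS_DIR` defaulting to `${OCF_ROOT}/lib/heartbeat`, and the older one is `${OCF_ROOT}/resource.d/heartbeat/.ocf-shellfuncs`.

The sketch looks in both places and reports which file it would source. Verify the paths against the resource-agents version you actually have installed.

```shell
#!/bin/sh
# Sketch only: locate ocf-shellfuncs in either the (assumed) newer or
# older location and report which one would be sourced.
: "${OCF_ROOT:=/usr/lib/ocf}"
: "${OCF_FUNCTIONS_DIR:=${OCF_ROOT}/lib/heartbeat}"

find_shellfuncs() {
    for f in "${OCF_FUNCTIONS_DIR}/ocf-shellfuncs" \
             "${OCF_ROOT}/resource.d/heartbeat/.ocf-shellfuncs"; do
        if [ -r "$f" ]; then
            echo "$f"
            return 0
        fi
    done
    return 1
}

if shellfuncs=$(find_shellfuncs); then
    echo "would source: $shellfuncs"
else
    echo "ocf-shellfuncs not found under $OCF_ROOT" >&2
fi
```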


 Note that normally people would be building the whole resource-agents
 package from a git checkout and use _that_ on their test system, but
 you're not using git, so that option is out for you. Have I mentioned
 that starting to use git would be a good option?

Did I ever claim to be normal...

Mind you, if I told my partner "normally people would be building the
whole resource-agents package from a git checkout and use _that_ on
their test system", she would laugh her head off... I suppose it
depends on your definition of normality... ;-)

Using git would probably be a good option, but my requirement is for
this to work under SLES11 SP1 with the HA option pack, none of which is
consistent with building the whole resource-agents package. I do,
however, intend to raise an SR with Attachmate/Novell/SUSE for them to come
in line with these standards, as supporting their environment on client
sites would be a PITA if it varies too much from the agreed standards.


 
  2. Added information to the stop section, to provide more feedback on
  container shutdown/stop (and to assist with future development of
  containers using alternate 'init' systems).
 
 Applied and pushed to my lxc branch.


Thank you. Yes, I can see that in the online version.


 
 Can you confirm that the current version is working for you and passes
 ocf-tester on your system?


What is an ocf-tester??? 

I have been testing this the hard way by actually creating and running
the agents against actual LXC containers in a running cluster... If
there is a simple way of streamlining this testing I'd love to hear more
about it. (Did I mention that I'm not normally a coder/developer? -
Yes I know that's getting repetitive ;-) )

But, back on topic... I can confirm that this agent is working correctly
in a live fire environment.

Darren
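Florian's ocf-tester suggestion can be illustrated with a sketch that prints a command line. It uses ocf-tester's `-n` (resource name) and `-o name=value` options.

Note that ocf-tester really starts and stops the resource, so it needs root and a test container. The agent parameter names `container` and `config`, the agent path and the container paths are assumptions. Check them against the agent's meta-data before running anything.

```shell
#!/bin/sh
# Sketch only: print an ocf-tester invocation for the lxc agent.
# Parameter names (container, config) and paths are assumptions.
ocf_tester_cmd() {
    container=$1
    config=$2
    agent=${3:-/usr/lib/ocf/resource.d/heartbeat/lxc}
    echo "ocf-tester -n test-$container -o container=$container -o config=$config $agent"
}

ocf_tester_cmd test1 /var/lib/lxc/test1/config
```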


Re: [Linux-ha-dev] [Openais] An OCF agent for LXC (Linux Containers) - Linux-HA-Dev Digest, Vol 90, Issue 2

2011-05-04 Thread Darren Thompson
Florian

I have tried to re-base on your version but it just will not run for me.

I keep getting "Failed to parse the metadata of LXC syntax error line
1, column 1".

I've no idea where this error is as it all looks fine...

I'll attach my copy and a screen-shot of the error, HELP!!!

Darren


On Tue, 2011-05-03 at 07:59 -0600,
linux-ha-dev-requ...@lists.linux-ha.org wrote:

 Date: Tue, 03 May 2011 08:20:56 +0200
 From: Florian Haas florian.h...@linbit.com
 Subject: Re: [Linux-ha-dev] [Openais] An OCF agent for LXC (Linux
 Containers) - Linux-HA-Dev Digest, Vol 89, Issue 32
 To: High-Availability Linux Development List
 linux-ha-dev@lists.linux-ha.org
 Message-ID: 4dbf9ec8.4060...@linbit.com
 Content-Type: text/plain; charset=utf-8
 
 Hello Darren,
 
 Please get the current version from
 https://github.com/fghaas/resource-agents/blob/lxc/heartbeat/lxc, and
 also review the commit history at
 https://github.com/fghaas/resource-agents/commits/lxc/heartbeat/lxc.
 
 When you send more updates, please do make sure they track the latest
 version in my repo. I am doing my best to split this up into patches as
 well as I can and check them in individually, but the re-introduction of
 errors that have already been fixed is not something that gives me thrills.
 Thanks.
 
 Cheers,
 Florian
attachment: Screenshot-Message.png

lxc
Description: application/shellscript


Re: [Linux-ha-dev] Linux-HA-Dev Digest, Vol 90, Issue 4

2011-05-04 Thread Darren Thompson
Florian/Team

I have succeeded in re-basing my work on the version in the repository.
That should make re-integrating my changes much more straightforward.

It is much more succinct now.

Changes in this version:

1. Re-based on Florian's version in
https://github.com/fghaas/resource-agents/blob/lxc/heartbeat/lxc
2. Removed root variable requirements (it was only used once and did
not add significant value to configuration)
3. Removed mention of root from verify and stop sections
4. Cleaned up and expanded meta-data section, added to descriptions etc.
5. Removed superfluous cd in start section, as it is no longer required with
the full config path specified.

I really do not know what happened to my first few attempts at re-basing;
I'm assuming an invalid character got into the file somewhere...

Darren

On Wed, 2011-05-04 at 00:44 -0600,
linux-ha-dev-requ...@lists.linux-ha.org wrote:

 Florian
 
 I have tried to re-base on your version but it just will not run for me.
 
 I keep getting "Failed to parse the metadata of LXC syntax error line 1,
 column 1".
 
 I've no idea where this error is as it all looks fine...
 
 I'll attach my copy and a screenshot of the error. HELP!!!
 
 Darren


lxc
Description: application/shellscript


Re: [Linux-ha-dev] [Openais] An OCF agent for LXC (Linux Containers) - Linux-HA-Dev Digest, Vol 90, Issue 6

2011-05-04 Thread Darren Thompson
Florian/Team

There was an error in the GitHub version that was causing my re-base
attempts to fail, so I was forced to try to bring my last known good
version to the same configuration (mostly successful).

I have since found and resolved the error in the GitHub version (the
initialisation section was wrong; the meta-data error was a 'red
herring'), so I have now done an actual re-base on the GitHub version.
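One way to tell a genuine meta-data problem from a failure earlier in the script (such as a broken initialisation section) is to run the agent's meta-data action by hand and check its output. The sketch below is a hypothetical helper (`check_metadata` is not part of the lxc agent or resource-agents). It assumes the agent prints an XML declaration first, as OCF agents conventionally do, and it only runs `xmllint` (from libxml2) if that tool is installed.

```shell
#!/bin/sh
# Hypothetical helper, not part of the lxc agent: check that an OCF agent's
# meta-data action prints well-formed XML starting with an XML declaration.
# Usage: check_metadata /path/to/agent
check_metadata() {
    agent="$1"
    # Capture stdout only; a non-zero exit is itself treated as a failure.
    out=$("$agent" meta-data 2>/dev/null) || {
        echo "meta-data action exited non-zero" >&2
        return 1
    }
    # Any stray output before the declaration (or no output at all) makes
    # an XML parser fail right at line 1, column 1.
    first=$(printf '%s\n' "$out" | head -n 1)
    case "$first" in
        '<?xml'*) ;;
        *)
            echo "unexpected first line: $first" >&2
            return 1
            ;;
    esac
    # Full well-formedness check, only if xmllint is available.
    if command -v xmllint >/dev/null 2>&1; then
        printf '%s\n' "$out" | xmllint --noout - || return 1
    fi
    return 0
}
```

A real agent may need `OCF_ROOT` exported before it will run outside the cluster, for example `export OCF_ROOT=/usr/lib/ocf` followed by `check_metadata /usr/lib/ocf/resource.d/heartbeat/lxc` (a typical install path, assumed here).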

Changes:

1. Corrected the error in the initialisation section that caused the OCF
agent to fail in HB_GUI.
2. Added information to the stop section, to provide more feedback on
container shutdown/stop (and to assist with future development of
containers using alternate 'init' systems).

Regards
Darren


On Wed, 2011-05-04 at 12:00 -0600,
linux-ha-dev-requ...@lists.linux-ha.org wrote:

  Florian/Team
  
  I have now updated my re-based ocf file to include the experimental
  support for containers using upstart and systemd.
  
  I can confirm that this is still working correctly for containers
  running 'sysv init' and in theory should now also work for containers
  using 'upstart' and 'systemd'.
  
  I'm currently doing a 'crash course' in installing containers to use
  these 'init replacements' but have not yet succeeded in testing either
  'upstart' or 'systemd' containers.
  
  If there is anyone with a better understanding of LXC containers and
  one/both of these other 'init systems', please contact me as your
  information/assistance would be invaluable.
 
 OK, updated my git branch. You really want to double check your
 rebasing method; you're constantly re-introducing things that I've
 removed or fixed in earlier commits.
 
 Florian
 


lxc
Description: application/shellscript


Re: [Linux-ha-dev] [Openais] An OCF agent for LXC (Linux Containers) - Linux-HA-Dev Digest, Vol 89, Issue 33

2011-05-03 Thread Darren Thompson
Florian/Team

Another update to the lxc (Linux container) ocf file. (attached)

Changes (summary):

Added very very very experimental support for alternate init systems
inside containers (it should now support sysvinit, upstart and systemd).

Adding this support did not break the default sysvinit, but since I do
not know how to create an LXC container that uses 'upstart' or 'systemd',
my testing is very rudimentary for those two systems.
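As an illustration of what "init type independent" could involve, here is one possible heuristic for guessing which init system a container's root filesystem uses. This is a hypothetical sketch, not the agent's actual logic: the function name `detect_container_init` and the checks are assumptions. It treats an `/sbin/init` symlink pointing at systemd as systemd, an executable `/sbin/initctl` (shipped by upstart) as upstart, and otherwise falls back to sysvinit.

```shell
#!/bin/sh
# Hypothetical heuristic, not the lxc agent's actual logic: guess the init
# system inside a container from files in its root filesystem.
# Usage: detect_container_init /path/to/container/rootfs
# Prints one of: systemd, upstart, sysvinit
detect_container_init() {
    rootfs="$1"
    init="$rootfs/sbin/init"
    # Many distributions ship /sbin/init as a symlink to the systemd binary.
    # Only the link text is inspected, so the target need not exist on the host.
    if [ -L "$init" ]; then
        case "$(readlink "$init")" in
            *systemd*) echo systemd; return 0 ;;
        esac
    fi
    # Upstart provides the initctl control utility in /sbin.
    if [ -x "$rootfs/sbin/initctl" ]; then
        echo upstart
        return 0
    fi
    # Otherwise assume classic SysV init.
    echo sysvinit
}
```

Where the rootfs lives depends on the container's configuration (for example the `lxc.rootfs` setting in older LXC releases); the checks are ordered so that a system migrated from upstart to systemd, which may still carry an old `initctl`, is reported as systemd.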

I have made no progress whatsoever with removing the requirement for
screen, as I still have not found a working alternative for the root
console created by lxc-start (it takes over the console it is run on,
and when it runs as a cluster service without screen, that console is
lost).

At this point I may have to confess that getting this working without
screen may be beyond my abilities (for now; I'm stubborn, so I will
keep plugging away at this, but don't hold your breath).

I'm still not sure why the use of screen is so repellent to some, as it
works well and is quite innocuous generally.

Regards
Darren


On Sat, 2011-04-30 at 12:00 -0600,
linux-ha-dev-requ...@lists.linux-ha.org wrote:

 Date: Sat, 30 Apr 2011 16:10:52 +0930
 From: Darren Thompson darr...@akurit.com.au
 Subject: Re: [Linux-ha-dev] [Openais] An OCF agent for LXC (Linux
 Containers) - Linux-HA-Dev Digest, Vol 89, Issue 32
 To: linux-ha-dev@lists.linux-ha.org
 Message-ID: 1304145652.5625.50.ca...@darrenspc.akurit.com.au
 Content-Type: text/plain; charset=utf-8
 
 Florian/TEAM
 
 Please find the latest instalment of the LXC containers ocf.
 
 Changes (summary):
 Moved cgroup_mounted out of default initialisation and made it a
 function (used by start/stop).
 Also cleaned up some other code sections, including expanding the
 verify_all section to test the configuration more fully. Also merged
 the validate and status sections.
 My next work will be determining the best way to make the containers
 init type independent (due to the rise of init replacements like
 systemd and upstart) and also investigating the removal of the
 screen tool from the startup, as it's received negative feedback
 from a few sources.
 
 Darren


lxc
Description: application/shellscript


Re: [Linux-ha-dev] [Openais] An OCF agent for LXC (Linux Containers) - Linux-HA-Dev Digest, Vol 90, Issue 2

2011-05-03 Thread Darren Thompson
Florian/Team

Sorry I did not read this sooner; my last update will still have been
messy for you.

I'll grab a copy of the current version and re-base my work on that.

I see that you have streamlined it quite a bit. I'll test it in my
environment to ensure it's working as expected (I note that the
parameters have been renamed and some functionality has changed, so I
will re-create my cluster/lxc/containers using this and re-test).

Darren


On Tue, 2011-05-03 at 07:59 -0600,
linux-ha-dev-requ...@lists.linux-ha.org wrote:

 Hello Darren,
 
 Please get the current version from
 https://github.com/fghaas/resource-agents/blob/lxc/heartbeat/lxc, and
 also review the commit history at
 https://github.com/fghaas/resource-agents/commits/lxc/heartbeat/lxc.
 
 When you send more updates, please do make sure they track the latest
 version in my repo. I am doing my best to split this up into patches
 and check them in individually, but the re-introduction of errors
 that have already been fixed is not something that gives me thrills.
 Thanks.
 
 Cheers,
 Florian


Re: [Linux-ha-dev] [Openais] An OCF agent for LXC (Linux Containers) - Linux-HA-Dev Digest, Vol 89, Issue 32

2011-04-30 Thread Darren Thompson
Florian/TEAM

Please find the latest instalment of the LXC containers ocf.

Changes (summary):
Moved cgroup_mounted out of default initialisation and made it a
function (used by start/stop).
Also cleaned up some other code sections, including expanding the
verify_all section to test the configuration more fully. Also merged
the validate and status sections.
My next work will be determining the best way to make the containers
init type independent (due to the rise of init replacements like
systemd and upstart) and also investigating the removal of the
screen tool from the startup, as it's received negative feedback from
a few sources.
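A `cgroup_mounted` helper of the kind described above could look roughly like the following. This is a minimal sketch under stated assumptions, not the code in the attached agent: it assumes the check only needs to know whether any cgroup filesystem appears in `/proc/mounts`, and the optional mount-table argument exists purely to make it testable.

```shell
#!/bin/sh
# Minimal sketch of a cgroup_mounted helper (assumed behaviour, not the
# attached agent's code). Returns 0 if a cgroup filesystem is listed in the
# mount table, 1 otherwise. The optional argument overrides /proc/mounts.
cgroup_mounted() {
    mounts="${1:-/proc/mounts}"
    # Field 3 of each mount-table line is the filesystem type;
    # "cgroup2" only appears on newer kernels with the unified hierarchy.
    awk '$3 == "cgroup" || $3 == "cgroup2" { found = 1 }
         END { exit (found ? 0 : 1) }' "$mounts"
}
```

Making it a function rather than running it once during initialisation means the check happens only in the actions that need it (start/stop), so actions such as meta-data are not affected on a node where cgroups are not yet mounted. A caller might then do something like `cgroup_mounted || return $OCF_ERR_GENERIC`; which return code the real agent uses is not shown here.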

Darren




On Fri, 2011-04-29 at 12:00 -0600,
linux-ha-dev-requ...@lists.linux-ha.org wrote:

 Date: Fri, 29 Apr 2011 09:57:04 +0200
 From: Florian Haas florian.h...@linbit.com
 Subject: Re: [Linux-ha-dev] [Openais] An OCF agent for LXC (Linux
 Containers) - Linux-HA-Dev Digest, Vol 89, Issue 30
 To: High-Availability Linux Development List
 linux-ha-dev@lists.linux-ha.org
 Message-ID: 4dba6f50.7090...@linbit.com
 Content-Type: text/plain; charset=utf-8
 
 On 2011-04-29 08:04, Darren Thompson wrote:
  You posted my first attempt and not the latest version; is it possible
  to add that one, as it addresses some (hopefully most) of the issues
  you identified?
 
 Already there. Been there since yesterday.
 
 https://github.com/fghaas/resource-agents/commit/07827c42494dbec2c011133d9f82e831bc8b2eb6
 
  There are still some valid points you have raised, however, so I'm
  going to try to incorporate them into a third version.
 
 See how much easier this would be if you actually did this in your own
 github repo that we could just pull from?
 
 Cheers,
 Florian


lxc
Description: application/shellscript


Re: [Linux-ha-dev] [Openais] An OCF agent for LXC (Linux Containers) - Linux-HA-Dev Digest, Vol 89, Issue 30

2011-04-29 Thread Darren Thompson
Florian/TEAM

Thank you for the update.

I'll thread my remaining replies into the message :-)

On Thu, 2011-04-28 at 07:46 -0600,
linux-ha-dev-requ...@lists.linux-ha.org wrote:

 Date: Thu, 28 Apr 2011 11:56:30 +0200
 From: Florian Haas florian.h...@linbit.com
 Subject: Re: [Linux-ha-dev] [Openais] An OCF agent for LXC (Linux
 Containers)
 To: High-Availability Linux Development List
 linux-ha-dev@lists.linux-ha.org
 Message-ID: 4db939ce.5000...@linbit.com
 Content-Type: text/plain; charset=utf-8
 
 On 2011-04-28 10:21, Darren Thompson wrote:
  Florian/TEAM
  
  Thanks for your input and the link to the guidelines.
  
  I have updated my original ocf file in line with the guidelines; it
  even gave me a few tips on how to do things better, so it was well
  worth the time spent.
  
  Please find the updated ocf file for LXC containers as a cluster
  resource attached.
  
  Since I'm not an actual developer (or even a career coder)
 
 Do you think I am?

Until today, I had no experience whatsoever with GitHub, so compared to
me... yes...

 
  I do not have
  the facility to host my own GitHub fork, so I would appreciate someone
  adopting this and integrating it into their git repository.
 
 OK, I have added this to a separate lxc branch in my own github fork.
 I'd appreciate if you could at least get yourself an account on github
 so you can comment on commit line notes.
 
 I have added my comments to this page:
 
 https://github.com/fghaas/resource-agents/commit/73f80b31f1cee5eff1c2fe2b968f4ea593e8f405

Yep, done. I responded to nearly all of your points (most of the time
to say, yep... agree).

You posted my first attempt and not the latest version; is it possible
to add that one, as it addresses some (hopefully most) of the issues you
identified?

There are still some valid points you have raised, however, so I'm going
to try to incorporate them into a third version.

Is there some clever way of re-integrating all of this? (did I mention
that I'm not normally a coder).

 
 
 Some of those may have already been addressed in your updated version,
 but to keep things simple I've kept my comments to one commit for the
 time being.
 
 Florian
 
 PS: We can stop CC'ing the openais list, this is in no way
 Corosync/OpenAIS related.

Agreed, I will stop pestering that list now :-)

Darren


Re: [Linux-ha-dev] [Openais] An OCF agent for LXC (Linux Containers)

2011-04-28 Thread Darren Thompson
Florian/TEAM

Thanks for your input and the link to the guidelines.

I have updated my original ocf file in line with the guidelines; it even
gave me a few tips on how to do things better, so it was well worth the
time spent.

Please find the updated ocf file for LXC containers as a cluster
resource attached.

Since I'm not an actual developer (or even a career coder), I do not have
the facility to host my own GitHub fork, so I would appreciate someone
adopting this and integrating it into their git repository.

I have since added myself to the developer mailing list, so I should be
able to contribute to the refining of this.

Regards
Darren


On Tue, 2011-04-26 at 15:36 +0200, Florian Haas wrote:

 Thanks Darren!
 
 Thanks for the contribution! Can I suggest
 
 - we move this discussion to the linux-ha-dev list (where most OCF RA
 related discussions and reviews take place);
 
 - you give the RA a makeover following the OCF RA developer's guide
 (http://www.linux-ha.org/doc/dev-guides/ra-dev-guide.html);
 
 - you set up your own github fork off of
 https://github.com/ClusterLabs/resource-agents, and push your RA to that
 so we can eventually pull it into the mainline repo?
 
 Also, can you explain what the advantages of your approach are, versus
 using libvirt-managed lxc containers which Pacemaker can tie into via
 the existing VirtualDomain agent?
 
 Thanks!
 Cheers,
 Florian
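For comparison, the libvirt-based approach mentioned above would define the container as a libvirt LXC domain and let Pacemaker manage it through the existing `ocf:heartbeat:VirtualDomain` agent. A hedged crm shell sketch: the resource name `ct1` and the domain XML path are hypothetical, and the `lxc:///` connection URI assumes libvirt's LXC driver is installed on every node.

```shell
# Hypothetical example: manage a libvirt-defined LXC container named ct1
# with the existing VirtualDomain agent instead of a dedicated lxc agent.
crm configure primitive ct1 ocf:heartbeat:VirtualDomain \
    params config="/etc/libvirt/lxc/ct1.xml" hypervisor="lxc:///" \
    op monitor interval="30s" timeout="30s"
```

The trade-off is an extra dependency on libvirt in exchange for reusing an agent that is already maintained, which is essentially the question asked above about the advantages of a dedicated lxc agent.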
 


lxc
Description: application/shellscript