Re: [Linux-HA] Hertbeat fail-over Email Alert

2014-09-24 Thread Tom Parker
Hi Lars Can you provide more details about this resource agent. The documentation is a little sparse. What events will cause an e-mail to be sent? Thanks! Tom On 24/09/14 06:53 PM, Lars Ellenberg wrote: On Tue, Sep 23, 2014 at 04:55:20PM +0530, Atul Yadav wrote: Dear Team , In our

Re: [Linux-HA] SBD flipping between Pacemaker: UNHEALTHY and OK

2014-04-23 Thread Tom Parker
in wrong way in your multipath, try to read this http://www.novell.com/documentation/oes2/clus_admin_lx/data/bl9ykz6.html 2014-04-22 20:41 GMT+02:00 Tom Parker tpar...@cbnco.com: I have attached the config files to this e-mail. The sbd dump is below [LIVE] qaxen1:~ # sbd -d /dev/mapper/qa-xen-sbd

Re: [Linux-HA] SBD flipping between Pacemaker: UNHEALTHY and OK

2014-04-23 Thread Tom Parker
SBD has a connection to pacemaker to establish overall cluster health (the -P flag). This seems to be where the problem is. I just don't know what the problem might be. On 23/04/14 11:32 AM, emmanuel segura wrote: what do you mean with link? 2014-04-23 15:23 GMT+02:00 Tom Parker tpar

[Linux-HA] Resource blocked

2014-04-22 Thread Tom Parker
Good morning. I am trying to restart resources on one of my clusters and I am getting the message pengine[13397]: notice: LogActions: Start domtcot1-qa(qaxen1 - blocked) How can I find out why this resource is blocked? Thanks
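The `LogActions` line quoted above names the resource, the node it would start on, and the `blocked` state. A minimal sketch, assuming pengine log lines in that form (the whitespace between fields may be spaces or a tab), that lists every blocked resource found in a log; the function name `list_blocked` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read pengine log lines on stdin and print
# "resource node" for every action reported as blocked.
# Assumes the format quoted above, e.g.
#   ... notice: LogActions: Start domtcot1-qa(qaxen1 - blocked)
list_blocked() {
  local re='LogActions:[[:space:]]+[A-Za-z]+[[:space:]]+([^([:space:]]+)[[:space:]]*\(([^[:space:])]+) - blocked\)'
  local line
  while IFS= read -r line; do
    if [[ $line =~ $re ]]; then
      # BASH_REMATCH[1] is the resource, BASH_REMATCH[2] the node
      printf '%s %s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
    fi
  done
}
```

This only tells you which resources are blocked, not why. For the reason, inspecting the policy engine's scores against the live CIB (for example with `crm_simulate -sL`) is a common next step.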

Re: [Linux-HA] SBD flipping between Pacemaker: UNHEALTHY and OK

2014-04-22 Thread Tom Parker
:21 GMT+02:00 Tom Parker tpar...@cbnco.com: Has anyone seen this? Do you know what might be causing the flapping? Apr 21 22:03:03 qaxen6 sbd: [12962]: info: Watchdog enabled. Apr 21 22:03:03 qaxen6 sbd: [12973]: info: Servant starting for device /dev/mapper/qa-xen-sbd Apr 21 22:03:03 qaxen6

Re: [Linux-HA] /usr/sbin/lrmadmin missing from cluster-glue

2014-01-24 Thread Tom Parker
Thanks Kristoffer. How is tuning done for lrm now? Tom On 01/24/2014 01:41 AM, Kristoffer Grönlund wrote: On Sat, 28 Dec 2013 11:18:44 -0500 Tom Parker tpar...@cbnco.com wrote: Hello /usr/sbin/lrmadmin is missing from the latest version of cluster-glue in SLES SP3. Has the program been

[Linux-HA] /usr/sbin/lrmadmin missing from cluster-glue

2013-12-28 Thread Tom Parker
Hello /usr/sbin/lrmadmin is missing from the latest version of cluster-glue in SLES SP3. Has the program been deprecated or is this an issue in the packaging of the RPM? Thanks Tom

Re: [Linux-HA] Antw: Xen XL Resource Agent

2013-11-18 Thread Tom Parker
: Xen XL Resource Agent On 2013-11-15T09:05:53, Tom Parker tpar...@cbnco.com wrote: The XL tools are much faster and lighter weight. I am not sure if they report proper codes (I will have to test) but the XM stack has been deprecated so at some point I assume it will go away completely

Re: [Linux-HA] Antw: Xen XL Resource Agent

2013-11-15 Thread Tom Parker
0m0.236s sys 0m0.036s On 11/15/2013 02:04 AM, Ulrich Windl wrote: Tom Parker tpar...@cbnco.com wrote on 14.11.2013 at 19:23 in message 5285150b.9050...@cbnco.com: Hello. Now that XM has been deprecated is anyone working on a Xen RA that uses the xl tool stack? I wonder whether

[Linux-HA] Xen XL Resource Agent

2013-11-14 Thread Tom Parker
Hello. Now that XM has been deprecated is anyone working on a Xen RA that uses the xl tool stack? I am willing to do the work but I don't want to duplicate the effort if someone else is doing/has already done it. Tom
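As a rough illustration of the first thing an xl-aware Xen RA would need, here is a sketch (not an existing resource-agents function; `pick_xen_tool` is a hypothetical name) that prefers `xl` and falls back to the deprecated `xm` only when `xl` is not installed:

```shell
#!/usr/bin/env bash
# Sketch only: choose the Xen toolstack command to call, preferring xl
# and falling back to the deprecated xm. pick_xen_tool is hypothetical.
pick_xen_tool() {
  local tool
  for tool in xl xm; do
    if command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
      return 0
    fi
  done
  echo "neither xl nor xm found in PATH" >&2
  return 1
}
```

A real agent would also have to confirm that the chosen tool returns usable exit codes for its list and status queries, which is the part still to be tested in this thread.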

Re: [Linux-HA] How many primitives, groups can I have

2013-11-11 Thread Tom Parker
You will also have to be careful of the shared memory size between the nodes. I had issues with massive CIBs. Setting some environment variables fixed the issue; the defaults are too small. From: Digimer Sent: Monday, November 11, 2013 10:24 AM To: General Linux-HA mailing list Reply To:

Re: [Linux-HA] How many primitives, groups can I have

2013-11-11 Thread Tom Parker
$PACEMAKER_SYSCONFIG ]; then . $PACEMAKER_SYSCONFIG fi Hope this helps. On 11/11/2013 03:35 PM, Tom Parker wrote: You will also have to be careful of the shared memory size between the nodes. I had issues with massive CIBs. Setting some environment variables fixed the issue
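The fragment above is the tail of the usual init-script idiom of sourcing a sysconfig file only if it exists, which is how the environment variables mentioned here would reach Pacemaker. A sketch of that idiom, assuming `/etc/sysconfig/pacemaker` as the default path (adjust for your distribution) and a hypothetical function name `load_pacemaker_sysconfig`. The snippet does not say which variables were set; recent Pacemaker versions document `PCMK_ipc_buffer` for raising the IPC message size used with large CIBs.

```shell
#!/usr/bin/env bash
# Sketch of "source the sysconfig file if it exists". The default path
# is an assumption; check where your init script actually looks.
PACEMAKER_SYSCONFIG=${PACEMAKER_SYSCONFIG:-/etc/sysconfig/pacemaker}

load_pacemaker_sysconfig() {
  if [ -f "$PACEMAKER_SYSCONFIG" ]; then
    # Assignments in the file become visible in this shell; they only
    # reach the daemons started from it if they are exported.
    . "$PACEMAKER_SYSCONFIG"
  fi
}
```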

Re: [Linux-HA] Antw: Re: Xen RA and rebooting

2013-10-21 Thread Tom Parker
Thanks for the feedback. Dejan, I have some SLES nodes that are running around 30 pretty heavy VMs and I found that while I never got to 5s, the time it would take to reboot was not constant. I have a feeling that this bug in xen-list may take a while to be fixed upstream and trickle down

Re: [Linux-HA] Antw: Re: Xen RA and rebooting

2013-10-21 Thread Tom Parker
always welcome, of course. It should also go in a separate commit. Thanks, Dejan Regards, Ulrich Tom Parker tpar...@cbnco.com wrote on 18.10.2013 at 19:30 in message 5261703a.5070...@cbnco.com: Hi Dejan. Sorry to be slow to respond to this. I have done some testing and everything

Re: [Linux-HA] Antw: Re: Xen RA and rebooting

2013-10-18 Thread Tom Parker
, On Wed, Oct 16, 2013 at 05:28:28PM -0400, Tom Parker wrote: Some more reading of the source code makes me think the || [ $__OCF_ACTION != stop ]; is not needed. Yes, you're right. I'll drop that part of the if statement. Many thanks for testing. Fixed now. The if statement, which was obviously

Re: [Linux-HA] Antw: Re: Xen RA and rebooting

2013-10-18 Thread Tom Parker
I may have actually created the pull request properly... Please let me know and again thanks for your help. Tom On 10/18/2013 01:30 PM, Tom Parker wrote: Hi Dejan. Sorry to be slow to respond to this. I have done some testing and everything looks good. I spent some time tweaking the RA

Re: [Linux-HA] Antw: Re: Xen RA and rebooting

2013-10-16 Thread Tom Parker
-1)) done return $rc } On 10/16/2013 12:12 PM, Dejan Muhamedagic wrote: Hi Tom, On Tue, Oct 15, 2013 at 07:55:11PM -0400, Tom Parker wrote: Hi Dejan Just a quick question. I cannot see your new log messages being logged to syslog ocf_log warn domain $1 reported as not running
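The tail of the function quoted above (a counter decremented in a loop, then `return $rc`) and the `domain $1 reported as not running` warning suggest a retry loop: do not report the VM as stopped until it has been missing for several consecutive checks. A minimal standalone sketch of that idea, not the code merged into the Xen RA; `domain_seen`, `domain_status`, `RETRIES`, `SLEEP_SECS` and `RUNNING_DOMAINS` are hypothetical names:

```shell
#!/usr/bin/env bash
# Sketch: tolerate the short window in which a domain rebooting from
# inside disappears from the toolstack listing.
# RETRIES / SLEEP_SECS are illustrative defaults, not RA parameters.
RETRIES=${RETRIES:-5}
SLEEP_SECS=${SLEEP_SECS:-1}

domain_seen() {
  # Stand-in for the real xl/xm query: look the name up in the
  # space-separated list held in $RUNNING_DOMAINS.
  [[ " $RUNNING_DOMAINS " == *" $1 "* ]]
}

domain_status() {
  local tries=$RETRIES rc=1
  while [ "$tries" -gt 0 ]; do
    if domain_seen "$1"; then
      rc=0
      break
    fi
    echo "warn: domain $1 reported as not running" >&2
    tries=$((tries - 1))
    # Only wait if another check is still to come
    [ "$tries" -gt 0 ] && sleep "$SLEEP_SECS"
  done
  return $rc
}
```

The trade-off is detection latency: a VM that has really died is reported as stopped only after roughly `RETRIES * SLEEP_SECS` seconds, which has to stay well inside the monitor operation's timeout.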

Re: [Linux-HA] Antw: Re: Xen RA and rebooting

2013-10-16 Thread Tom Parker
05:16 PM, Tom Parker wrote: Hi. I think there is an issue with the Updated Xen RA. I think there is an issue with the if statement here but I am not sure. I may be confused about how bash || works but I don't see my servers ever entering the loop on a vm disappearing. if ocf_is_probe
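The condition discussed in this exchange joins `ocf_is_probe` and `[ $__OCF_ACTION != stop ]` with `||` (both pieces are quoted in these messages). A small sketch of how bash evaluates such an `if`, using hypothetical stand-ins `is_probe`, `PROBE` and `ACTION` in place of the OCF helpers: the body runs if the first command succeeds, or if it fails and the second succeeds, and the second is skipped entirely when the first succeeds.

```shell
#!/usr/bin/env bash
# is_probe / PROBE / ACTION are hypothetical stand-ins for
# ocf_is_probe and $__OCF_ACTION.
is_probe() { [ "${PROBE:-no}" = yes ]; }

enters_loop() {
  # "$ACTION" is quoted so an empty value does not break the test
  if is_probe || [ "$ACTION" != stop ]; then
    echo yes
  else
    echo no
  fi
}
```

Because the second test is true for every action other than `stop`, the whole condition is true for any monitor, probe or not; that may be why the clause turned out to be unnecessary.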

Re: [Linux-HA] Antw: Re: Xen RA and rebooting

2013-10-15 Thread Tom Parker
Marowsky-Bree wrote: On 2013-10-01T00:53:15, Tom Parker tpar...@cbnco.com wrote: Thanks for paying attention to this issue (not really a bug) as I am sure I am not the only one with this issue. For now I have set all my VMs to destroy so that the cluster is the only thing managing them
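Setting the VMs "to destroy" presumably refers to the domain configuration's `on_reboot` action (an assumption; the snippet does not name the option). A sketch that forces `on_reboot = "destroy"` in a domain config file, so a reboot issued inside the guest ends the domain and recovery is left to the cluster; `set_on_reboot_destroy` is a hypothetical name and `sed -i` assumes GNU sed:

```shell
#!/usr/bin/env bash
# Sketch: make a reboot from inside the guest destroy the domain
# instead of restarting it outside the cluster's control.
# Assumes one "key = value" per line, as in xm/xl domain configs.
set_on_reboot_destroy() {
  local cfg=$1
  if grep -Eq '^[[:space:]]*on_reboot[[:space:]]*=' "$cfg"; then
    # Replace the existing setting (GNU sed in-place edit)
    sed -i -E 's/^[[:space:]]*on_reboot[[:space:]]*=.*/on_reboot = "destroy"/' "$cfg"
  else
    # No setting yet: append one
    printf 'on_reboot = "destroy"\n' >> "$cfg"
  fi
}
```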

Re: [Linux-HA] Antw: Re: Xen RA and rebooting

2013-10-10 Thread Tom Parker
: On 2013-10-01T00:53:15, Tom Parker tpar...@cbnco.com wrote: Thanks for paying attention to this issue (not really a bug) as I am sure I am not the only one with this issue. For now I have set all my VMs to destroy so that the cluster is the only thing managing them but this is not super clean

Re: [Linux-HA] Antw: Re: Xen RA and rebooting

2013-10-10 Thread Tom Parker
:02PM +0200, Lars Marowsky-Bree wrote: On 2013-10-01T00:53:15, Tom Parker tpar...@cbnco.com wrote: Thanks for paying attention to this issue (not really a bug) as I am sure I am not the only one with this issue. For now I have set all my VMs to destroy so that the cluster is the only thing

Re: [Linux-HA] Antw: Re: Xen RA and rebooting

2013-10-02 Thread Tom Parker
. Thanks again! Tom On 10/01/2013 06:24 AM, Dejan Muhamedagic wrote: Hi, On Tue, Oct 01, 2013 at 12:13:02PM +0200, Lars Marowsky-Bree wrote: On 2013-10-01T00:53:15, Tom Parker tpar...@cbnco.com wrote: Thanks for paying attention to this issue (not really a bug) as I am sure I am not the only one

Re: [Linux-HA] Antw: Re: Xen RA and rebooting

2013-09-30 Thread Tom Parker
Hi Ulrich. You have summed it up exactly and the chances seem small but in the real world (Murphy's Law I guess) I have hit this many times. Twice to the point where I have mangled a Production VM to the point of garbage. The larger the available free memory on the cluster as a whole seems to

Re: [Linux-HA] Xen RA and rebooting

2013-09-17 Thread Tom Parker
On 09/17/2013 01:13 AM, Vladislav Bogdanov wrote: 14.09.2013 07:28, Tom Parker wrote: Hello All Does anyone know of a good way to prevent pacemaker from declaring a vm dead if it's rebooted from inside the vm. It seems to be detecting the vm as stopped for the brief moment between shutting

Re: [Linux-HA] Xen RA and rebooting

2013-09-17 Thread Tom Parker
On 09/17/2013 04:18 AM, Lars Marowsky-Bree wrote: On 2013-09-16T16:36:38, Tom Parker tpar...@cbnco.com wrote: Can you kindly file a bug report here so it doesn't get lost https://github.com/ClusterLabs/resource-agents/issues ? Submitted (Issue #308) Thanks. It definitely leads to data

Re: [Linux-HA] Clone colocation missing?

2013-09-16 Thread Tom Parker
-with-storage inf: CBNXen storage-clone This should take care of all the ordering and colocation needs for my VMs Tom On 09/14/2013 07:14 AM, Lars Marowsky-Bree wrote: On 2013-09-13T17:48:40, Tom Parker tpar...@cbnco.com wrote: Hi Feri I agree that it should be necessary but for some reason

Re: [Linux-HA] Xen RA and rebooting

2013-09-16 Thread Tom Parker
On 09/14/2013 07:18 AM, Lars Marowsky-Bree wrote: On 2013-09-14T00:28:30, Tom Parker tpar...@cbnco.com wrote: Does anyone know of a good way to prevent pacemaker from declaring a vm dead if it's rebooted from inside the vm. It seems to be detecting the vm as stopped for the brief moment

Re: [Linux-HA] Clone colocation missing? (was: Pacemaker 1.19 cannot manage more than 127 resources)

2013-09-13 Thread Tom Parker
of a primitive. Tom On Thu 05 Sep 2013 04:48:40 AM EDT, Ferenc Wagner wrote: Tom Parker tpar...@cbnco.com writes: I have attached my original crm config with 201 primitives to this e-mail. Hi, Sorry to sidetrack this thread, but I really wonder why you only have order constraints for your Xen

[Linux-HA] Xen RA and rebooting

2013-09-13 Thread Tom Parker
Hello All Does anyone know of a good way to prevent pacemaker from declaring a vm dead if it's rebooted from inside the vm. It seems to be detecting the vm as stopped for the brief moment between shutting down and starting up. Often this causes the cluster to have two copies of the same vm if

[Linux-HA] error: te_connect_stonith: Sign-in failed: triggered a retry

2013-08-29 Thread Tom Parker
: cluster-glue-1.0.11-0.15.28 libcorosync4-1.4.5-0.18.15 corosync-1.4.5-0.18.15 pacemaker-mgmt-2.1.2-0.7.40 pacemaker-mgmt-client-2.1.2-0.7.40 pacemaker-1.1.9-0.19.102 Does anyone know where this may be coming from? Thanks Tom Parker.

Re: [Linux-HA] error: te_connect_stonith: Sign-in failed: triggered a retry

2013-08-29 Thread Tom Parker
, at 5:51 AM, Tom Parker tpar...@cbnco.com wrote: Hello Since my upgrade last night I am also seeing this message in the logs on my servers. error: te_connect_stonith: Sign-in failed: triggered a retry Old mailing lists seem to imply that this is an issue with heartbeat which I don't think I

Re: [Linux-HA] Pacemaker 1.19 cannot manage more than 127 resources

2013-08-29 Thread Tom Parker
/2013 11:19 PM, Andrew Beekhof wrote: On 30/08/2013, at 5:49 AM, Tom Parker tpar...@cbnco.com wrote: Hello. Last night I updated my SLES 11 servers to HAE-SP3 which contains the following versions of software: cluster-glue-1.0.11-0.15.28 libcorosync4-1.4.5-0.18.15 corosync-1.4.5-0.18.15

Re: [Linux-HA] Pacemaker 1.19 cannot manage more than 127 resources

2013-08-29 Thread Tom Parker
Do you know if this has changed significantly from the older versions? This cluster was working fine before the upgrade. On Fri 30 Aug 2013 12:16:35 AM EDT, Andrew Beekhof wrote: On 30/08/2013, at 1:42 PM, Tom Parker tpar...@cbnco.com wrote: My pacemaker config contains the following

Re: [Linux-HA] Pacemaker 1.19 cannot manage more than 127 resources

2013-08-29 Thread Tom Parker
. On 30/08/2013, at 2:21 PM, Tom Parker tpar...@cbnco.com wrote: Do you know if this has changed significantly from the older versions? This cluster was working fine before the upgrade. On Fri 30 Aug 2013 12:16:35 AM EDT, Andrew Beekhof wrote: On 30/08/2013, at 1:42 PM, Tom Parker tpar