Re: [ClusterLabs] Cluster monitoring

2015-10-21 Thread Ken Gaillot
On 10/21/2015 08:24 AM, Michael Schwartzkopff wrote:
> On Wednesday, 21 October 2015, 18:50:15, Arjun Pandey wrote:
>> Hi folks
>> 
>> I had a question on monitoring of cluster events. Based on the 
>> documentation it seems that cluster monitor is the only method
>> of monitoring the cluster events. Also, since it seems to poll
>> based on the configured interval, it might miss some events. Is
>> that the case?
> 
> No. The cluster is event-based, so it won't miss any events. If you
> use the cluster's tools, they see the events. If you monitor the
> events, you won't miss any either.

FYI, Pacemaker 1.1.14 will have built-in handling of notification
scripts, without needing a ClusterMon resource. These will be
event-driven. Andrew Beekhof did a recent blog post about it:
http://blog.clusterlabs.org/blog/2015/reliable-notifications/

Pacemaker's monitors are polling, at the interval specified when
configuring the monitor operation. Pacemaker relies on the resource
agent to return status for monitors, so technically it's up to the
resource agent whether it can "miss" brief outages that occur between
polls. All the ones I've looked at would miss them, but generally
that's considered acceptable if the service is once again fully
working when the monitor runs (because it implies it recovered itself).
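
A minimal sketch of the effect described above, assuming a made-up service timeline rather than a real resource agent. The timeline string, the function name poll_failures, and the intervals are all hypothetical; the point is only that a poll sees the state at the instant it runs.

```shell
#!/bin/sh
# Hypothetical service timeline, one character per second: U = up, D = down.
# The service is down only during seconds 7 and 8.
timeline="UUUUUUUDDUUUUUUUUUUU"

# Poll the timeline every $1 seconds (starting at second 0) and report the
# seconds at which a poll observed the service down.
poll_failures() {
    interval=$1
    t=0
    seen=""
    while [ "$t" -lt "${#timeline}" ]; do
        # cut counts characters from 1, so second t is character t+1.
        state=$(printf '%s' "$timeline" | cut -c "$((t + 1))")
        if [ "$state" = "D" ]; then
            seen="$seen $t"
        fi
        t=$((t + interval))
    done
    echo "failures seen at:${seen:- none}"
}

poll_failures 10   # polls at seconds 0 and 10
poll_failures 1    # polls every second
```

With a 10-second interval neither poll lands on seconds 7 or 8, so the outage goes unnoticed; with a 1-second interval both seconds are reported.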

Some people use an external monitoring system (nagios, icinga, zabbix,
etc.) in addition to Pacemaker's monitors. They can complement each
other, as the external system can check system parameters outside
Pacemaker's view and can alert administrators for some early warning
signs before a resource gets to the point of needing recovery. Of
course such monitoring systems are also polling at configured intervals.

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] difference between OCF return codes for monitor action

2015-10-21 Thread Vallevand, Mark K
In my resource agent, I use OCF_NOT_RUNNING to indicate that the managed 
resource is literally ‘not running’.  I use OCF_ERR_GENERIC to indicate that 
the managed resource is running, but has entered an error state.  It is 
potentially recoverable if it is restarted.  I use OCF_ERR_PERM to indicate the 
managed resource has entered an error state which would not be recoverable if 
it is restarted.  The resource usually restarts on another node.
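
As a rough sketch of the mapping described above: the numeric values are the standard OCF return codes, but the helper name monitor_rc and the state names (healthy, stopped, degraded, broken) are hypothetical and only stand in for whatever checks a real agent performs.

```shell
#!/bin/sh
# Numeric values of the OCF return codes used above.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_ERR_PERM=4
OCF_NOT_RUNNING=7

# Hypothetical helper: print the monitor return code for an observed state.
monitor_rc() {
    case "$1" in
        healthy)  echo "$OCF_SUCCESS" ;;      # running and working
        stopped)  echo "$OCF_NOT_RUNNING" ;;  # literally not running
        degraded) echo "$OCF_ERR_GENERIC" ;;  # running, restart may help
        broken)   echo "$OCF_ERR_PERM" ;;     # restart here would not help
        *)        echo "$OCF_ERR_GENERIC" ;;  # unknown state: generic error
    esac
}

# In a real agent the monitor action would end with something like:
#   exit "$(monitor_rc "$state")"
monitor_rc stopped
```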

Regards.
Mark K Vallevand   mark.vallev...@unisys.com
Never try and teach a pig to sing: it's a waste of time, and it annoys the pig.
From: Kostiantyn Ponomarenko [mailto:konstantin.ponomare...@gmail.com]
Sent: Wednesday, October 21, 2015 07:44 AM
To: Cluster Labs - All topics related to open-source clustering welcomed
Subject: [ClusterLabs] difference between OCF return codes for monitor action

Hi,

What is the difference between "OCF_ERR_GENERIC" and "OCF_NOT_RUNNING" return 
codes in "monitor" action from the Pacemaker's point of view?

I was looking here 
http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-ocf-return-codes.html
 , but I still don't see the difference clearly.

Thank you,
Kostya


Re: [ClusterLabs] difference between OCF return codes for monitor action

2015-10-21 Thread Ken Gaillot
On 10/21/2015 07:44 AM, Kostiantyn Ponomarenko wrote:
> Hi,
> 
> What is the difference between "OCF_ERR_GENERIC" and "OCF_NOT_RUNNING"
> return codes in "monitor" action from the Pacemaker's point of view?
> 
> I was looking here
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-ocf-return-codes.html
> , but I still don't see the difference clearly.
> 
> Thank you,
> Kostya

OCF_ERR_GENERIC is a "soft" error, so if any operation returns that,
Pacemaker will try to recover the resource by restarting it or moving it
to a new location.

OCF_NOT_RUNNING is a state (not necessarily an error). When first
placing a resource, Pacemaker will (by default) run monitors for it on
all hosts, to make sure it's not already running somewhere. So in that
case (which is usually where you see this), it's not an error, but a
confirmation of the expected state. On the other hand, if Pacemaker gets
this when a resource is expected to be up, it will consider it an error
and try to recover. The only difference in that case is that Pacemaker
will not try to stop the resource because it's already stopped.
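
The two cases above can be summarised in a small sketch. This is not Pacemaker code: the function name pacemaker_reaction and the output strings are made up for illustration, and only the combinations discussed in this message are covered.

```shell
#!/bin/sh
# Illustrative only: summarise the reaction described above for a monitor
# result. $1 is the return code (0 = OCF_SUCCESS, 1 = OCF_ERR_GENERIC,
# 7 = OCF_NOT_RUNNING); $2 is the state Pacemaker expects ("started" or
# "stopped").
pacemaker_reaction() {
    rc=$1
    expected=$2
    case "$rc:$expected" in
        0:started|7:stopped)
            # Result confirms the expected state, e.g. an initial probe
            # finding the resource not running.
            echo "ok: matches expected state" ;;
        7:started)
            # Treated as a failure, but the resource is already stopped.
            echo "recover: start again or move, no stop needed" ;;
        1:*)
            # Soft error: recover by restarting or moving the resource.
            echo "recover: stop, then restart or move" ;;
        *)
            echo "not covered by this sketch" ;;
    esac
}

pacemaker_reaction 7 stopped
pacemaker_reaction 7 started
pacemaker_reaction 1 started
```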



[ClusterLabs] Cluster monitoring

2015-10-21 Thread Arjun Pandey
Hi folks

I had a question on monitoring of cluster events. Based on the
documentation, it seems that cluster monitor is the only method of
monitoring the cluster events. Also, since it seems to poll based on the
configured interval, it might miss some events. Is that the case?

Is there any other alternative available?
As of now I'm only looking at Cluster Monitor, which will be configured with
an external program and the interval as part of the resource configuration.


Regards
Arjun


Re: [ClusterLabs] about the configuration for a iSCSITarget resource using the crm(8) shell.

2015-10-21 Thread Dejan Muhamedagic
Hi,

On Wed, Oct 21, 2015 at 12:17:15PM +, Shilu wrote:
> Hi,everyone!
> The following is an example configuration for an iSCSITarget resource using 
> the crm(8) shell:
> primitive tgt ocf:heartbeat:iSCSITarget \
>   params implementation="tgt" iqn="foo" tid="1" \
>   op monitor interval="1s"

This interval is very small. Are you sure that you want to
monitor it so often?

> Now I want to use the additional_parameters param. Can anyone tell me how to
> use it?
> The following is how I use it, but it is not correct.
> 
> primitive tgt ocf:heartbeat:iSCSITarget \
>   params implementation="tgt" iqn="foo" tid="1" additional_parameters="lun=1 bs-type=rbd backing-store=rbd/foo" \
>   op monitor interval="1s"

What exactly is not correct? Not an iscsi target expert here, but
syntactically it seems to be OK. You can also take a look at the
XML definition of the resource:

$ crm configure show xml tgt

Thanks,

Dejan



Re: [ClusterLabs] Cluster monitoring

2015-10-21 Thread Michael Schwartzkopff
On Wednesday, 21 October 2015, 18:50:15, Arjun Pandey wrote:
> Hi folks
> 
> I had a question on monitoring of cluster events. Based on the
> documentation it seems that cluster monitor is the only method of
> monitoring the cluster events. Also, since it seems to poll based on the
> configured interval, it might miss some events. Is that the case?

No. The cluster is event-based, so it won't miss any events. If you use the
cluster's tools, they see the events. If you monitor the events, you won't miss
any either.
 

Kind regards,

Michael Schwartzkopff

-- 
[*] sys4 AG

http://sys4.de, +49 (89) 30 90 46 64, +49 (162) 165 0044
Franziskanerstraße 15, 81669 München

Registered office: München, District Court München: HRB 199263
Executive board: Patrick Ben Koetter, Marc Schiffbauer
Chairman of the supervisory board: Florian Kirstein



[ClusterLabs] [ANNOUNCE] [HA] [Pacemaker] new, maintained openstack-resource-agents repository

2015-10-21 Thread Adam Spiers
[cross-posting to openstack-dev and pacemaker user lists; please
consider trimming the recipients list if your reply is not relevant to
both communities]

Hi all,

Back in June I proposed moving the well-used but no longer maintained
https://github.com/madkiss/openstack-resource-agents/ repository to
Stackforge:

  http://lists.openstack.org/pipermail/openstack-dev/2015-June/067763.html
  https://github.com/madkiss/openstack-resource-agents/issues/22

The responses I got were more or less unanimously in favour, so I'm
simultaneously pleased and slightly embarrassed to announce that 4
months later, I've finally followed up on my proposal:

  https://launchpad.net/openstack-resource-agents
  https://git.openstack.org/cgit/openstack/openstack-resource-agents/
  https://review.openstack.org/#/admin/projects/openstack/openstack-resource-agents
  https://review.openstack.org/#/q/status:open+project:openstack/openstack-resource-agents,n,z

Since June, Stackforge has been retired, so as you can see above, this
repository lives under the 'openstack' namespace.

I volunteered to be a maintainer and there were no objections.  I sent
out an initial call for co-maintainers but no one expressed an interest,
which is probably fine because the workload is likely to be quite
light.  However, if you'd like to be involved, please drop me a line.

I've also taken care of outstanding pull requests and bug reports
against the old repository, and provided a redirect from the old
repository's README to the new one.

Still TODO: adding this repository to the Big Tent.  I've had some
discussions with the openstack-infra team about that, since there is
not currently a suitable project team to create it under.  We might
create a new project team called "OpenStack Pacemaker" or similar, and
place it under that.  ("OpenStack HA" would be far too broad to be
able to find a single PTL.)  However there is no rush for this, and it
has been suggested that it would not be a bad thing to wait for the
"new" project to stabilise and prove its longevity before making it
official.

Cheers,
Adam

P.S. I'll be in Tokyo if anyone wants to meet there and discuss
further.



Re: [ClusterLabs] Cluster node loss detection.

2015-10-21 Thread Jan Pokorný
On 16/10/15 12:51 -0400, Digimer wrote:
> On 16/10/15 12:37 PM, Vallevand, Mark K wrote:
>> So, it looks like setting the corosync parameters in cluster.conf
>> has some effect.  Cman seems to pass them to corosync.
> 
> Yes, never configure corosync directly when using cman, only use
> cluster.conf, as you did.

Yep, this is the reason ccs2pcs* conversions in clufter never ask
for an input corosync.conf (nor does pcs2pcscmd-flatiron).  Its use is
inhibited when corosync is under cman's governance.

-- 
Jan (Poki)




[ClusterLabs] difference between OCF return codes for monitor action

2015-10-21 Thread Kostiantyn Ponomarenko
Hi,

What is the difference between "OCF_ERR_GENERIC" and "OCF_NOT_RUNNING"
return codes in "monitor" action from the Pacemaker's point of view?

I was looking here
http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-ocf-return-codes.html
, but I still don't see the difference clearly.

Thank you,
Kostya


Re: [ClusterLabs] difference between OCF return codes for monitor action

2015-10-21 Thread Michael Schwartzkopff
On Wednesday, 21 October 2015, 15:44:17, Kostiantyn Ponomarenko wrote:
> Hi,
> 
> What is the difference between "OCF_ERR_GENERIC" and "OCF_NOT_RUNNING"
> return codes in "monitor" action from the Pacemaker's point of view?
> 
> I was looking here
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-ocf-return-codes.html
> , but I still don't see the difference clearly.
> 
> Thank you,
> Kostya

No difference from Pacemaker's point of view. Both are errors, and Pacemaker
acts as configured. The return codes are there to make it easier for the admin
to find the cause.

Kind regards,

Michael Schwartzkopff
