Re: [ClusterLabs] Coming in 1.1.15: Event-driven alerts

2016-04-25 Thread Klaus Wenninger
On 04/25/2016 08:03 AM, Kristoffer Grönlund wrote:
> Ken Gaillot  writes:
>
>> Hello everybody,
>>
>> The release cycle for 1.1.15 will be started soon (hopefully tomorrow)!
>>
>> The most prominent feature will be Klaus Wenninger's new implementation
>> of event-driven alerts -- the ability to call scripts whenever
>> interesting events occur (nodes joining/leaving, resources
>> starting/stopping, etc.).
> Hi, and happy to see this! Looks like a potentially very useful feature.
>
> I started experimenting with support for alerts in crm, and have some
> (very minor) nits/comments.
>
>> The meta-attributes are optional properties used by the cluster.
>> Currently, they include "timeout" (which defaults to 30s) and
>> "tstamp_format" (which defaults to "%H:%M:%S.%06N", and is a
>> microsecond-resolution timestamp provided to the alert script as the
>> CRM_alert_timestamp environment variable).
> Is "tstamp_format" correct? All the other meta attributes are
> in-this-format, so "tstamp-format" would be preferable to
> maintain consistency. Personally, I'd prefer "timestamp-format", but
> that's veering into bikeshed territory...
You have a point here. tstamp_format was there before the
insight that a couple of attributes belong to essentially the
same family as those grouped as meta-attributes when we
look at resources.
It is probably still early enough to change it.
I would also prefer timestamp-format, as the correlation
with the variable CRM_alert_timestamp seems more
natural then.
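
(For illustration, a minimal sketch of what an alert definition with
meta-attributes could look like in the CIB if the rename happens - the ids
are made up, and the exact element names are an assumption based on how
resources do it:)

  <alert id="sample-alert" path="/srv/pacemaker/pcmk_alert_sample.sh">
     <meta_attributes id="sample-alert-meta">
        <nvpair id="sample-alert-timeout" name="timeout" value="30s"/>
        <nvpair id="sample-alert-ts" name="timestamp-format"
          value="%H:%M:%S.%06N"/>
     </meta_attributes>
     <recipient id="sample-alert-recipient" value="/var/log/cluster-alerts.log"/>
  </alert>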

>> In the current implementation, meta-attributes and instance attributes
>> may also be specified within the <recipient> block, in which case they
>> override any values specified in the <alert> block when sent to that
>> recipient. Whether this stays in the final 1.1.15 release or not depends
>> on whether people find this to be useful, or confusing.
> Do you have any current use for this? My immediate thought is that
> allowing rule expressions in the <alert> level meta and instance
> attributes would be both more expressive and less confusing.
Do you refer to the global idea of repeated recipient-sections here or
just to the overwriting of instance/meta-attributes of the alert-section
by those in the recipient-section?

Someone on the list was complaining that the elements are called
recipient & value when reading the example that logs to a log file. So an
instance-attribute called logfile could be one example.
Certain recipients (whatever a recipient might be ...) might react
quicker and others might be slower, so a timeout per recipient
might make sense.
In cases where the recipients are email destination addresses, it might
be interesting to also be able to specify a sender address or
an SMTP server to use.
Could you give examples of how you would like to use rule-expressions -
especially if you want to replace the recipient-sections...
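
(For illustration, a sketch of the per-recipient overrides described above,
assuming instance/meta attributes are allowed inside <recipient> - the
smtp_server attribute name and all ids are hypothetical:)

  <alert id="smtp-alert" path="/path/to/smtp_alert.sh">
     <instance_attributes id="smtp-alert-params">
        <nvpair id="smtp-alert-server" name="smtp_server"
          value="smtp.example.com"/>
     </instance_attributes>
     <recipient id="smtp-alert-slow" value="oncall@example.com">
        <meta_attributes id="smtp-alert-slow-meta">
           <!-- a slower recipient gets a longer timeout -->
           <nvpair id="smtp-alert-slow-timeout" name="timeout" value="60s"/>
        </meta_attributes>
     </recipient>
  </alert>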

> Cheers,
> Kristoffer
>




Re: [ClusterLabs] Coming in 1.1.15: Event-driven alerts

2016-04-25 Thread Kristoffer Grönlund
Klaus Wenninger  writes:

> On 04/25/2016 08:03 AM, Kristoffer Grönlund wrote:
>>> In the current implementation, meta-attributes and instance attributes
>>> may also be specified within the <recipient> block, in which case they
>>> override any values specified in the <alert> block when sent to that
>>> recipient. Whether this stays in the final 1.1.15 release or not depends
>>> on whether people find this to be useful, or confusing.
>> Do you have any current use for this? My immediate thought is that
>> allowing rule expressions in the <alert> level meta and instance
>> attributes would be both more expressive and less confusing.
> Do you refer to the global idea of repeated recipient-sections here or
> just to the overwriting of instance/meta-attributes of the alert-section
> by those in the recipient-section?
>

The second, overwriting instance/meta-attributes by those in the
recipient-section.

> Someone on the list was complaining that the elements are called
> recipient & value when reading the example that logs to a log file. So an
> instance-attribute called logfile could be one example.
> Certain recipients (whatever a recipient might be ...) might react
> quicker and others might be slower, so a timeout per recipient
> might make sense.
> In cases where the recipients are email destination addresses, it might
> be interesting to also be able to specify a sender address or
> an SMTP server to use.
> Could you give examples of how you would like to use rule-expressions -
> especially if you want to replace the recipient-sections...

I haven't thought through the implications completely, but my thought is
that for primitives, for example, you would create multiple
instance-attribute entries with rule expressions that determine which
value is applied under which conditions (so, on this node set FOO to
this value, on that node set FOO to that value, etc.).
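
(That pattern already exists for primitives; roughly, as a sketch with
made-up ids and values:)

  <primitive id="myRsc" class="ocf" provider="heartbeat" type="Dummy">
     <instance_attributes id="myRsc-node1-attrs" score="2">
        <rule id="myRsc-node1-rule" score="INFINITY">
           <expression id="myRsc-node1-expr" attribute="#uname"
             operation="eq" value="node1"/>
        </rule>
        <nvpair id="myRsc-node1-foo" name="FOO" value="value-on-node1"/>
     </instance_attributes>
     <instance_attributes id="myRsc-default-attrs" score="1">
        <nvpair id="myRsc-default-foo" name="FOO" value="default-value"/>
     </instance_attributes>
  </primitive>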

First of all, I would ask whether rule expressions are already permitted
in instance-attribute tags within the alert tag. If so, then making it
possible to write rule expressions that check against the recipient would
make sense, and would also remove the need to allow overrides in each
recipient tag.

But I don't have any concrete use case either way; I am only looking at
this from a consistency point of view.

>
>> Cheers,
>> Kristoffer
>>
>
>

-- 
// Kristoffer Grönlund
// kgronl...@suse.com



[ClusterLabs] Fw: Moving Related Servers

2016-04-25 Thread ‪H Yavari‬ ‪
Hi,


Thanks for your offer. I checked this and it is an amazing solution. So I
defined two clusters:

  testcluster1: App1, App2 (resource: IP float)
  testcluster2: App3, App4 (resource: tomcat)

I know that we need to grant a ticket and manage it with Booth. But I
couldn't understand how I should define a ticket and the relation of the
nodes and clusters to the ticket. I read the mentioned doc, but I got
mixed up. Can you give me an example?

Thanks.




  From: Ken Gaillot
On 04/20/2016 12:44 AM, ‪H Yavari‬ ‪ wrote:
> You got my situation right. But I couldn't find any method to do this.
> 
> Should I create one cluster with 4 nodes or 2 clusters with 2 nodes each?
> How do I restrict the cluster nodes to each other?

Your last questions made me think of multi-site clustering using booth.
I think this might be the best solution for you.

You can configure two independent pacemaker clusters of 2 nodes each,
then use booth to ensure that only one cluster has the resources at any
given time. See:
See:

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#idm140617279413776

This is usually done with clusters at physically separate locations, but
there's no problem with using it with two clusters in one location.

Alternatively, going along more traditional lines such as what Klaus and
I have mentioned, you could use rules and node attributes to keep the
resources where desired. You could write a custom resource agent that
would set a custom node attribute for the matching node (the start
action should set the attribute to 1, and the stop action should set the
attribute to 0; if the resource was on App 1, you'd set the attribute
for App 3, and if it was on App 2, you'd set the attribute for
App 4). Colocate that resource with your floating IP, and use a rule to
locate service X where the custom node attribute is 1. See:

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#ap-ocf

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#idm140617279376656

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#idm140617356537136
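
(For illustration, a sketch of the attribute-plus-rule part - the attribute
name run-service-x and the resource id service-x are hypothetical; the
crm_attribute call is what the custom agent's start action might run:)

  # in the agent's start action on the matching node:
  crm_attribute -l reboot -t status --node App3 \
    --name run-service-x --update 1

  <rsc_location id="loc-service-x" rsc="service-x">
     <rule id="loc-service-x-rule" score="INFINITY">
        <expression id="loc-service-x-expr" attribute="run-service-x"
          operation="eq" value="1"/>
     </rule>
  </rsc_location>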

> 
> 
> *From:* Klaus Wenninger 
> *To:* users@clusterlabs.org
> *Sent:* Wednesday, 20 April 2016, 9:56:05
> *Subject:* Re: [ClusterLabs] Moving Related Servers
> 
> On 04/19/2016 04:32 PM, Ken Gaillot wrote:
>> On 04/18/2016 10:05 PM, ‪H Yavari‬ ‪ wrote:
>>> Hi,
>>>
>>> This is servers maps:
>>>
>>> App 3 -> App 1  (Active)
>>>
>>> App 4 -> App 2  (Standby)
>>>
>>>
>>> Now App1 and App2 are in a cluster with IP failover.
>>>
>>> I need that when the IP fails over and App2 becomes the active node,
>>> service "X" on server App3 will be stopped and App 4 will become the
>>> active node. In other words, App1 works only with App3 and App2 works
>>> with App4.
>>>
>>> I have a web application on App1 and some services on App 3 (this is
>>> the same for App2 and App 4)
>> This is a difficult situation to model. In particular, you could only
>> have a dependency one way -- so if we could get App 3 to fail over if
>> App 1 fails, we couldn't model the other direction (App 1 failing over
>> if App 3 fails). If each is dependent on the other, there's no way to
>> start one first.
>>
>> Is there a technical reason App 3 can work only with App 1?
>>
>> Is it possible for service "X" to stay running on both App 3 and App 4
>> all the time? If so, this becomes easier.
> Just another try to understand what you are aiming for:
> 
> You have a 2-node-cluster at the moment consisting of the nodes
> App1 & App2.
> You configured something like a master/slave-group to realize
> an active/standby scenario.
> 
> To get the servers App3 & App4 into the game we would make
> them additional pacemaker-nodes (App3 & App4).
> You now have a service X that could be running either on App3 or
> App4 (which is easy by e.g. making it dependent on a node attribute)
> and it should be running on App3 when the service-group is active
> (master in pacemaker terms) on App1 and on App4 when the
> service-group is active on App2.
> 
> The standard thing would be to colocate a service with the master-role
> (see all the DRBD examples, for instance).
> We would now need a "locate X where the master is located at Y" rule
> instead of colocation.
> I don't know any way to directly specify this.
> One - ugly though - way around I could imagine would be:
> 
> - locate service X1 on App3
> - locate service X2 on App4
> - dummy service Y1 is located on App1 and colocated with the master-role
> - dummy service Y2 is located on App2 and colocated with the master-role
> - service X1 depends on Y1
> - service X2 depends on Y2
> 
> If that somehow reflects your situation, the key question now would
> probably be whether pengine would make the group on App2 master
> if service X1 fails on App3. I would guess yes, but I'm not sure.
> 
> Regards,
> Klaus
>

Re: [ClusterLabs] Performance of a mirrored LV (cLVM) with OCFS: Attempt to monitor it

2016-04-25 Thread Lars Marowsky-Bree
On 2016-04-25T10:10:38, Ulrich Windl  wrote:

Hi Ulrich,

I can't really comment on why cLVM2 is slow (somewhat surprisingly,
because flock is metadata-only and thus shouldn't even be affected by
cLVM2, anyway ...).

But on the subject of performance, you're quite right - we know that
cLVM2 is not fast enough, so there has been an effort to make MD RAID
cluster-aware (especially RAID1). cluster-md is almost completely
merged upstream and coming to your favorite enterprise distribution very
soon too ;-)


Regards,
Lars

-- 
Architect SDS, Distinguished Engineer
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde




[ClusterLabs] Antw: Re: Performance of a mirrored LV (cLVM) with OCFS: Attempt to monitor it

2016-04-25 Thread Ulrich Windl
>>> Lars Marowsky-Bree wrote on 25.04.2016 at 12:12 in message
<20160425101236.gd10...@suse.de>:
> On 2016-04-25T10:10:38, Ulrich Windl wrote:
> 
> Hi Ulrich,
> 
> I can't really comment on why cLVM2 is slow (somewhat surprisingly,
> because flock is metadata-only and thus shouldn't even be affected by
> cLVM2, anyway ...).
> 
> But on the subject of performance, you're quite right - we know that
> cLVM2 is not fast enough, so there has been an effort to make MD RAID
> cluster-aware (especially RAID1). cluster-md is almost completely
> merged upstream and coming to your favorite enterprise distribution very
> soon too ;-)

Lars,

that's good news.

As we've had good experience with MD-RAID, I really thought about having an
MD-RAID on one node and exporting that RAID via iSCSI to all the nodes that
need access. Unfortunately I cannot compare performance ahead of time 8-(

Regards,
Ulrich

> 
> 
> Regards,
> Lars
> 
> -- 
> Architect SDS, Distinguished Engineer
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 
> 21284 (AG Nürnberg)
> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
> 
> 






Re: [ClusterLabs] Antw: Re: Performance of a mirrored LV (cLVM) with OCFS: Attempt to monitor it

2016-04-25 Thread Lars Marowsky-Bree
On 2016-04-25T12:40:31, Ulrich Windl  wrote:

> As we've had good experience with MD-RAID, I really thought about having an
> MD-RAID on one node and exporting that RAID via iSCSI to all the nodes that
> need access. Unfortunately I cannot compare performance ahead of time 8-(

The additional IO hop would hurt pretty badly. Not so much on bandwidth
(if your NICs are capable of handling the throughput), but latency adds
up.


-- 
Architect SDS, Distinguished Engineer
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde




Re: [ClusterLabs] Coming in 1.1.15: Event-driven alerts

2016-04-25 Thread Lars Marowsky-Bree
On 2016-04-21T12:50:43, Ken Gaillot  wrote:

Hi all,

awesome to see such a cool new feature land! I do have some
questions/feedback though.

> The alerts section can have any number of alerts, which look like:
> 
>   <alert id="..." path="/srv/pacemaker/pcmk_alert_sample.sh">
>      <recipient id="..." value="/var/log/cluster-alerts.log" />
>   </alert>
>

So, there's one bit of this I dislike - instance_attributes get passed
via the environment (as always), but the "value" ends up on the
command-line in ARGV[]? Why?

Wouldn't it make more sense to have an alert-wide instance_attributes
section within <alert>, that could be overridden on a per-recipient
basis if needed? And drop the value entirely?

Having things in ARGV[] is always risky due to them being exposed more
easily via ps. Environment variables or stdin appear better.
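
(For reference, a minimal alert-script sketch under the current scheme: the
recipient value arriving as $1 matches the ARGV behavior questioned above;
CRM_alert_timestamp is documented earlier in the thread, while
CRM_alert_node and CRM_alert_task are assumed variable names:)

  #!/bin/sh
  # Sketch: append each event to the log file given as the recipient value.
  logfile="$1"   # recipient value, currently delivered via ARGV
  echo "$CRM_alert_timestamp node=$CRM_alert_node task=$CRM_alert_task" \
    >> "$logfile"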


What I also miss is the ability to filter the events (at least
coarsely?) sent to a specific alert/recipient, and to constrain on
which nodes it will get executed.  Is that going to happen? On a busy
cluster, this could easily cause significant load otherwise.


It's also worth pointing out that this could likely "lose" events during
fail-overs, DC crashes, etc. Users probably should not strictly rely on
seeing *every* alert in their scripts, so this should be carefully
documented so that it is not considered a transactional, reliable message bus.


Regards,
Lars

-- 
Architect SDS
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde




Re: [ClusterLabs] Coming in 1.1.15: Event-driven alerts

2016-04-25 Thread Lars Ellenberg
On Thu, Apr 21, 2016 at 12:50:43PM -0500, Ken Gaillot wrote:
> Hello everybody,
> 
> The release cycle for 1.1.15 will be started soon (hopefully tomorrow)!
> 
> The most prominent feature will be Klaus Wenninger's new implementation
> of event-driven alerts -- the ability to call scripts whenever
> interesting events occur (nodes joining/leaving, resources
> starting/stopping, etc.).

What exactly is "etc." here?
What is the comprehensive list
of which "events" will trigger "alerts"?

My guess would be
 DC election/change
   which does not necessarily imply membership change
 change in membership
   which includes change in quorum
 fencing events
   (even failed fencing?)
 resource start/stop/promote/demote
  (probably) monitor failure?
   maybe only if some fail-count changes to/from infinity?
   or above a certain threshold?

 change of maintenance-mode?
 node standby/online (maybe)?
 maybe "resource cannot be run anywhere"?

would it be useful to pass in the "transaction ID"
or other pointer to the recorded cib input at the time
the "alert" was triggered?

can an alert "observer" (alert script) "register"
for only a subset of the "alerts"?

if so, can this filter be per alert script,
or per "recipient", or both?

Thanks,

Lars




Re: [ClusterLabs] Monitoring action of Pacemaker resources fail because of high load on the nodes

2016-04-25 Thread Dmitri Maziuk

On 2016-04-24 16:20, Ken Gaillot wrote:

> Correct, you would need to customize the RA.

Well, you wouldn't, because your custom RA will be overwritten by the
next RPM update.


Dimitri





Re: [ClusterLabs] Monitoring action of Pacemaker resources fail because of high load on the nodes

2016-04-25 Thread Ken Gaillot
On 04/25/2016 10:23 AM, Dmitri Maziuk wrote:
> On 2016-04-24 16:20, Ken Gaillot wrote:
> 
>> Correct, you would need to customize the RA.
> 
> Well, you wouldn't because your custom RA will be overwritten by the
> next RPM update.

Correct again :)

I should have mentioned that the convention is to copy the script to a
different name before editing it. The recommended approach is to create
a new provider for your organization. For example, copy the RA to a new
directory /usr/lib/ocf/resource.d/local, so it would be used in
pacemaker as ocf:local:mysql. You can use anything in place of "local".
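
(Sketched out, with "mysql" standing in for whatever agent is being
customized:)

  # copy the stock agent to a site-local provider directory
  mkdir -p /usr/lib/ocf/resource.d/local
  cp /usr/lib/ocf/resource.d/heartbeat/mysql \
     /usr/lib/ocf/resource.d/local/mysql
  # edit the copy, then configure the resource as ocf:local:mysql
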

> Dimitri
> 
> 
> 




Re: [ClusterLabs] operation parallelism

2016-04-25 Thread Ken Gaillot
On 04/22/2016 09:05 AM, Ferenc Wágner wrote:
> Hi,
> 
> Are recurring monitor operations constrained by the batch-limit cluster
> option?  I ask because I'd like to limit the number of parallel start
> and stop operations (because they are resource hungry and potentially
> take long) without starving other operations, especially monitors.

No, they are not. The batch-limit only affects actions initiated by the
DC. The DC will initiate the first run of a monitor, so that will be
affected, but the local resource manager (lrmd) on the target node will
remember the monitor and run it on schedule without further prompting
by the DC.
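
(For completeness, a sketch of setting that limit - batch-limit is a
cluster property, so something like this should apply it:)

  # cap the number of actions the DC dispatches in parallel
  crm_attribute --type crm_config --name batch-limit --update 5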



[ClusterLabs] why and when a call of crm_attribute can be delayed ?

2016-04-25 Thread Jehan-Guillaume de Rorthais
Hi all,

I am facing a strange issue with attrd while doing some testing on a
three-node cluster with the pgsqlms RA [1].

pgsqld is my pgsqlms resource in the cluster. pgsql-ha is the master/slave
setup on top of pgsqld.

Before triggering a failure, here was the situation:

  * centos1: pgsql-ha slave
  * centos2: pgsql-ha slave
  * centos3: pgsql-ha master

Then we triggered a failure: the node centos3 has been killed using

  echo c > /proc/sysrq-trigger

In this situation, PEngine provides a transition where:

  * centos3 is fenced 
  * pgsql-ha on centos2 is promoted

During the pre-promote notify action in the pgsqlms RA, each remaining slave
sets a node attribute called lsn_location; see:

  https://github.com/dalibo/PAF/blob/master/script/pgsqlms#L1504

  crm_attribute -l reboot -t status --node "$nodename" \
    --name lsn_location --update "$node_lsn"

During the promotion action in the pgsqlms RA, the RA checks the
lsn_location of all the nodes to make sure the local one is higher than or
equal to all others. See:

  https://github.com/dalibo/PAF/blob/master/script/pgsqlms#L1292

This is where we face an attrd behavior we don't understand.

Although we can see in the log that the RA was able to set its local
"lsn_location", during the promotion action the RA was unable to read it
back:

  pgsqlms(pgsqld)[9003]: 2016/04/22_14:46:16 INFO: pgsql_notify: promoting
  instance on node "centos2"

  pgsqlms(pgsqld)[9003]: 2016/04/22_14:46:16 INFO: pgsql_notify: current
  node LSN: 0/1EE24000

  [...]

  pgsqlms(pgsqld)[9023]: 2016/04/22_14:46:16 CRIT: pgsql_promote: can not
  get current node LSN location

  Apr 22 14:46:16 [5864] centos2 lrmd: notice: operation_finished:
  pgsqld_promote_0:9023:stderr [ Error performing operation: No such
  device or address ]

  Apr 22 14:46:16 [5864] centos2 lrmd: info: log_finished: finished -
  rsc:pgsqld action:promote call_id:211 pid:9023 exit-code:1
  exec-time:107ms queue-time:0ms

The error comes from:

  https://github.com/dalibo/PAF/blob/master/script/pgsqlms#L1320

**After** this error, we can see in the log file that attrd set the
"lsn_location" of centos2:

  Apr 22 14:46:16 [5865] centos2 attrd: info: attrd_peer_update:
  Setting lsn_location[centos2]: (null) -> 0/1EE24000 from centos2

  Apr 22 14:46:16 [5865] centos2 attrd: info: write_attribute:
  Write out of 'lsn_location' delayed: update 189 in progress


As I understand it, the call of crm_attribute during the pre-promote
notification was taken into account AFTER the "promote" action, leading to
this error. Am I right?

Why and how could this happen? Could it come from the dampen parameter? We
did not set any dampen anywhere; is there a default value in the cluster
setup? Could we avoid this behavior?
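
(For illustration only, a hedged workaround sketch - not what the RA
currently does - polling until the freshly written attribute becomes
readable before the promote logic continues:)

  # Sketch: retry the query a few times before giving up.
  tries=10
  until crm_attribute -l reboot -t status --node "$nodename" \
      --name lsn_location --query; do
    tries=$((tries - 1))
    [ "$tries" -gt 0 ] || exit 1   # attribute never showed up
    sleep 1
  done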

Please find attached a tarball with:
  * all cluster logfiles from the three nodes
  * the content of /var/lib/pacemaker from the three nodes:
* CIBs
* PEngine transitions


Regards,

[1] https://github.com/dalibo/PAF
-- 
Jehan-Guillaume de Rorthais
Dalibo



Re: [ClusterLabs] Monitoring action of Pacemaker resources fail because of high load on the nodes

2016-04-25 Thread Klaus Wenninger
On 04/26/2016 06:04 AM, Ken Gaillot wrote:
> On 04/25/2016 10:23 AM, Dmitri Maziuk wrote:
>> On 2016-04-24 16:20, Ken Gaillot wrote:
>>
>>> Correct, you would need to customize the RA.
>> Well, you wouldn't because your custom RA will be overwritten by the
>> next RPM update.
> Correct again :)
>
> I should have mentioned that the convention is to copy the script to a
> different name before editing it. The recommended approach is to create
> a new provider for your organization. For example, copy the RA to a new
> directory /usr/lib/ocf/resource.d/local, so it would be used in
> pacemaker as ocf:local:mysql. You can use anything in place of "local".
>
But what you are attempting doesn't sound entirely proprietary.
So once you have something that looks like it might be useful
for others as well, let the community participate and free yourself
from having to always take care of your private copy ;-)
>> Dimitri
>>
>>
>>

