Re: [ClusterLabs] Coming in 1.1.15: Event-driven alerts
On 04/25/2016 08:03 AM, Kristoffer Grönlund wrote:
> Ken Gaillot writes:
>> Hello everybody,
>>
>> The release cycle for 1.1.15 will be started soon (hopefully tomorrow)!
>>
>> The most prominent feature will be Klaus Wenninger's new implementation
>> of event-driven alerts -- the ability to call scripts whenever
>> interesting events occur (nodes joining/leaving, resources
>> starting/stopping, etc.).
>
> Hi, and happy to see this! Looks like a potentially very useful feature.
>
> I started experimenting with support for alerts in crm, and have some
> (very minor) nits/comments.
>
>> The meta-attributes are optional properties used by the cluster.
>> Currently, they include "timeout" (which defaults to 30s) and
>> "tstamp_format" (which defaults to "%H:%M:%S.%06N", and is a
>> microsecond-resolution timestamp provided to the alert script as the
>> CRM_alert_timestamp environment variable).
>
> Is "tstamp_format" correct? All the other meta-attributes are named
> in-this-format, so "tstamp-format" would be preferable to maintain
> consistency. Personally, I'd prefer "timestamp-format", but that's
> veering into bikeshed territory...

You have a point here. tstamp_format was there before the insight that there were a couple of attributes that belonged to kind of the same family as those grouped as meta-attributes when we look at resources. It is probably still early enough to change it. I would prefer "timestamp-format" as well, since the correlation with the variable CRM_alert_timestamp seems more natural then.

>> In the current implementation, meta-attributes and instance attributes
>> may also be specified within the <recipient> block, in which case they
>> override any values specified in the <alert> block when sent to that
>> recipient. Whether this stays in the final 1.1.15 release or not depends
>> on whether people find this to be useful, or confusing.
>
> Do you have any current use for this?
> My immediate thought is that allowing rule expressions in the <alert>-level
> meta and instance attributes would be both more expressive and less
> confusing.

Do you refer to the global idea of repeated recipient-sections here, or just to the overwriting of instance/meta-attributes of the alert-section by those in the recipient-section?

A guy on the list was complaining that it was called recipient & value after reading the example of logging to a log-file. So an instance-attribute called "logfile" could be an example. Certain recipients (whatever a recipient might be ...) might react quicker and others might be slower, so a timeout per recipient might make sense. In cases where the recipients are email destination addresses, it might be interesting to be able to specify a sender address or an SMTP server to use as well.

Could you give examples of how you would like to use rule expressions -- especially if you want to replace the recipient-sections...

> Cheers,
> Kristoffer

___ Users mailing list: Users@clusterlabs.org http://clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
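To make the per-recipient override being discussed concrete, here is a hypothetical CIB fragment -- the ids, the 60s value, and the "slow recipient" scenario are illustrative assumptions, not from the thread. An alert-wide timeout is overridden for one slower recipient:

```xml
<alerts>
  <alert id="alert-log" path="/srv/pacemaker/pcmk_alert_sample.sh">
    <!-- alert-wide default -->
    <meta_attributes id="alert-log-meta">
      <nvpair id="alert-log-timeout" name="timeout" value="30s"/>
    </meta_attributes>
    <recipient id="alert-log-slow" value="/var/log/cluster-alerts.log">
      <!-- hypothetical: this recipient reacts slowly, so give it more time -->
      <meta_attributes id="alert-log-slow-meta">
        <nvpair id="alert-log-slow-timeout" name="timeout" value="60s"/>
      </meta_attributes>
    </recipient>
  </alert>
</alerts>
```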
Re: [ClusterLabs] Coming in 1.1.15: Event-driven alerts
Klaus Wenninger writes:
> On 04/25/2016 08:03 AM, Kristoffer Grönlund wrote:
>>> In the current implementation, meta-attributes and instance attributes
>>> may also be specified within the <recipient> block, in which case they
>>> override any values specified in the <alert> block when sent to that
>>> recipient. Whether this stays in the final 1.1.15 release or not depends
>>> on whether people find this to be useful, or confusing.
>> Do you have any current use for this? My immediate thought is that
>> allowing rule expressions in the <alert>-level meta and instance
>> attributes would be both more expressive and less confusing.
> Do you refer to the global idea of repeated recipient-sections here or
> just to the overwriting of instance/meta-attributes of the alert-section
> by those in the recipient-section?

The second, overwriting instance/meta-attributes by those in the recipient-section.

> A guy on the list was complaining that it was called recipient & value
> reading the example logging to a log-file. So an instance-attribute called
> logfile could be an example.
> Certain recipients (whatever a recipient might be ...) might react
> quicker and others might be more lame so a timeout per recipient
> might make sense.
> In cases of recipients being email-destination-addresses it might
> be interesting to be able to as well specify a sender-address or
> an smtp-server to use.
> Could you give examples for how you would like to use rule-expressions -
> especially if you want to replace the recipient-sections...

I haven't thought through the implications completely, but my thought is that, for primitives, for example, you would create multiple instance-attribute entries with rule expressions that determine which value is applied under which conditions (so, on this node set FOO to this value, on that node set FOO to that value, etc.). First of all, I would ask whether rule expressions are already permitted in instance-attribute tags within the alert tag?
If so, then making it possible to create rule expressions that check against the recipient would make sense, as well as remove the need to allow overrides in each recipient tag. But I don't have any concrete use case either way; I am only looking at this from a consistency point of view.

>> Cheers,
>> Kristoffer

-- // Kristoffer Grönlund // kgronl...@suse.com
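For comparison, the per-node rule-expression mechanism already exists for resource instance attributes; a sketch of its general shape (ids, attribute name, and values are illustrative):

```xml
<instance_attributes id="ia-foo-node1">
  <rule id="ia-foo-node1-rule" score="0">
    <expression id="ia-foo-node1-expr" attribute="#uname"
                operation="eq" value="node1"/>
  </rule>
  <!-- FOO takes this value only on node1 -->
  <nvpair id="ia-foo-node1-nv" name="FOO" value="value-for-node1"/>
</instance_attributes>
```

Kristoffer's suggestion would amount to allowing an analogous expression that matches the recipient instead of the node.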
[ClusterLabs] Fw: Moving Related Servers
Hi, Thanks for your offer. I checked this and it is an amazing solution. So I defined two clusters:

testcluster1: App1, App2 -- resource: IP float
testcluster2: App3, App4 -- resource: tomcat

I know that we need to grant a ticket and manage it with Booth. But I couldn't understand how I should define a ticket and the relation of the nodes and clusters to the ticket. I read the mentioned doc, but I got mixed up. Can you give me an example? Thanks so much.

From: Ken Gaillot

On 04/20/2016 12:44 AM, H Yavari wrote:
> You got my situation right. But I couldn't find any method to do this?
>
> I should create one cluster with 4 node or 2 cluster with 2 node ? How I
> restrict the cluster nodes to each other!!?

Your last questions made me think of multi-site clustering using booth. I think this might be the best solution for you. You can configure two independent pacemaker clusters of 2 nodes each, then use booth to ensure that one cluster has the resources at any time. See: http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#idm140617279413776

This is usually done with clusters at physically separate locations, but there's no problem with using it with two clusters in one location.

Alternatively, going along more traditional lines such as what Klaus and I have mentioned, you could use rules and node attributes to keep the resources where desired. You could write a custom resource agent that would set a custom node attribute for the matching node (the start action should set the attribute to 1, and the stop action should set the attribute to 0; if the resource was on App 1, you'd set the attribute for App 3, and if the resource was on App 2, you'd set the attribute for App 4). Colocate that resource with your floating IP, and use a rule to locate service X where the custom node attribute is 1.
See: http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#ap-ocf http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#idm140617279376656 http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#idm140617356537136 > > > *From:* Klaus Wenninger > *To:* users@clusterlabs.org > *Sent:* Wednesday, 20 April 2016, 9:56:05 > *Subject:* Re: [ClusterLabs] Moving Related Servers > > On 04/19/2016 04:32 PM, Ken Gaillot wrote: >> On 04/18/2016 10:05 PM, H Yavari wrote: >>> Hi, >>> >>> This is servers maps: >>> >>> App 3-> App 1 (Active) >>> >>> App 4 -> App 2 (Standby) >>> >>> >>> Now App1 and App2 are in a cluster with IP failover. >>> >>> I need when IP failover will run and App2 will be Active node, service >>> "X" on server App3 will be stop and App 4 will be Active node. >>> In the other words, App1 works only with App3 and App 2 works with App 4. >>> >>> I have a web application on App1 and some services on App 3 (this is >>> same for App2 and App 4) >> This is a difficult situation to model. In particular, you could only >> have a dependency one way -- so if we could get App 3 to fail over if >> App 1 fails, we couldn't model the other direction (App 1 failing over >> if App 3 fails). If each is dependent on the other, there's no way to >> start one first. >> >> Is there a technical reason App 3 can work only with App 1? >> >> Is it possible for service "X" to stay running on both App 3 and App 4 >> all the time? If so, this becomes easier. > Just another try to understand what you are aiming for: > > You have a 2-node-cluster at the moment consisting of the nodes > App1 & App2. > You configured something like a master/slave-group to realize > an active/standby scenario. > > To get the servers App3 & App4 into the game we would make > them additional pacemaker-nodes (App3 & App4). 
> You now have a service X that could be running either on App3 or > App4 (which is easy by e.g. making it dependent on a node attribute) > and it should be running on App3 when the service-group is active > (master in pacemaker terms) on App1 and on App4 when the > service-group is active on App2. > > The standard thing would be to collocate a service with the master-role > (see all the DRBD examples for instance). > We would now need a locate-x when master is located-y rule instead > of collocation. > I don't know any way to directly specify this. > One - ugly though - way around I could imagine would be: > > - locate service X1 on App3 > - locate service X2 on App4 > - dummy service Y1 is located App1 and collocated with master-role > - dummy service Y2 is located App2 and collocated with master-role > - service X1 depends on Y1 > - service X2 depends on Y2 > > If that somehow reflects your situation the key question now would > probably be if pengine would make the group on App2 master > if service X1 fails on App3. I would guess yes but I'm not sure. > > Regards, > Klaus >
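Klaus's dummy-service workaround might be sketched in pcs terms roughly as follows. All of this is an illustrative assumption: the master/slave resource name "ms-group" is hypothetical, and expressing "X1 depends on Y1" via a mandatory ordering constraint (rather than colocation, which would force them onto the same node) is one possible reading of his sketch:

```shell
# dummy services pinned to the active-side nodes, tied to the master role
pcs resource create Y1 ocf:pacemaker:Dummy
pcs resource create Y2 ocf:pacemaker:Dummy
pcs constraint location Y1 prefers App1
pcs constraint location Y2 prefers App2
pcs constraint colocation add Y1 with master ms-group INFINITY
pcs constraint colocation add Y2 with master ms-group INFINITY

# service X instances pinned to the follower nodes
pcs constraint location X1 prefers App3
pcs constraint location X2 prefers App4

# "depends on": X1/X2 cannot be colocated with Y1/Y2 (different nodes),
# so a mandatory ordering constraint would have to carry the dependency
pcs constraint order start Y1 then start X1
pcs constraint order start Y2 then start X2
```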
Re: [ClusterLabs] Performance of a mirrored LV (cLVM) with OCFS: Attempt to monitor it
On 2016-04-25T10:10:38, Ulrich Windl wrote: Hi Ulrich, I can't really comment on why the cLVM2 is slow (somewhat surprisingly, because flock is meta-data only and thus shouldn't even be affected by cLVM2, anyway ...). But on the subject of performance, you're quite right - we know that cLVM2 is not fast enough, thus there has been an effort to make md raid cluster aware (especially RAID1). cluster-md is almost completely merged upstream and coming to your favorite enterprise distribution very soon too ;-) Regards, Lars -- Architect SDS, Distinguished Engineer SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg) "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
[ClusterLabs] Antw: Re: Performance of a mirrored LV (cLVM) with OCFS: Attempt to monitor it
>>> Lars Marowsky-Bree wrote on 25.04.2016 at 12:12 in message <20160425101236.gd10...@suse.de>:
> On 2016-04-25T10:10:38, Ulrich Windl wrote:
>
> Hi Ulrich,
>
> I can't really comment on why the cLVM2 is slow (somewhat surprisingly,
> because flock is meta-data only and thus shouldn't even be affected by
> cLVM2, anyway ...).
>
> But on the subject of performance, you're quite right - we know that
> cLVM2 is not fast enough, thus there has been an effort to make md raid
> cluster aware (especially RAID1). cluster-md is almost completely
> merged upstream and coming to your favorite enterprise distribution very
> soon too ;-)

Lars, that's good news. As we've had good experiences with MD-RAID, I really thought about having an MD-RAID on one node and exporting that RAID via iSCSI to all the nodes that need access. Unfortunately I cannot compare performance ahead of time 8-(

Regards, Ulrich
Re: [ClusterLabs] Antw: Re: Performance of a mirrored LV (cLVM) with OCFS: Attempt to monitor it
On 2016-04-25T12:40:31, Ulrich Windl wrote:

> As we've made good experience with MD-RAID, I really thought about having an
> MD-RAID on one node and export that RAID via iSCSI to all the nodes that need
> access. Unfortunately I cannot compare performance ahead of time 8-(

The additional IO hop would hurt pretty badly. Not so much on bandwidth (if your NICs are capable of handling the throughput), but latency adds up.

-- Architect SDS, Distinguished Engineer SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg) "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
Re: [ClusterLabs] Coming in 1.1.15: Event-driven alerts
On 2016-04-21T12:50:43, Ken Gaillot wrote:

Hi all,

awesome to see such a cool new feature land! I do have some questions/feedback though.

> The alerts section can have any number of alerts, which look like:
>
>   <alert id="sample-alert" path="/srv/pacemaker/pcmk_alert_sample.sh">
>     <recipient id="sample-recipient" value="/var/log/cluster-alerts.log" />
>   </alert>

So, there's one bit of this I dislike - instance_attributes get passed via the environment (as always), but the "value" ends up on the command-line in ARGV[]? Why? Wouldn't it make more sense to have an alert-wide instance_attributes section within <alert>, that could be overridden on a per-recipient basis if needed? And drop the value entirely?

Having things in ARGV[] is always risky due to them being exposed more easily via ps. Environment variables or stdin appear better.

What I also miss is the ability to filter the events (at least coarsely?) sent to a specific alert/recipient, and to constrain on which nodes it will get executed. Is that going to happen? On a busy cluster, this could easily cause significant load otherwise.

It's also worth pointing out that this could likely "lose" events during fail-overs, DC crashes, etc. Users probably should not strictly rely on seeing *every* alert in their scripts, so this should be carefully documented to not be considered a transactional, reliable message bus.

Regards, Lars

-- Architect SDS SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg) "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
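The environment-variable mechanism Lars prefers is how the rest of the event data already arrives: Pacemaker exports details as CRM_alert_* variables, with the recipient's "value" also available as CRM_alert_recipient. A minimal, hypothetical handler along those lines -- treat the exact set of variables as an assumption to check against your release:

```shell
#!/bin/sh
# Hypothetical alert-handler sketch: append one line per event to the
# file named by the recipient value. CRM_alert_timestamp is the
# formatted timestamp discussed earlier in the thread; the fallbacks
# are only there so the function degrades gracefully.
handle_alert() {
    logfile="${CRM_alert_recipient:-/var/log/cluster-alerts.log}"
    printf '%s %s %s\n' \
        "${CRM_alert_timestamp:-no-timestamp}" \
        "${CRM_alert_kind:-unknown}" \
        "${CRM_alert_desc:-}" >> "$logfile"
}
```

Note that a script like this sees events asynchronously and, per Lars's caveat, may miss some entirely during fail-overs.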
Re: [ClusterLabs] Coming in 1.1.15: Event-driven alerts
On Thu, Apr 21, 2016 at 12:50:43PM -0500, Ken Gaillot wrote:
> Hello everybody,
>
> The release cycle for 1.1.15 will be started soon (hopefully tomorrow)!
>
> The most prominent feature will be Klaus Wenninger's new implementation
> of event-driven alerts -- the ability to call scripts whenever
> interesting events occur (nodes joining/leaving, resources
> starting/stopping, etc.).

What exactly is "etc." here? What is the comprehensive list of which "events" will trigger "alerts"? My guess would be:

 - DC election/change (which does not necessarily imply membership change)
 - change in membership (which includes change in quorum)
 - fencing events (even failed fencing?)
 - resource start/stop/promote/demote
 - (probably) monitor failure? maybe only if some fail-count changes
   to/from infinity? or above a certain threshold?
 - change of maintenance-mode?
 - node standby/online (maybe)?
 - maybe "resource cannot be run anywhere"?

Would it be useful to pass in the "transaction ID" or other pointer to the recorded cib input at the time the "alert" was triggered?

Can an alert "observer" (alert script) "register" for only a subset of the "alerts"? If so, can this filter be per alert script, or per "recipient", or both?

Thanks, Lars
Re: [ClusterLabs] Monitoring action of Pacemaker resources fail because of high load on the nodes
On 2016-04-24 16:20, Ken Gaillot wrote:
> Correct, you would need to customize the RA.

Well, you wouldn't, because your custom RA will be overwritten by the next RPM update.

Dimitri
Re: [ClusterLabs] Monitoring action of Pacemaker resources fail because of high load on the nodes
On 04/25/2016 10:23 AM, Dmitri Maziuk wrote:
> On 2016-04-24 16:20, Ken Gaillot wrote:
>
>> Correct, you would need to customize the RA.
>
> Well, you wouldn't because your custom RA will be overwritten by the
> next RPM update.

Correct again :)

I should have mentioned that the convention is to copy the script to a different name before editing it. The recommended approach is to create a new provider for your organization. For example, copy the RA to a new directory /usr/lib/ocf/resource.d/local, so it would be used in pacemaker as ocf:local:mysql. You can use anything in place of "local".
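The convention Ken describes boils down to one copy into a new provider directory. The sketch below demonstrates it in a scratch directory standing in for /usr/lib/ocf (so it can run unprivileged); on a real node you would set OCF_ROOT=/usr/lib/ocf, run as root, and repeat on every cluster node. The trailing pcs command is an illustrative assumption:

```shell
# Demonstration in a scratch directory; substitute /usr/lib/ocf on a real node.
set -e
OCF_ROOT="$(mktemp -d)"   # stand-in for /usr/lib/ocf
mkdir -p "$OCF_ROOT/resource.d/heartbeat"
printf '#!/bin/sh\n# stand-in for the stock mysql RA\n' \
    > "$OCF_ROOT/resource.d/heartbeat/mysql"

# The convention: copy the stock RA under a new provider ("local" is
# arbitrary) before editing, so RPM updates cannot overwrite the copy.
mkdir -p "$OCF_ROOT/resource.d/local"
cp "$OCF_ROOT/resource.d/heartbeat/mysql" "$OCF_ROOT/resource.d/local/mysql"
chmod 755 "$OCF_ROOT/resource.d/local/mysql"
echo "installed: ocf:local:mysql"
# Afterwards, reference it e.g. as: pcs resource create db ocf:local:mysql ...
```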
Re: [ClusterLabs] operation parallelism
On 04/22/2016 09:05 AM, Ferenc Wágner wrote:
> Hi,
>
> Are recurring monitor operations constrained by the batch-limit cluster
> option? I ask because I'd like to limit the number of parallel start
> and stop operations (because they are resource hungry and potentially
> take long) without starving other operations, especially monitors.

No, they are not. The batch-limit only affects actions initiated by the DC. The DC will initiate the first run of a monitor, so that will be affected, but the local resource manager (lrmd) on the target node will remember the monitor and run it on schedule without further prompting by the DC.
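For reference, batch-limit is an ordinary cluster property, so throttling the DC's parallel actions could look like either of the following (the value 2 is illustrative):

```shell
# via pcs:
pcs property set batch-limit=2

# or directly:
crm_attribute --type crm_config --name batch-limit --update 2
```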
[ClusterLabs] why and when a call of crm_attribute can be delayed ?
Hi all,

I am facing a strange issue with attrd while doing some testing on a three-node cluster with the pgsqlms RA [1]. pgsqld is my pgsqlms resource in the cluster; pgsql-ha is the master/slave setup on top of pgsqld.

Before triggering a failure, here was the situation:

 * centos1: pgsql-ha slave
 * centos2: pgsql-ha slave
 * centos3: pgsql-ha master

Then we triggered a failure: the node centos3 was killed using

  echo c > /proc/sysrq-trigger

In this situation, PEngine provides a transition where:

 * centos3 is fenced
 * pgsql-ha on centos2 is promoted

During the pre-promote notify action in the pgsqlms RA, each remaining slave sets a node attribute called lsn_location, see: https://github.com/dalibo/PAF/blob/master/script/pgsqlms#L1504

  crm_attribute -l reboot -t status --node "$nodename" \
      --name lsn_location --update "$node_lsn"

During the promotion action in the pgsqlms RA, the RA checks the lsn_location of all the nodes to make sure the local one is higher than or equal to all the others. See: https://github.com/dalibo/PAF/blob/master/script/pgsqlms#L1292

This is where we face an attrd behavior we don't understand. Although we can see in the log that the RA was able to set its local "lsn_location", during the promotion action the RA was unable to read its local "lsn_location":

pgsqlms(pgsqld)[9003]: 2016/04/22_14:46:16 INFO: pgsql_notify: promoting instance on node "centos2"
pgsqlms(pgsqld)[9003]: 2016/04/22_14:46:16 INFO: pgsql_notify: current node LSN: 0/1EE24000
[...]
pgsqlms(pgsqld)[9023]: 2016/04/22_14:46:16 CRIT: pgsql_promote: can not get current node LSN location
Apr 22 14:46:16 [5864] centos2 lrmd: notice: operation_finished: pgsqld_promote_0:9023:stderr [ Error performing operation: No such device or address ]
Apr 22 14:46:16 [5864] centos2 lrmd: info: log_finished: finished - rsc:pgsqld action:promote call_id:211 pid:9023 exit-code:1 exec-time:107ms queue-time:0ms

The error comes from: https://github.com/dalibo/PAF/blob/master/script/pgsqlms#L1320

**After** this error, we can see in the log file that attrd set the "lsn_location" of centos2:

Apr 22 14:46:16 [5865] centos2 attrd: info: attrd_peer_update: Setting lsn_location[centos2]: (null) -> 0/1EE24000 from centos2
Apr 22 14:46:16 [5865] centos2 attrd: info: write_attribute: Write out of 'lsn_location' delayed: update 189 in progress

As I understand it, the call of crm_attribute during the pre-promote notification has been taken into account AFTER the "promote" action, leading to this error. Am I right? Why and how could this happen? Could it come from the dampen parameter? We did not set any dampen anywhere; is there a default value in the cluster setup? Could we avoid this behavior?

Please find attached a tarball with:

 * all cluster logfiles from the three nodes
 * the content of /var/lib/pacemaker from the three nodes:
   * CIBs
   * PEngine transitions

Regards,

[1] https://github.com/dalibo/PAF

-- Jehan-Guillaume de Rorthais Dalibo
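For readers trying to reproduce this, the write/read pair at issue can be sketched as follows -- the LSN value is taken from the log above, and whether the --query sees the value immediately is exactly the open question, since attrd applies updates asynchronously:

```shell
# pre-promote notify: ask attrd to set the attribute (may be delayed)
crm_attribute -l reboot -t status --node "$(crm_node -n)" \
    --name lsn_location --update "0/1EE24000"

# promote: read it back; this is what failed here with
# "Error performing operation: No such device or address"
crm_attribute -l reboot -t status --node "$(crm_node -n)" \
    --name lsn_location --query
```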
Re: [ClusterLabs] Monitoring action of Pacemaker resources fail because of high load on the nodes
On 04/26/2016 06:04 AM, Ken Gaillot wrote:
> On 04/25/2016 10:23 AM, Dmitri Maziuk wrote:
>> On 2016-04-24 16:20, Ken Gaillot wrote:
>>
>>> Correct, you would need to customize the RA.
>> Well, you wouldn't because your custom RA will be overwritten by the
>> next RPM update.
> Correct again :)
>
> I should have mentioned that the convention is to copy the script to a
> different name before editing it. The recommended approach is to create
> a new provider for your organization. For example, copy the RA to a new
> directory /usr/lib/ocf/resource.d/local, so it would be used in
> pacemaker as ocf:local:mysql. You can use anything in place of "local".

But what you are attempting doesn't sound entirely proprietary. So once you have something that looks like it might be useful for others as well, let the community participate and free yourself from having to always take care of your private copy ;-)