Re: [Pacemaker] How to prevent locked I/O using Pacemaker with Primary/Primary DRBD/OCFS2 (Ubuntu 10.10)
On 04/04/11 22:55, Lars Ellenberg wrote:

* resource minor=0 cs=WFConnection ro1=Primary ro2=Unknown ds1=UpToDate ds2=Outdated /

Why keep people using this pseudo-XML output? Where does that come from? We should un-document this. This is to be consumed by other programs (like the LINBIT DRBD-MC). This is not to be consumed by humans.

When one is used to "crm status", "drbd status" will be typed automatically without thinking ;)

# drbdadm status
<drbd-status version="8.3.10" api="88">
  <resources config_file="/etc/drbd.conf">
    <resource minor="0" name="db" cs="Connected" ro1="Secondary" ro2="Primary" ds1="UpToDate" ds2="UpToDate" />
  </resources>
</drbd-status>

imho, that's why...

cheers,
raoul
--
DI (FH) Raoul Bhatia M.Sc.        email. r.bha...@ipax.at
Technischer Leiter
IPAX - Aloy Bhatia Hava OG        web. http://www.ipax.at
Barawitzkagasse 10/2/2/11         email. off...@ipax.at
1190 Wien                         tel. +43 1 3670030
FN 277995t HG Wien                fax. +43 1 3670030 15

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
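Since the pseudo-XML status output quoted in the message above is meant for machine consumption, here is a minimal shell sketch of pulling fields out of one such line. The sample string is the resource line shown above; the sed patterns are purely illustrative and not part of any DRBD tooling.

```shell
# Extract connection state and local disk state from one <resource .../>
# line of "drbdadm status" pseudo-XML output (sample hardcoded below).
status='<resource minor="0" name="db" cs="Connected" ro1="Secondary" ro2="Primary" ds1="UpToDate" ds2="UpToDate" />'

cs=$(printf '%s\n'  "$status" | sed -n 's/.* cs="\([^"]*\)".*/\1/p')
ds1=$(printf '%s\n' "$status" | sed -n 's/.* ds1="\([^"]*\)".*/\1/p')

echo "cs=$cs ds1=$ds1"   # cs=Connected ds1=UpToDate
```

A monitoring script could alert whenever cs is not "Connected" or either disk state is not "UpToDate".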
Re: [Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional role parameter
On Mon, 2011-04-04 at 21:31 +0200, Holger Teutsch wrote:
On Mon, 2011-04-04 at 15:24 +0200, Andrew Beekhof wrote:
On Mon, Apr 4, 2011 at 2:43 PM, Holger Teutsch holger.teut...@web.de wrote:
On Mon, 2011-04-04 at 11:05 +0200, Andrew Beekhof wrote:
On Sat, Mar 19, 2011 at 11:55 AM, Holger Teutsch holger.teut...@web.de wrote:

Hi Dejan,

On Fri, 2011-03-18 at 14:24 +0100, Dejan Muhamedagic wrote:

Hi,

On Fri, Mar 18, 2011 at 12:21:40PM +0100, Holger Teutsch wrote:

Hi, I would like to submit 2 patches of an initial implementation for discussion.
..

To recall: "crm_resource --move resource" creates a standby rule that moves the resource off the currently active node, while "crm_resource --move resource --node newnode" creates a prefer rule that moves the resource to the new node.

When dealing with clones and masters the behavior was random, as the code only considers the node where the first instance of the clone was started. The new code behaves consistently for the master role of an m/s resource. The options --master and rsc:master are somewhat redundant, as a slave move is not supported; currently it's more an acknowledgement of the user.

On the other hand, it is desirable (and was requested several times on the ML) to stop a single resource instance of a clone or master on a specific node. Should that be implemented by something like

crm_resource --move-off --resource myresource --node devel2

? Or should crm_resource refuse to work on clones, and/or should moving the master role be the default for m/s resources and the --master option discarded?

I think that we also need to consider the case when clone-max is less than the number of nodes. If I understood correctly what you were saying. So, all of move slave, move master and move clone should be possible.
I think the following use cases cover what can be done with such a kind of interface:

crm_resource --moveoff --resource myresource --node mynode
- all resource variants: check whether active on mynode, then create standby constraint

crm_resource --move --resource myresource
- primitive/group: convert to --moveoff --node `current_node`
- clone/master: refused

crm_resource --move --resource myresource --node mynode
- primitive/group: create prefer constraint
- clone/master: refused

Not sure this needs to be refused.

I see the problem that the node where the resource instance should be moved off had to be specified as well to get predictable behavior. Consider a 2-way clone on a 3-node cluster. If the clone is active on A and B, what should

crm_resource --move --resource myClone --node C

do?

I would expect it to create the +inf constraint for C but no constraint(s) for the current location(s).

You are right. These are different and valid use cases.

crm_resource --move --resource myClone --node C
- I want an instance on C, regardless of where it is moved off

crm_resource --move-off --resource myClone --node C
- I want the instance moved off C, regardless of where it is moved on

I tried them out with a reimplementation of the patch on a 3-node cluster with a resource with clone-max=2. The behavior appears logical (at least to me 8-) ).

This would require an additional --from-node or similar. Other than that the proposal looks sane.

My first thought was to make --move behave like --move-off if the resource is a clone or m/s, but since the semantics are the exact opposite, that might introduce more problems than it solves.

That was my perception as well.

Does the original crm_resource patch implement this?

No, I will submit an updated version later this week.

- holger

Hi, I submit revised patches for review.
Summarizing the preceding discussions, the following functionality is implemented:

crm_resource --move-off --resource myresource --node mynode
- all resource variants: check whether active on mynode, then create standby constraint

crm_resource --move --resource myresource
- primitive/group: convert to --move-off --node `current_node`
- clone/master: refused

crm_resource --move --resource myresource --node mynode
- all resource variants: create prefer constraint

crm_resource --move --resource myresource --master --node mynode
- master: check whether active as slave on mynode, then create prefer constraint for master role
- others: refused

The patch shell_migrate.diff supports this in the shell. This stuff is agnostic of what crm_migrate really does.

Regards
Holger

diff -r b4f456380f60 doc/crm_cli.txt
--- a/doc/crm_cli.txt Thu Mar 17 09:41:25 2011 +0100
+++ b/doc/crm_cli.txt Mon Apr 04
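For reference, the two kinds of constraint discussed above differ only in the sign of the score. A sketch of what the resulting CIB location constraints could look like (the id prefixes follow crm_resource's usual cli-prefer-/cli-standby- convention; the resource and node names are illustrative, not taken from the patch):

```xml
<!-- "prefer" constraint, as created by --move --node mynode:
     pull the resource onto mynode -->
<rsc_location id="cli-prefer-myresource" rsc="myresource"
              node="mynode" score="INFINITY"/>

<!-- "standby" constraint, as created by --move-off --node mynode:
     push the resource off mynode -->
<rsc_location id="cli-standby-myresource" rsc="myresource"
              node="mynode" score="-INFINITY"/>
```

This makes the semantic opposition discussed above concrete: --move pins a target node, --move-off bans a source node, and "unmigrate" simply removes whichever of these constraints exists.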
Re: [Pacemaker] double stonith device
On 05/04/2011 01:21, Lars Marowsky-Bree wrote:
On 2011-04-04T15:34:16, Christian Zoffoli czoff...@xmerlin.org wrote:

I cannot connect both PSUs to a single PDU without losing power source redundancy (I have dual UPS, dual power lines and so on).

Yes. Because the alternative is to lose the ability to fence the node if you lose one of the fencing devices. (I assume you'd be unable to confirm the fence if one of them was without power?)

If the APC fence device is not available I can check the status using IPMI ...but considering only the APC switched PDU ...you are absolutely right.

You need to look at the larger picture: even though each node is only on one power supply, the whole cluster is on two.

...there is a side effect ...sometimes we need to bring down one power circuit ...for maintenance or whatever ...so we need two separate power circuits so as not to impact availability.

Alternatively, use a mechanism like SBD which doesn't fence via the power device, but uses the shared storage.

...interesting ...I'll have a look also at SBD.

Thank you
Christian
Re: [Pacemaker] double stonith device
On 04/04/2011 17:05, Dejan Muhamedagic wrote:
[cut]

How come? AFAIK, many setups rely on IPMI for fencing. Just update the firmware and do proper testing.

Bad experience with some DRAC5 IPMIs. Probably the best way would be to create a new stonith module to manage such a setup.

That's also possible. We accept patches too :)

If I choose that way ...you will see patches :)

Christian
Re: [Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional role parameter
Hi Holger,

On Tue, Apr 05, 2011 at 01:19:56PM +0200, Holger Teutsch wrote:

Hi Dejan,

On Tue, 2011-04-05 at 12:27 +0200, Dejan Muhamedagic wrote:
On Tue, Apr 05, 2011 at 12:10:48PM +0200, Holger Teutsch wrote:

Hi Dejan,

On Tue, 2011-04-05 at 11:48 +0200, Dejan Muhamedagic wrote:

Hi Holger,

On Mon, Apr 04, 2011 at 09:31:02PM +0200, Holger Teutsch wrote:
On Mon, 2011-04-04 at 15:24 +0200, Andrew Beekhof wrote:
[...]

crm_resource --move-off --resource myClone --node C
- I want the instance moved off C, regardless of where it is moved on

What is the difference between move-off and unmigrate (-U)?

--move-off: create a constraint that a resource should *not* run on the specific node (partly as before --move without --node)
-U: zap all migration constraints (as before)

Ah, right, sorry, I wanted to ask about the difference between move-off and move. The description looks the same as for move. Is it that in this case it is for clones, so crm_resource needs an extra node parameter? You wrote in the doc:

+Migrate a resource (-instance for clones/masters) off the specified node.

The '-instance' looks somewhat funny. Why not say "Move/migrate a clone or master/slave instance away from the specified node"?

Moving away works for all kinds of resources, so the text now looks like:

diff -r b4f456380f60 doc/crm_cli.txt
--- a/doc/crm_cli.txt Thu Mar 17 09:41:25 2011 +0100
+++ b/doc/crm_cli.txt Tue Apr 05 13:08:10 2011 +0200
@@ -818,10 +818,25 @@
 running on the current node. Additionally, you may specify a
 lifetime for the constraint---once it expires, the location
 constraint will no longer be active.
+For a master resource specify rsc:master to move the master role.
 
 Usage:
 ...
-migrate rsc [node] [lifetime] [force]
+migrate rsc[:master] [node] [lifetime] [force]
+...
+
+[[cmdhelp_resource_migrateoff,migrate a resource off the specified node]]
+`migrateoff` (`moveoff`)
+
+Migrate a resource away from the specified node.
+The resource is migrated by creating a constraint which prevents it from
+running on the specified node. Additionally, you may specify a
+lifetime for the constraint---once it expires, the location
+constraint will no longer be active.
+
+Usage:
+...
+migrateoff rsc node [lifetime] [force]
 ...
 
 [[cmdhelp_resource_unmigrate,unmigrate a resource to another node]]

I must say that I still find all this quite confusing, i.e. now we have move, unmove, and move-off, but it's probably just me :)

Think of move == move-to, then it is simpler 8-) ... keeping in mind that for backward compatibility

crm_resource --move --resource myResource

is equivalent to

crm_resource --move-off --resource myResource --node $(current node)

But as there is no current node for clones / masters, the old implementation did some random movements...

OK. Thanks for the clarification. I'd like to revise my previous comment about restricting use of certain constructs. For instance, in this case, if the command would result in a random movement then the shell should at least issue a warning about it. Or perhaps refuse to do that completely. I didn't take a look yet at the code, perhaps you've already done that.

Thanks,
Dejan

Regards
Holger

Cheers,
Dejan
[Pacemaker] large number of pe-input-* files
Hi guys,

The pengine process creates a large number of files under /var/lib/pengine. We are using HA on a very high-performance box which is processing a large amount of data fed from an external source. There is a large amount of file creation and I/O taking place. We ran out of inodes because there were something like 1500 files under the mentioned directory:

ls /var/lib/pengine/ | wc -l
1492

Is there a way to clean up and/or reduce the number of these files?

Sincerely
Shravan
Re: [Pacemaker] large number of pe-input-* files
On 04/05/2011 04:35 PM, Shravan Mishra wrote:

Hi guys, the pengine process creates a large number of files under /var/lib/pengine. [...] We ran out of inodes because there were something like 1500 files under the mentioned directory. Is there a way to clean up and/or reduce the number of these files?

pacemaker can do this by itself:

property $id="cib-bootstrap-options" \
    ...
    pe-error-series-max="100" \
    pe-warn-series-max="100" \
    pe-input-series-max="100" \
    ...

You can read about this in the pacemaker documentation.

cheers,
raoul
--
DI (FH) Raoul Bhatia M.Sc.        email. r.bha...@ipax.at
Technischer Leiter
IPAX - Aloy Bhatia Hava OG        web. http://www.ipax.at
Barawitzkagasse 10/2/2/11         email. off...@ipax.at
1190 Wien                         tel. +43 1 3670030
FN 277995t HG Wien                fax. +43 1 3670030 15
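Once the pe-*-series-max properties above are set, Pacemaker prunes these files itself. For a one-off manual cleanup of an already-full directory, a shell sketch that keeps only the newest KEEP pe-input files; the PEDIR and KEEP defaults are assumptions, and file names follow the usual pe-input-NNN naming.

```shell
# Keep only the $KEEP newest pe-input files in $PEDIR, deleting the rest.
# Defaults are illustrative; override via environment.
PEDIR="${PEDIR:-/var/lib/pengine}"
KEEP="${KEEP:-100}"

prune_pe_inputs() {
    # ls -1t lists newest first; tail drops the first $KEEP entries,
    # leaving only the older files, which xargs then removes.
    ls -1t "$PEDIR"/pe-input-* 2>/dev/null | tail -n +$((KEEP + 1)) | xargs -r rm -f
}
```

Running `PEDIR=/var/lib/pengine KEEP=100 prune_pe_inputs` (as root, with the cluster's blessing) would mirror what pe-input-series-max=100 does automatically.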
Re: [Pacemaker] Immediate fs errors on iscsi connection problem
GREAT THANKS!!! It works. But I have a little question: how can I get the same effect on Windows? The Windows iSCSI initiator (Win 2008 Server) breaks connections and never restarts them??

On 4 April 2011 at 8:24, Vladislav Bogdanov bub...@hoster-ok.com wrote:

Hi,

03.04.2011 22:42, ruslan usifov wrote:

You need some tuning from both sides. First, (at least some versions of) ietd needs to be blocked (-j DROP) with iptables on restarts. That means you should block all incoming and outgoing packets (the latter is more important) before ietd stops and unblock all after it starts. I use a home-brew stateful RA for this, which blocks (DROP) all traffic to/from the VIP in slave mode and passes it to a later decision (no -j) in master mode.

Thanks for the reply. But how to implement this from pacemaker? Or must I modify the init.d scripts?

You can try the attached. This was a first trial to write an RA... It may contain some wrong logic, especially with scores, but it works for me. It is intended to be colocated with an IP-address resource:

colocation col1 inf: FW:Master ip
order ord1 inf: ip:start FW:promote

Vladislav
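The RA itself was attached to the original mail and is not reproduced here, but the blocking idea it implements can be sketched in a few lines of shell. The VIP address and the IPT override are assumptions for illustration (set IPT=echo to dry-run without touching the firewall); a real RA would also need monitor/promote/demote actions around this.

```shell
# Block/unblock all traffic to and from the iSCSI VIP around an ietd
# restart. IPT and VIP are illustrative defaults, not from the RA.
IPT="${IPT:-iptables}"
VIP="${VIP:-192.0.2.10}"

block_vip() {
    # DROP everything addressed to or originating from the VIP
    $IPT -I INPUT  -d "$VIP" -j DROP
    $IPT -I OUTPUT -s "$VIP" -j DROP
}

unblock_vip() {
    # remove the rules inserted by block_vip
    $IPT -D INPUT  -d "$VIP" -j DROP
    $IPT -D OUTPUT -s "$VIP" -j DROP
}
```

The restart sequence the thread describes would then be: block_vip; /etc/init.d/iscsi-target stop && start; unblock_vip. Blocking OUTPUT as well matters because, as Vladislav notes, it is the outgoing packets (e.g. connection resets from the stopping target) that make initiators drop their sessions.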
Re: [Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional role parameter
Hi Dejan,

On Tue, 2011-04-05 at 13:40 +0200, Dejan Muhamedagic wrote:

Hi Holger,

On Tue, Apr 05, 2011 at 01:19:56PM +0200, Holger Teutsch wrote:

Hi Dejan,

On Tue, 2011-04-05 at 12:27 +0200, Dejan Muhamedagic wrote:
On Tue, Apr 05, 2011 at 12:10:48PM +0200, Holger Teutsch wrote:

Hi Dejan,

On Tue, 2011-04-05 at 11:48 +0200, Dejan Muhamedagic wrote:

Hi Holger,

On Mon, Apr 04, 2011 at 09:31:02PM +0200, Holger Teutsch wrote:
On Mon, 2011-04-04 at 15:24 +0200, Andrew Beekhof wrote:
[...]

crm_resource --move-off --resource myClone --node C
- I want the instance moved off C, regardless of where it is moved on

What is the difference between move-off and unmigrate (-U)?

--move-off: create a constraint that a resource should *not* run on the specific node (partly as before --move without --node)
-U: zap all migration constraints (as before)

Ah, right, sorry, I wanted to ask about the difference between move-off and move. The description looks the same as for move. Is it that in this case it is for clones, so crm_resource needs an extra node parameter? You wrote in the doc:

+Migrate a resource (-instance for clones/masters) off the specified node.

The '-instance' looks somewhat funny. Why not say "Move/migrate a clone or master/slave instance away from the specified node"?

Moving away works for all kinds of resources, so the text now looks like:

diff -r b4f456380f60 doc/crm_cli.txt
--- a/doc/crm_cli.txt Thu Mar 17 09:41:25 2011 +0100
+++ b/doc/crm_cli.txt Tue Apr 05 13:08:10 2011 +0200
@@ -818,10 +818,25 @@
 running on the current node. Additionally, you may specify a
 lifetime for the constraint---once it expires, the location
 constraint will no longer be active.
+For a master resource specify rsc:master to move the master role.
 
 Usage:
 ...
-migrate rsc [node] [lifetime] [force]
+migrate rsc[:master] [node] [lifetime] [force]
+...
+
+[[cmdhelp_resource_migrateoff,migrate a resource off the specified node]]
+`migrateoff` (`moveoff`)
+
+Migrate a resource away from the specified node.
+The resource is migrated by creating a constraint which prevents it from
+running on the specified node. Additionally, you may specify a
+lifetime for the constraint---once it expires, the location
+constraint will no longer be active.
+
+Usage:
+...
+migrateoff rsc node [lifetime] [force]
 ...
 
 [[cmdhelp_resource_unmigrate,unmigrate a resource to another node]]

I must say that I still find all this quite confusing, i.e. now we have move, unmove, and move-off, but it's probably just me :)

Think of move == move-to, then it is simpler 8-) ... keeping in mind that for backward compatibility

crm_resource --move --resource myResource

is equivalent to

crm_resource --move-off --resource myResource --node $(current node)

But as there is no current node for clones / masters, the old implementation did some random movements...

OK. Thanks for the clarification. I'd like to revise my previous comment about restricting use of certain constructs. For instance, in this case, if the command would result in a random movement then the shell should at least issue a warning about it. Or perhaps refuse to do that completely. I didn't take a look yet at the code, perhaps you've already done that.

Thanks,
Dejan

I admit you have to specify more verbosely what you want to achieve, but then the patched versions (based on the patches I submitted today around 10:01) execute consistently and without surprises - at least for my test cases.

Regards
Holger
Re: [Pacemaker] How to prevent locked I/O using Pacemaker with Primary/Primary DRBD/OCFS2 (Ubuntu 10.10)
Hi,

I don't want to hijack this thread, so feel free to change the Subject line if you feel like it.

* Lars Ellenberg lars.ellenb...@linbit.com [20110404 16:56]:
On Mon, Apr 04, 2011 at 01:34:48PM -0600, Mike Reid wrote:

All, I am running a two-node web cluster on OCFS2 (v1.5.0) via DRBD Primary/Primary (v8.3.8) and Pacemaker. Everything seems to be working

If you want to stay with 8.3.8, make sure you are using 8.3.8.1 (note the trailing .1), or you can run into stalled resyncs. Or upgrade to the most recent.

Just curious, I'm running 8.3.8 but not sure about the trailing '.1'. Am I safe with:

~# cat /proc/drbd
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by root@puck, 2010-11-29 18:13:54

cheers,
jf
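The question above is exactly whether "8.3.8" in /proc/drbd can hide the .1 point release. A shell sketch that only extracts the version field from a captured /proc/drbd line (the sample is hardcoded from the output above); checking the installed package version (e.g. dpkg -l drbd8-utils on Debian/Ubuntu) is the surer way to see the full point release.

```shell
# Pull the bare version string out of the first /proc/drbd line.
proc_drbd='version: 8.3.8 (api:88/proto:86-94)'
ver=$(printf '%s\n' "$proc_drbd" | sed -n 's/^version: \([0-9.]*\).*/\1/p')
echo "$ver"   # 8.3.8
```

On a live node the sample assignment would be replaced by `head -n1 /proc/drbd`.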
Re: [Pacemaker] [Problem]Reboot by the error of the clone resource influences the resource of other nodes.
Hi Hideo-san,

thank you very much for the information. Will try it asap.

Best,
Vladislav

06.04.2011 04:54, renayama19661...@ybb.ne.jp wrote:

Hi Vladislav,

I confirmed that the problem was improved with Andrew's patch. Please try the patch in the environment where your problem occurred.

* http://developerbugs.linux-foundation.org/show_bug.cgi?id=2574

Best Regards,
Hideo Yamauchi.

--- On Fri, 2011/4/1, Vladislav Bogdanov bub...@hoster-ok.com wrote:

01.04.2011 11:10, Andrew Beekhof wrote:
On Fri, Apr 1, 2011 at 9:58 AM, Vladislav Bogdanov bub...@hoster-ok.com wrote:
01.04.2011 10:20, Andrew Beekhof wrote:

The clone instance numbers for anonymous clones are an implementation detail and nothing should be inferred from them. Did anything actually get moved, or did just the numbers change?

Main inconvenience is that all dependent resources are forcibly restarted.

Ok, then that's a bug. Is there a hb_report somewhere for this scenario?

Sent privately (just a note for the ML).