Re: [Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional role parameter
On Wed, Apr 6, 2011 at 3:38 PM, Dejan Muhamedagic deja...@fastmail.fm wrote:
> On Wed, Apr 06, 2011 at 01:00:36PM +0200, Andrew Beekhof wrote:
> > On Tue, Apr 5, 2011 at 12:27 PM, Dejan Muhamedagic deja...@fastmail.fm wrote:
> > > Ah, right, sorry, wanted to ask about the difference between move-off and move. The description looks the same as for move. Is it that in this case it is for clones, so crm_resource needs an extra node parameter? You wrote in the doc: "+Migrate a resource (-instance for clones/masters) off the specified node." The '-instance' looks somewhat funny. Why not say "Move/migrate a clone or master/slave instance away from the specified node"? I must say that I still find all this quite confusing, i.e. now we have move, unmove, and move-off, but it's probably just me :)
> >
> > Not just you. The problem is that we didn't fully understand all the use-case permutations at the time. I think, notwithstanding legacy compatibility, move should probably be renamed to move-to and this new option be called move-from. That seems more obvious and syntactically consistent with the rest of the system.
>
> Yes, move-to and move-from seem more consistent than other options. The problem is that the old move is at times one and then at times another.

That's OK, we can make the compat code work as expected.

> > In the absence of a host name, each uses the current location for the named group/primitive resource and complains for clones. The biggest question in my mind is what to call unmove... move-cleanup perhaps? move-remove? :D
>
> Actually, though the word is a bit awkward, unmove sounds fine to me.

I think the challenge with unmove is that it appears to imply that the resource will move back - when in fact it just arranges things so that it could. So move-remove and move-cleanup get the user thinking in the right direction.
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
Re: [Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional role parameter
On Wed, Apr 6, 2011 at 5:48 PM, Holger Teutsch holger.teut...@web.de wrote:
> On Wed, 2011-04-06 at 15:38 +0200, Dejan Muhamedagic wrote:
> > On Wed, Apr 06, 2011 at 01:00:36PM +0200, Andrew Beekhof wrote:
> > > On Tue, Apr 5, 2011 at 12:27 PM, Dejan Muhamedagic deja...@fastmail.fm wrote:
> > > > Ah, right, sorry, wanted to ask about the difference between move-off and move. The description looks the same as for move. Is it that in this case it is for clones, so crm_resource needs an extra node parameter? You wrote in the doc: "+Migrate a resource (-instance for clones/masters) off the specified node." The '-instance' looks somewhat funny. Why not say "Move/migrate a clone or master/slave instance away from the specified node"? I must say that I still find all this quite confusing, i.e. now we have move, unmove, and move-off, but it's probably just me :)
> > >
> > > Not just you. The problem is that we didn't fully understand all the use-case permutations at the time. I think, notwithstanding legacy compatibility, move should probably be renamed to move-to and this new option be called move-from. That seems more obvious and syntactically consistent with the rest of the system.
> >
> > Yes, move-to and move-from seem more consistent than other options. The problem is that the old move is at times one and then at times another.
> >
> > > In the absence of a host name, each uses the current location for the named group/primitive resource and complains for clones. The biggest question in my mind is what to call unmove... move-cleanup perhaps? move-remove? :D
> >
> > Actually, though the word is a bit awkward, unmove sounds fine to me.
>
> I would vote for move-cleanup. It's consistent with move-XXX, and to my (German) ears unmove seems to stand for the previous move being undone and the stuff comes back. BTW: Has someone already tried out the code, or do you trust me 8-D ?

I trust no-one - which is why we have regression tests :-)

Stay tuned for updated patches...
- holger

Thanks,

Dejan
Re: [Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional role parameter
On Thu, 2011-04-07 at 08:57 +0200, Andrew Beekhof wrote:
> On Wed, Apr 6, 2011 at 5:48 PM, Holger Teutsch holger.teut...@web.de wrote:
> > On Wed, 2011-04-06 at 15:38 +0200, Dejan Muhamedagic wrote:
> > > On Wed, Apr 06, 2011 at 01:00:36PM +0200, Andrew Beekhof wrote:
> > > > On Tue, Apr 5, 2011 at 12:27 PM, Dejan Muhamedagic deja...@fastmail.fm wrote:
> > > > > Ah, right, sorry, wanted to ask about the difference between move-off and move. The description looks the same as for move. Is it that in this case it is for clones, so crm_resource needs an extra node parameter? You wrote in the doc: "+Migrate a resource (-instance for clones/masters) off the specified node." The '-instance' looks somewhat funny. Why not say "Move/migrate a clone or master/slave instance away from the specified node"? I must say that I still find all this quite confusing, i.e. now we have move, unmove, and move-off, but it's probably just me :)
> > > >
> > > > Not just you. The problem is that we didn't fully understand all the use-case permutations at the time. I think, notwithstanding legacy compatibility, move should probably be renamed to move-to and this new option be called move-from. That seems more obvious and syntactically consistent with the rest of the system.
> > >
> > > Yes, move-to and move-from seem more consistent than other options. The problem is that the old move is at times one and then at times another.
> > >
> > > > In the absence of a host name, each uses the current location for the named group/primitive resource and complains for clones. The biggest question in my mind is what to call unmove... move-cleanup perhaps? move-remove? :D
> > >
> > > Actually, though the word is a bit awkward, unmove sounds fine to me.
> >
> > I would vote for move-cleanup. It's consistent with move-XXX, and to my (German) ears unmove seems to stand for the previous move being undone and the stuff comes back. BTW: Has someone already tried out the code, or do you trust me 8-D ?
>
> I trust no-one - which is why we have regression tests :-)
>
> Stay tuned for updated patches...
Now, after an additional discussion round I propose the following. Please note that for consistency the --node argument is optional for --move-from.

New syntax:

crm_resource --move-from --resource myresource --node mynode
  - all resource variants: check whether active on mynode, then create standby constraint

crm_resource --move-from --resource myresource
  - primitive/group: set --node `current_node`, then create standby constraint
  - clone/master: refused

crm_resource --move-to --resource myresource --node mynode
  - all resource variants: create prefer constraint

crm_resource --move-to --resource myresource --master --node mynode
  - master: check whether active as slave on mynode, then create prefer constraint for master role
  - others: refused

crm_resource --move-cleanup --resource myresource
  - zap constraints

As we are already short on meaningful single-letter options, I vote for long options only.

Backwards compatibility:

crm_resource {-M|--move} --resource myresource
  - output deprecation warning
  - treat as: crm_resource --move-from --resource myresource

crm_resource {-M|--move} --resource myresource --node mynode
  - output deprecation warning
  - treat as: crm_resource --move-to --resource myresource --node mynode

crm_resource {-U|--unmove} --resource myresource
  - output deprecation warning
  - treat as: crm_resource --move-cleanup --resource myresource

For the shell: should we go for similar commands or keep migrate-XXX?

Coming back to Dejan's proposal of move-remove: that could be implemented by re-executing the last move (a remove). Reimplementing unmove as an undo of the last move, you'd have shortcuts for your favorite move operation: move, move-unmove (back), move-remove (and forth). Just kidding...
- holger
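The backwards-compatibility mapping above can be sketched as a small shell function. This is a hypothetical illustration only - `translate_move` is not part of crm_resource; it merely prints the new-style command each deprecated invocation would be treated as:

```shell
#!/bin/sh
# Hypothetical sketch of the proposed deprecation mapping -- not the actual
# crm_resource implementation. Given a deprecated -M/--move or -U/--unmove
# call, print the equivalent new-style command.
translate_move() {
  op=$1; resource=$2; node=$3          # $3 may be empty (no --node given)
  case $op in
    -M|--move)
      if [ -n "$node" ]; then
        # old "move with a node" becomes move-to
        echo "crm_resource --move-to --resource $resource --node $node"
      else
        # old "move without a node" becomes move-from
        echo "crm_resource --move-from --resource $resource"
      fi ;;
    -U|--unmove)
      echo "crm_resource --move-cleanup --resource $resource" ;;
  esac
}

translate_move --move myresource mynode
translate_move --move myresource
translate_move --unmove myresource
```

Running it prints the three new-style equivalents in order: move-to, move-from, move-cleanup.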
Re: [Pacemaker] Clone resource dependency issue - undesired restart of dependent resources
Christoph Bartoschek wrote:

> Andrew Beekhof wrote:
> > 2011/4/4 Christoph Bartoschek bartosc...@gmx.de:
> > > Andrew Beekhof wrote:
> > > > 2011/4/1 Christoph Bartoschek bartosc...@gmx.de:
> > > > > Andrew Beekhof wrote:
> > > > > You didn't mention a version number... I think you'll be happier with 1.1.5 (I recall fixing a similar issue).
> > > >
> > > > I see a similar problem with 1.1.5. I have a two-node NFS server setup. The following happens:
> > > >
> > > > 1. The resources run on node A.
> > > > 2. I put node A into standby.
> > > > 3. The resources are migrated to node B.
> > > > 4. I make node A online again.
> > > > 5. The services are stopped on node B.
> > > > 6. The services are started on node B.
> > > >
> > > > In my opinion starting node A again should not cause the services to stop and start again.
> > >
> > > No argument there. Can you include a crm_report archive covering the time between 1 and 6 please?
> >
> > Could you please show me how the call to crm_report should look to give you the information you are interested in?
>
> crm_report -h should contain sufficient information

I hope the resulting file is ok. I called crm_report this way:

crm_report -f 2011-04-04 12:15:00 \
           -t 2011-04-04 12:18:00 \
           -n laplace ries

Is the report enough or is more information needed?

Christoph
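A side note on the crm_report invocation above: the timestamps contain spaces, so unless they are quoted (the quoting may simply have been lost in the mail), the shell splits each one into two separate arguments before crm_report ever sees them. A quick illustration of the difference, without running crm_report itself:

```shell
#!/bin/sh
# Shows why the crm_report timestamps need quoting: an unquoted
# "2011-04-04 12:15:00" is split by the shell into two words.
count_args() { echo $#; }

count_args -f 2011-04-04 12:15:00     # timestamp splits: 3 arguments
count_args -f "2011-04-04 12:15:00"   # timestamp stays whole: 2 arguments
```

The same applies to the node list, e.g. `-n "laplace ries"`.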
[Pacemaker] Help With Cluster Failure
Hi all. One of my clusters had a STONITH shoot-out last night and then refused to do anything but sit there from 0400 until 0735, after I'd been woken up to fix it. In the end a simple resource cleanup fixed it, which I don't think should have been necessary. I have an 8 MB hb_report file. Is that too big to attach here? Should I upload it somewhere?

Thanks.

Darren Mansell
Re: [Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional role parameter
Hi Holger,

On Thu, Apr 07, 2011 at 10:03:49AM +0200, Holger Teutsch wrote:
> On Thu, 2011-04-07 at 08:57 +0200, Andrew Beekhof wrote:
> > On Wed, Apr 6, 2011 at 5:48 PM, Holger Teutsch holger.teut...@web.de wrote:
> > > On Wed, 2011-04-06 at 15:38 +0200, Dejan Muhamedagic wrote:
> > > > On Wed, Apr 06, 2011 at 01:00:36PM +0200, Andrew Beekhof wrote:
> > > > > On Tue, Apr 5, 2011 at 12:27 PM, Dejan Muhamedagic deja...@fastmail.fm wrote:
> > > > > > Ah, right, sorry, wanted to ask about the difference between move-off and move. The description looks the same as for move. Is it that in this case it is for clones, so crm_resource needs an extra node parameter? You wrote in the doc: "+Migrate a resource (-instance for clones/masters) off the specified node." The '-instance' looks somewhat funny. Why not say "Move/migrate a clone or master/slave instance away from the specified node"? I must say that I still find all this quite confusing, i.e. now we have move, unmove, and move-off, but it's probably just me :)
> > > > >
> > > > > Not just you. The problem is that we didn't fully understand all the use-case permutations at the time. I think, notwithstanding legacy compatibility, move should probably be renamed to move-to and this new option be called move-from. That seems more obvious and syntactically consistent with the rest of the system.
> > > >
> > > > Yes, move-to and move-from seem more consistent than other options. The problem is that the old move is at times one and then at times another.
> > > >
> > > > > In the absence of a host name, each uses the current location for the named group/primitive resource and complains for clones. The biggest question in my mind is what to call unmove... move-cleanup perhaps? move-remove? :D
> > > >
> > > > Actually, though the word is a bit awkward, unmove sounds fine to me.
> > >
> > > I would vote for move-cleanup. It's consistent with move-XXX, and to my (German) ears unmove seems to stand for the previous move being undone and the stuff comes back. BTW: Has someone already tried out the code, or do you trust me 8-D ?
> > I trust no-one - which is why we have regression tests :-)
> >
> > Stay tuned for updated patches...
>
> Now, after an additional discussion round I propose the following. Please note that for consistency the --node argument is optional for --move-from.
>
> New syntax:
>
> crm_resource --move-from --resource myresource --node mynode
>   - all resource variants: check whether active on mynode, then create standby constraint
>
> crm_resource --move-from --resource myresource
>   - primitive/group: set --node `current_node`, then create standby constraint
>   - clone/master: refused
>
> crm_resource --move-to --resource myresource --node mynode
>   - all resource variants: create prefer constraint
>
> crm_resource --move-to --resource myresource --master --node mynode
>   - master: check whether active as slave on mynode, then create prefer constraint for master role
>   - others: refused
>
> crm_resource --move-cleanup --resource myresource
>   - zap constraints
>
> As we are already short on meaningful single-letter options, I vote for long options only.
>
> Backwards compatibility:
>
> crm_resource {-M|--move} --resource myresource
>   - output deprecation warning
>   - treat as: crm_resource --move-from --resource myresource
>
> crm_resource {-M|--move} --resource myresource --node mynode
>   - output deprecation warning
>   - treat as: crm_resource --move-to --resource myresource --node mynode
>
> crm_resource {-U|--unmove} --resource myresource
>   - output deprecation warning
>   - treat as: crm_resource --move-cleanup --resource myresource

All looks fine to me.

> For the shell: should we go for similar commands or keep migrate-XXX?

migrate is a bit of a misnomer; it could be confused with the migrate operation. I'd vote to leave the old migrate/unmigrate as deprecated and introduce just the move-from/to/cleanup variants.

> Coming back to Dejan's proposal of move-remove: that could be implemented by re-executing the last move (a remove).
> Reimplementing unmove as an undo of the last move, you'd have shortcuts for your favorite move operation: move, move-unmove (back), move-remove (and forth).

Well, how about remove-move? ;-)

Cheers,

Dejan

> Just kidding...
>
> - holger
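For readers following the proposal above: the "standby" and "prefer" constraints it mentions are both ordinary rsc_location constraints that differ only in their score. The sketch below is an assumption for illustration - the id scheme and exact XML shape are not taken from the patch - showing the kind of constraint each operation would inject:

```shell
#!/bin/sh
# Sketch (an assumption, not the patch itself): --move-from creates a
# "standby" constraint (negative score, push the resource off the node),
# --move-to a "prefer" constraint (positive score, pull it onto the node).
make_constraint() {
  kind=$1; rsc=$2; node=$3
  case $kind in
    standby) score="-INFINITY" ;;   # keep the resource away from the node
    prefer)  score="INFINITY"  ;;   # pin the resource to the node
  esac
  printf '<rsc_location id="cli-%s-%s" rsc="%s" node="%s" score="%s"/>\n' \
    "$kind" "$rsc" "$rsc" "$node" "$score"
}

make_constraint standby myresource mynode
make_constraint prefer myresource mynode
```

--move-cleanup then simply removes any such cli-* constraints for the resource.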
Re: [Pacemaker] How to prevent locked I/O using Pacemaker with Primary/Primary DRBD/OCFS2 (Ubuntu 10.10)
Lars,

Interesting, I will definitely continue in that direction then. Perhaps I misunderstood the requirements of STONITH. I understand it to be a form of "remote reboot/shut down" of sorts, and since the box was already "shut down", I assumed at this stage in my testing that it could not be related to STONITH, as the box was confirmed to be down. Perhaps Pacemaker is just awaiting that confirmation as you suggest, so thank you, I will see if that is indeed the case. I've seen quite a few stonith operation options available; is there any one of them that is better suited for a simple two-node cluster (OCFS2)?

Message: 1
Date: Thu, 7 Apr 2011 02:50:09 +0200
From: Lars Ellenberg lars.ellenb...@linbit.com
To: pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] How to prevent locked I/O using Pacemaker with Primary/Primary DRBD/OCFS2 (Ubuntu 10.10)
Message-ID: 20110407005009.GF3726@barkeeper1-xen.linbit
Content-Type: text/plain; charset=iso-8859-1

On Wed, Apr 06, 2011 at 10:26:24AM -0600, Reid, Mike wrote:
> Lars, Thank you for your comments. I did confirm I was running 8.3.8.1, and I have even upgraded to 8.3.10 but am still experiencing the same I/O lock issue. I definitely agree with you, DRBD is behaving exactly as instructed, being properly fenced, etc. I am quite new to DRBD (and OCFS2), learning a lot as I go. To your question regarding copy/paste: yes, the configuration used was pieced together from a series of different tutorials, plus personal trial and error related to this project. I have tried many variations of the DRBD config (including resource-and-stonith) but have not actually set up a functioning STONITH yet,

And that's why your ocfs2 does not unblock. It waits for confirmation of a STONITH operation.

> hence the resource-only. The Linbit docs have been an amazing resource. Yes, I realize that a Secondary node is not indicative of its data/sync state.
> The options I am testing here were referenced from this page:
> http://www.drbd.org/users-guide/s-ocfs2-create-resource.html
> http://www.drbd.org/users-guide/s-configure-split-brain-behavior.html#s-automatic-split-brain-recovery-configuration
>
> When you say "You do configure automatic data loss here", are you suggesting that I am instructing the DRBD survivor to perform a full re-sync to its peer?

Nothing to do with full sync. Should usually be a bitmap-based resync. But it may be a sync in an unexpected direction.

> If so, that would make sense, since I believe this behavior was something I experienced prior to getting fencing fully established. In my hard-boot testing, I did once notice the victim was completely resyncing, which sounds related to after-sb-1pri discard-secondary.
>
> DRBD aside, have you used OCFS2? I'm failing to understand why, if DRBD is fencing its peer, OCFS2 remains in a locked state, unable to run standalone. To me, this issue does not seem related to DRBD or Pacemaker, but rather to a lower-level requirement of OCFS2 (DLM?), etc. To date, the ONLY way I can restore I/O to the remaining node is to bring the other node back online, which unfortunately won't work in our Production environment. On a separate ML, someone suggested that qdisk might be required to make this work, and while I have tried qdisk, my high-level research leads me to believe that is a legacy approach, not an option with Pacemaker. Is that correct?

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
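As Lars points out, with OCFS2 on top of dual-Primary DRBD the fencing policy needs to be resource-and-stonith and be backed by a working STONITH device, so that frozen I/O is only thawed once peer death is confirmed. A minimal drbd.conf sketch of that policy (the resource name is illustrative; the handler scripts ship with DRBD 8.3, but verify the paths on your distribution):

```
# drbd.conf sketch -- fence the peer via the cluster manager and freeze
# I/O until fencing is confirmed; unfence after resync completes.
resource r0 {
  net {
    fencing resource-and-stonith;
  }
  handlers {
    fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
}
```

Without a real STONITH device configured in Pacemaker, the fence can never be confirmed, which matches the observed symptom of OCFS2 blocking forever.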