Re: [Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional role parameter

2011-04-07 Thread Andrew Beekhof
On Wed, Apr 6, 2011 at 3:38 PM, Dejan Muhamedagic deja...@fastmail.fm wrote:
 On Wed, Apr 06, 2011 at 01:00:36PM +0200, Andrew Beekhof wrote:
 On Tue, Apr 5, 2011 at 12:27 PM, Dejan Muhamedagic deja...@fastmail.fm 
 wrote:
  Ah, right, sorry, wanted to ask about the difference between
  move-off and move. The description looks the same as for move. Is
  it that in this case it is for clones so crm_resource needs an
  extra node parameter? You wrote in the doc:
 
         +Migrate a resource (-instance for clones/masters) off the 
  specified node.
 
  The '-instance' looks somewhat funny. Why not say "Move/migrate a
  clone or master/slave instance away from the specified node"?
 
  I must say that I still find all this quite confusing, i.e. now
  we have move, unmove, and move-off, but it's probably just me :)

 Not just you.  The problem is that we didn't fully understand all the
 use case permutations at the time.

 I think, notwithstanding legacy compatibility, move should probably
 be renamed to move-to and this new option be called move-from.
 That seems more obvious and syntactically consistent with the rest of
 the system.

 Yes, move-to and move-from seem more consistent than the other
 options. The problem is that the old move sometimes behaves as one
 and sometimes as the other.

That's OK, we can make the compat code work as expected.


 In the absence of a host name, each uses the current location for the
 named group/primitive resource and complains for clones.

 The biggest question in my mind is what to call unmove...
 move-cleanup perhaps?

 move-remove? :D

 Actually, though the word is a bit awkward, unmove sounds fine
 to me.

I think the challenge with unmove is that it appears to imply that the
resource will move back - when in fact it just arranges things so that
it could.
So move-remove and move-cleanup get the user thinking in the right direction.
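Concretely, today's pair boils down to adding and removing location
constraints with well-known cli-* ids (a sketch of current behaviour;
the exact ids and defaults may differ by version):

  # with a destination: prefer the named node
  crm_resource --move --resource myresource --node node2
  #   => <rsc_location id="cli-prefer-myresource" rsc="myresource"
  #                    node="node2" score="INFINITY"/>

  # without one: ban the current node instead
  crm_resource --move --resource myresource
  #   => <rsc_location id="cli-standby-myresource" rsc="myresource"
  #                    node="(current node)" score="-INFINITY"/>

  # unmove: delete the cli-* constraints again; the resource only moves
  # back if stickiness and the remaining scores say so
  crm_resource --unmove --resource myresource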



Re: [Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional role parameter

2011-04-07 Thread Andrew Beekhof
On Wed, Apr 6, 2011 at 5:48 PM, Holger Teutsch holger.teut...@web.de wrote:
 On Wed, 2011-04-06 at 15:38 +0200, Dejan Muhamedagic wrote:
 On Wed, Apr 06, 2011 at 01:00:36PM +0200, Andrew Beekhof wrote:
  On Tue, Apr 5, 2011 at 12:27 PM, Dejan Muhamedagic deja...@fastmail.fm 
  wrote:
   Ah, right, sorry, wanted to ask about the difference between
   move-off and move. The description looks the same as for move. Is
   it that in this case it is for clones so crm_resource needs an
   extra node parameter? You wrote in the doc:
  
          +Migrate a resource (-instance for clones/masters) off the 
   specified node.
  
   The '-instance' looks somewhat funny. Why not say "Move/migrate a
   clone or master/slave instance away from the specified node"?
  
   I must say that I still find all this quite confusing, i.e. now
   we have move, unmove, and move-off, but it's probably just me :)
 
  Not just you.  The problem is that we didn't fully understand all the
  use case permutations at the time.
 
  I think, notwithstanding legacy compatibility, move should probably
  be renamed to move-to and this new option be called move-from.
  That seems more obvious and syntactically consistent with the rest of
  the system.

 Yes, move-to and move-from seem more consistent than the other
 options. The problem is that the old move sometimes behaves as one
 and sometimes as the other.

  In the absence of a host name, each uses the current location for the
  named group/primitive resource and complains for clones.
 
  The biggest question in my mind is what to call unmove...
  move-cleanup perhaps?

 move-remove? :D
 Actually, though the word is a bit awkward, unmove sounds fine
 to me.

 I would vote for move-cleanup. It's consistent with move-XXX, and to my
 (German) ears unmove sounds as if the previous move were undone and
 everything came back.

 BTW: Has someone already tried out the code or do you trust me 8-D ?

I trust no-one - which is why we have regression tests :-)


 Stay tuned for updated patches...

 - holger

 Thanks,

 Dejan



Re: [Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional role parameter

2011-04-07 Thread Holger Teutsch
On Thu, 2011-04-07 at 08:57 +0200, Andrew Beekhof wrote:
 On Wed, Apr 6, 2011 at 5:48 PM, Holger Teutsch holger.teut...@web.de wrote:
  On Wed, 2011-04-06 at 15:38 +0200, Dejan Muhamedagic wrote:
  On Wed, Apr 06, 2011 at 01:00:36PM +0200, Andrew Beekhof wrote:
   On Tue, Apr 5, 2011 at 12:27 PM, Dejan Muhamedagic deja...@fastmail.fm 
   wrote:
Ah, right, sorry, wanted to ask about the difference between
move-off and move. The description looks the same as for move. Is
it that in this case it is for clones so crm_resource needs an
extra node parameter? You wrote in the doc:
   
   +Migrate a resource (-instance for clones/masters) off the 
specified node.
   
The '-instance' looks somewhat funny. Why not say "Move/migrate a
clone or master/slave instance away from the specified node"?
   
I must say that I still find all this quite confusing, i.e. now
we have move, unmove, and move-off, but it's probably just me :)
  
   Not just you.  The problem is that we didn't fully understand all the
   use case permutations at the time.
  
   I think, notwithstanding legacy compatibility, move should probably
   be renamed to move-to and this new option be called move-from.
   That seems more obvious and syntactically consistent with the rest of
   the system.
 
  Yes, move-to and move-from seem more consistent than the other
  options. The problem is that the old move sometimes behaves as one
  and sometimes as the other.
 
   In the absence of a host name, each uses the current location for the
   named group/primitive resource and complains for clones.
  
   The biggest question in my mind is what to call unmove...
   move-cleanup perhaps?
 
  move-remove? :D
  Actually, though the word is a bit awkward, unmove sounds fine
  to me.
 
  I would vote for move-cleanup. It's consistent with move-XXX, and to my
  (German) ears unmove sounds as if the previous move were undone and
  everything came back.
 
  BTW: Has someone already tried out the code or do you trust me 8-D ?
 
 I trust no-one - which is why we have regression tests :-)
 
 
  Stay tuned for updated patches...

Now, after an additional round of discussion, I propose the following.
Please note that for consistency the --node argument is optional for
--move-from.

New syntax:
---

crm_resource --move-from --resource myresource --node mynode
   - all resource variants: check whether active on mynode, then create
     a standby constraint

crm_resource --move-from --resource myresource
   - primitive/group: set --node `current_node`, then create a standby
     constraint
   - clone/master: refused

crm_resource --move-to --resource myresource --node mynode
   - all resource variants: create a prefer constraint

crm_resource --move-to --resource myresource --master --node mynode
   - master: check whether active as slave on mynode, then create a
     prefer constraint for the master role
   - others: refused

crm_resource --move-cleanup --resource myresource
   - zap the constraints

As we are already short on meaningful single-letter options, I vote for
long options only.
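To make the semantics concrete, the constraint behind each call would
presumably look like this (a sketch; I am assuming the existing cli-*
constraint-id convention carries over):

  crm_resource --move-to --resource myresource --node mynode
  #   => <rsc_location id="cli-prefer-myresource" rsc="myresource"
  #                    node="mynode" score="INFINITY"/>

  crm_resource --move-to --resource myresource --master --node mynode
  #   => as above, plus role="Master"

  crm_resource --move-from --resource myresource --node mynode
  #   => <rsc_location id="cli-standby-myresource" rsc="myresource"
  #                    node="mynode" score="-INFINITY"/>

  crm_resource --move-cleanup --resource myresource
  #   => removes any cli-prefer-/cli-standby- constraints for myresource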

Backwards Compatibility:


crm_resource {-M|--move} --resource myresource
  - output deprecation warning
  - treat as crm_resource --move-from --resource myresource

crm_resource {-M|--move} --resource myresource --node mynode
  - output deprecation warning
  - treat as crm_resource --move-to --resource myresource --node mynode

crm_resource {-U|--unmove} --resource myresource
  - output deprecation warning
  - treat as crm_resource --move-cleanup --resource myresource
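Expressed as a thin shell wrapper, the compat mapping is small (purely
illustrative, not the actual implementation):

  #!/bin/sh
  # hypothetical sketch of the -M/-U deprecation shim
  op=$1 rsc=$2 node=$3          # e.g. ./shim -M myresource mynode
  case $op in
    -M|--move)
      echo "warning: $op is deprecated, use --move-to/--move-from" >&2
      if [ -n "$node" ]; then
        crm_resource --move-to --resource "$rsc" --node "$node"
      else
        crm_resource --move-from --resource "$rsc"
      fi ;;
    -U|--unmove)
      echo "warning: $op is deprecated, use --move-cleanup" >&2
      crm_resource --move-cleanup --resource "$rsc" ;;
  esac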

For the shell:
Should we go for similar commands or keep migrate-XXX?


Coming back to Dejan's proposal of move-remove:

That could be implemented by re-executing the last move (after a remove).
Reimplementing unmove as an undo of the last move would give you shortcuts
for your favorite move operations:

move
move-unmove - back
move-remove - and forth

Just kidding...
 

 
  - holger
 





Re: [Pacemaker] Clone resource dependency issue - undesired restart of dependent resources

2011-04-07 Thread Christoph Bartoschek
Christoph Bartoschek wrote:

 Andrew Beekhof wrote:
 
 2011/4/4 Christoph Bartoschek bartosc...@gmx.de:
 Andrew Beekhof wrote:

 2011/4/1 Christoph Bartoschek bartosc...@gmx.de:
 Andrew Beekhof wrote:

 You didn't mention a version number... I think you'll be happier with
 1.1.5 (I recall fixing a similar issue).

 I see a similar problem with 1.1.5. I have a two-node NFS server
 setup.

 The following happens:

 1. The resources run on node A.
 2. I put node A into standby.
 3. The resources are migrated to node B.
 4. I make node A online again.
 5. The services are stopped on node B.
 6. The services are started on node B.

 In my opinion starting node A again should not cause the services to
 stop and start again.

 No argument there. Can you include a crm_report archive covering the
 time between steps 1 and 6, please?

 Could you please show me how the call to crm_report should look to give
 you the information you are interested in?
 
 crm_report -h should contain sufficient information
 
 I hope the resulting file is ok. I called crm_report this way:
 
 crm_report -f "2011-04-04 12:15:00" \
 -t "2011-04-04 12:18:00" \
 -n "laplace ries"
 

Is the report enough or is more information needed?

Christoph




[Pacemaker] Help With Cluster Failure

2011-04-07 Thread Darren.Mansell
Hi all.

One of my clusters had a STONITH shoot-out last night and then refused
to do anything but sit there from 0400 until 0735, after I'd been woken
up to fix it.

In the end, just a resource cleanup fixed it, which I don't think should
be the case.

I have an 8MB hb_report file. Is that too big to attach and send here?
Should I upload it somewhere?

Thanks.

Darren Mansell



Re: [Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional role parameter

2011-04-07 Thread Dejan Muhamedagic
Hi Holger,

On Thu, Apr 07, 2011 at 10:03:49AM +0200, Holger Teutsch wrote:
 On Thu, 2011-04-07 at 08:57 +0200, Andrew Beekhof wrote:
  On Wed, Apr 6, 2011 at 5:48 PM, Holger Teutsch holger.teut...@web.de 
  wrote:
   On Wed, 2011-04-06 at 15:38 +0200, Dejan Muhamedagic wrote:
   On Wed, Apr 06, 2011 at 01:00:36PM +0200, Andrew Beekhof wrote:
On Tue, Apr 5, 2011 at 12:27 PM, Dejan Muhamedagic 
deja...@fastmail.fm wrote:
 Ah, right, sorry, wanted to ask about the difference between
 move-off and move. The description looks the same as for move. Is
 it that in this case it is for clones so crm_resource needs an
 extra node parameter? You wrote in the doc:

+Migrate a resource (-instance for clones/masters) off the 
 specified node.

 The '-instance' looks somewhat funny. Why not say "Move/migrate a
 clone or master/slave instance away from the specified node"?

 I must say that I still find all this quite confusing, i.e. now
 we have move, unmove, and move-off, but it's probably just me :)
   
Not just you.  The problem is that we didn't fully understand all the
use case permutations at the time.
   
I think, notwithstanding legacy compatibility, move should probably
be renamed to move-to and this new option be called move-from.
That seems more obvious and syntactically consistent with the rest of
the system.
  
   Yes, move-to and move-from seem more consistent than the other
   options. The problem is that the old move sometimes behaves as one
   and sometimes as the other.
  
In the absence of a host name, each uses the current location for the
named group/primitive resource and complains for clones.
   
The biggest question in my mind is what to call unmove...
move-cleanup perhaps?
  
   move-remove? :D
   Actually, though the word is a bit awkward, unmove sounds fine
   to me.
  
   I would vote for move-cleanup. It's consistent with move-XXX, and to my
   (German) ears unmove sounds as if the previous move were undone and
   everything came back.
  
   BTW: Has someone already tried out the code or do you trust me 8-D ?
  
  I trust no-one - which is why we have regression tests :-)
  
  
   Stay tuned for updated patches...
 
 Now, after an additional round of discussion, I propose the following.
 Please note that for consistency the --node argument is optional for
 --move-from.
 
 New syntax:
 ---
 
 crm_resource --move-from --resource myresource --node mynode
    - all resource variants: check whether active on mynode, then create
      a standby constraint
 
 crm_resource --move-from --resource myresource
    - primitive/group: set --node `current_node`, then create a standby
      constraint
    - clone/master: refused
 
 crm_resource --move-to --resource myresource --node mynode
    - all resource variants: create a prefer constraint
 
 crm_resource --move-to --resource myresource --master --node mynode
    - master: check whether active as slave on mynode, then create a
      prefer constraint for the master role
    - others: refused
 
 crm_resource --move-cleanup --resource myresource
    - zap the constraints
 
 As we are already short on meaningful single-letter options, I vote for
 long options only.
 
 Backwards Compatibility:
 
 
 crm_resource {-M|--move} --resource myresource
   - output deprecation warning
   - treat as crm_resource --move-from --resource myresource
 
 crm_resource {-M|--move} --resource myresource --node mynode
   - output deprecation warning
   - treat as crm_resource --move-to --resource myresource --node mynode
 
 crm_resource {-U|--unmove} --resource myresource
   - output deprecation warning
   - treat as crm_resource --move-cleanup --resource myresource

All looks fine to me.

 For the shell:
 Should we go for similar commands or keep migrate-XXX?

migrate is a bit of a misnomer; it could be confused with the
migrate operation. I'd vote to leave the old migrate/unmigrate
as deprecated and introduce just the move-from/to/cleanup variants.

 Coming back to Dejan's proposal of move-remove:
 
 That could be implemented by re-executing the last move (after a remove).
 Reimplementing unmove as an undo of the last move would give you shortcuts
 for your favorite move operations:
 
 move
 move-unmove - back
 move-remove - and forth

Well, how about remove-move? ;-)

Cheers,

Dejan

 Just kidding...
  
 
  
   - holger
  
 
 
 

Re: [Pacemaker] How to prevent locked I/O using Pacemaker with Primary/Primary DRBD/OCFS2 (Ubuntu 10.10)

2011-04-07 Thread Mike Reid
Lars,

Interesting, I will definitely continue in that direction then. Perhaps I
misunderstood the requirements of STONITH. I understand it to be a form of
"remote reboot/shut down" of sorts, and since the box was already shut
down, I assumed at this stage in my testing that the problem could not be
related to STONITH. Perhaps Pacemaker is just awaiting that confirmation
as you suggest, so thank you, I will see if that is indeed the case. I've
seen quite a few stonith operation options available; is any one of them
better suited for a simple two-node cluster (OCFS2)?
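For a simple two-node cluster, a common pattern is one stonith resource
per node, each banned from the node it is responsible for shooting (a
sketch, assuming IPMI-capable management boards; external/ipmi is just
one of several usable plugins, and the addresses and credentials are
placeholders):

  crm configure primitive st-node1 stonith:external/ipmi \
      params hostname=node1 ipaddr=10.0.0.101 userid=admin passwd=secret \
      interface=lan
  crm configure primitive st-node2 stonith:external/ipmi \
      params hostname=node2 ipaddr=10.0.0.102 userid=admin passwd=secret \
      interface=lan
  # a node must never run its own fencing device
  crm configure location l-st-node1 st-node1 -inf: node1
  crm configure location l-st-node2 st-node2 -inf: node2
  crm configure property stonith-enabled=true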


 Date: Thu, 7 Apr 2011 02:50:09 +0200
 From: Lars Ellenberg lars.ellenb...@linbit.com
 To: pacemaker@oss.clusterlabs.org
 Subject: Re: [Pacemaker] How to prevent locked I/O using Pacemaker
  with Primary/Primary DRBD/OCFS2 (Ubuntu 10.10)
 
 On Wed, Apr 06, 2011 at 10:26:24AM -0600, Reid, Mike wrote:
 Lars,
 
 Thank you for your comments. I did confirm I was running 8.3.8.1, and I have
 even upgraded to 8.3.10 but am still experiencing the same I/O lock issue. I
 definitely agree with you, DRBD is behaving exactly as instructed, being
 properly fenced, etc.
 
 I am quite new to DRBD (and OCFS2), learning a lot as I go. To your
 question regarding copy/paste, yes, the configuration used was pieced
 together from a series of different tutorials, plus personal trial
 and error related to this project. I have tried many variations of the
 DRBD config (including resource-and-stonith)
 
 but have not actually set up a functioning STONITH yet,
 
 And that's why your ocfs2 does not unblock.
 It waits for confirmation of a STONITH operation.
 
 hence the resource-only fencing. The Linbit
 docs have been an amazing resource.
 
 Yes, I realize that a Secondary node is not indicative of its
 data/sync state. The options I am testing here were referenced from
 these pages:
 
 
 
 http://www.drbd.org/users-guide/s-ocfs2-create-resource.html
 http://www.drbd.org/users-guide/s-configure-split-brain-behavior.html#s-automatic-split-brain-recovery-configuration
 
 
 
 When you say "You do configure automatic data loss here", are you
 suggesting that I am instructing the DRBD survivor to perform a full
 resync to its peer?
 
 Nothing to do with a full sync. It should usually be a bitmap-based resync.
 
 But it may be a sync in an unexpected direction.
 
 If so, that would make sense since I believe
 this behavior was something I experienced prior to getting fencing
 fully established. In my hard-boot testing, I did once notice the
 victim was completely resynching, which sounds related to
 after-sb-1pri discard-secondary.
 
 DRBD aside, have you used OCFS2? I fail to see why, if DRBD is
 fencing its peer, OCFS2 remains in a locked state, unable to run
 standalone. To me, this issue does not seem related to DRBD or Pacemaker,
 but rather to a lower-level requirement of OCFS2 (DLM?), etc.
 
 To date, the ONLY way I can restore I/O to the remaining node is to bring
 the other node back online, which unfortunately won't work in our
 production environment. On a separate ML, someone suggested that qdisk
 might be required to make this work, and while I have tried qdisk, my
 high-level research leads me to believe it is a legacy approach and not
 an option with Pacemaker. Is that correct?
 
 
 
 -- 
 : Lars Ellenberg
 : LINBIT | Your Way to High Availability
 : DRBD/HA support and consulting http://www.linbit.com
 
 DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
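
For reference, the DRBD side of the setup Lars describes usually pairs
resource-and-stonith fencing with the crm-fence-peer handler shipped
with DRBD (a sketch; the paths are the stock LINBIT scripts, and the
surrounding resource definition is omitted):

  # fragment of a DRBD resource definition, e.g. /etc/drbd.d/r0.res
  disk {
      fencing resource-and-stonith;  # freeze I/O until the peer is fenced
  }
  handlers {
      fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
      after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
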
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker