Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-04-11 Thread Andrew Beekhof

On 4/11/07, Alan Robertson [EMAIL PROTECTED] wrote:

Lars Marowsky-Bree wrote:
 On 2007-04-10T07:09:44, Alan Robertson [EMAIL PROTECTED] wrote:

 As even calling crm_master and having it do a compare and
 update-if-modified, or filtering it in the CIB directly requires to at
 least contact and query the CIB, I'd probably still track the state in
 the RA somewhere. (As to avoid forking and IPC.)
 Keeping track of it in the RA would typically involve extra forking to
 do the query, and comparison, and also to manage the state in tmp
 files, etc.

 Uhm. Forking?

 echo, read, and even sourcing the file (if it's written in VAR=VALUE
 style) doesn't incur that overhead.

 Anyway, Andrew fixed it in the CIB by now, so this part of the
 discussion is moot ;-)

 Or even better, monitoring drbd via a daemon which sends an async
 notification (either a crm_master change or a async failure
 notification) when something happens, instead of polling via the LRM.

 I'd wish to have a fast LRM interface, where the instantiation of a
 resource starts a sub-daemon to control it and then manage it via IPC
 (or maybe stdin/stdout) with that process - and, if the daemon support
 it, have it do async monitoring as well.
 Invoking the LRM to update the CIB isn't more efficient than just
 updating the CIB.

 That is not what this suggestion is about. Try reparsing it ;-)

 Such a daemon would obviously easily keep internal state and only issue
 CIB updates or async (failure) notifications as needed. It'd avoid a lot
 of forking for various operations.

You can already do that.  Resources often have processes associated with
them.  They can update the CIB any time they like.  What those processes
do, we don't know, and we don't much care.

Async LRM failure notifications (as failed async monitors) are not there
yet, but there was something about this in the previous discussion that
I've forgotten :-(.


Actually, if you look at CTS, you'll find that ResourceRecover uses
async notifications (albeit with a couple of tricks to work around
deficiencies in the LRM).
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-04-11 Thread Lars Marowsky-Bree
On 2007-04-10T16:47:52, Alan Robertson [EMAIL PROTECTED] wrote:

  Such a daemon would obviously easily keep internal state and only issue
  CIB updates or async (failure) notifications as needed. It'd avoid a lot
  of forking for various operations.
 
 You can already do that.  Resources often have processes associated with
 them.  They can update the CIB any time they like.  What those processes
 do, we don't know, and we don't much care.

No, I cannot do what I described. The fast LRM interface isn't there
yet - what you describe above is only a subset of the feature I want.

 Async LRM failure notifications (as failed async monitors) are not there
 yet, but there was something about this in the previous discussion that
 I've forgotten :-(.

I thought they worked by now?


Sincerely,
Lars

-- 
Teamlead Kernel, SuSE Labs, Research and Development
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
Experience is the name everyone gives to their mistakes. -- Oscar Wilde



Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-04-11 Thread Lars Marowsky-Bree
On 2007-04-11T09:40:52, Andrew Beekhof [EMAIL PROTECTED] wrote:

 Actually, if you look at CTS, you'll find that ResourceRecover uses
 async notifications (albeit with a couple of tricks to work around
 deficiencies in the LRM).

Have bugs been filed for those deficiencies so that Dejan doesn't have
to get bored? ;-)


Regards,
Lars

-- 
Teamlead Kernel, SuSE Labs, Research and Development
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
Experience is the name everyone gives to their mistakes. -- Oscar Wilde



Re: crm_master patch to eliminate do-nothing attribute updates - WAS Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-04-10 Thread Andrew Beekhof

On Apr 5, 2007, at 4:48 PM, Alan Robertson wrote:

 Alan Robertson wrote:
 Lars Marowsky-Bree wrote:
 On 2007-04-04T11:41:44, Doug Knight [EMAIL PROTECTED] wrote:

 The key word in my question was thinks. It would be useful to the RA
 if it could know what state the CRM thought it was in, so in case the RA
 determines on its own that its already in that state, it doesn't have to
 do anything. But, if the RA finds that the CRM thinks its in a different
 state, then the RA could set the CRM straight by calling the crm_master
 with the appropriate value. Make sense?
 No. The state the resource is in is not set via crm_master, but using
 the exit code of the monitor operation.

 You should only call crm_master when you wish to change the _preference_
 for master-state.

 But, I think you can use crm_master to retrieve your current preference,
 and thereby eliminate unnecessary CIB updates.

 Or maybe crm_master should do that filtering on its own??

 Attached is a straightforward patch to crm_attribute.c which I believe
 does this...

 It also eliminates certain other do-nothing attribute changes -
 because this code is shared by a few different commands.

 I recognize that this is slightly less efficient for those cases where
 the attribute is actually going to be changed, but it is _vastly_ more
 efficient for those cases where the current value is correct - because
 it eliminates triggering a computation cycle of the policy engine.

I'm not sure how I missed this previously, but this statement is just
not true.

If the only change to the CIB is to num_updates, then the PE is never
re-invoked.
This has been the case for about as long as I can remember.



Re: crm_master patch to eliminate do-nothing attribute updates - WAS Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-04-10 Thread Alan Robertson
Andrew Beekhof wrote:
 On Apr 5, 2007, at 4:48 PM, Alan Robertson wrote:
 
 Alan Robertson wrote:
 Lars Marowsky-Bree wrote:
 On 2007-04-04T11:41:44, Doug Knight [EMAIL PROTECTED] wrote:

 The key word in my question was thinks. It would be useful to the RA
 if it could know what state the CRM thought it was in, so in case
 the RA
 determines on its own that its already in that state, it doesn't
 have to
 do anything. But, if the RA finds that the CRM thinks its in a
 different
 state, then the RA could set the CRM straight by calling the
 crm_master
 with the appropriate value. Make sense?
 No. The state the resource is in is not set via crm_master, but using
 the exit code of the monitor operation.

 You should only call crm_master when you wish to change the
 _preference_
 for master-state.

 But, I think you can use crm_master to retrieve your current preference,
 and thereby eliminate unnecessary CIB updates.

 Or maybe crm_master should do that filtering on its own??

 Attached is a straightforward patch to crm_attribute.c which I believe
 does this...

 It also eliminates certain other do-nothing attribute changes -
 because this code is shared by a few different commands.

 I recognize that this is slightly less efficient for those cases where
 the attribute is actually going to be changed, but it is _vastly_ more
 efficient for those cases where the current value is correct - because
 it eliminates triggering a computation cycle of the policy engine.
 
 I'm not sure how I missed this previously, but this statement is just
 not true.
 
 If the only change to the CIB is to num_updates, then the PE is never
 re-invoked.
 This has been the case for about as long as I can remember.

This whole discussion started with evidence that this wasn't happening.
I didn't know it was _supposed_ to happen, so I didn't think it odd.


-- 
Alan Robertson [EMAIL PROTECTED]

Openness is the foundation and preservative of friendship...  Let me
claim from you at all times your undisguised opinions. - William
Wilberforce


Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-04-10 Thread Alan Robertson
Lars Marowsky-Bree wrote:
 On 2007-04-05T07:40:34, Alan Robertson [EMAIL PROTECTED] wrote:
 
 That is why I'd suggest to only call it in start or post-notify; calling
 it in post-notify basically implies it'll be called after every state
 change.
 But, for DRBD for example, the ability to become master can change
 without a heartbeat state change.
 
 I didn't say it was perfect ;-)
 
 As even calling crm_master and having it do a compare and
 update-if-modified, or filtering it in the CIB directly requires to at
 least contact and query the CIB, I'd probably still track the state in
 the RA somewhere. (As to avoid forking and IPC.)

Keeping track of it in the RA would typically involve extra forking to
do the query, and comparison, and also to manage the state in tmp
files, etc.

 Or even better, monitoring drbd via a daemon which sends an async
 notification (either a crm_master change or a async failure
 notification) when something happens, instead of polling via the LRM.
 
 I'd wish to have a fast LRM interface, where the instantiation of a
 resource starts a sub-daemon to control it and then manage it via IPC
 (or maybe stdin/stdout) with that process - and, if the daemon support
 it, have it do async monitoring as well. 

Invoking the LRM to update the CIB isn't more efficient than just
updating the CIB.



-- 
Alan Robertson [EMAIL PROTECTED]

Openness is the foundation and preservative of friendship...  Let me
claim from you at all times your undisguised opinions. - William
Wilberforce


Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-04-10 Thread Lars Marowsky-Bree
On 2007-04-10T07:09:44, Alan Robertson [EMAIL PROTECTED] wrote:

  As even calling crm_master and having it do a compare and
  update-if-modified, or filtering it in the CIB directly requires to at
  least contact and query the CIB, I'd probably still track the state in
  the RA somewhere. (As to avoid forking and IPC.)
 Keeping track of it in the RA would typically involve extra forking to
 do the query, and comparison, and also to manage the state in tmp
 files, etc.

Uhm. Forking?

echo, read, and even sourcing the file (if it's written in VAR=VALUE
style) doesn't incur that overhead.
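
A minimal sketch of what that could look like in an RA, assuming a
VAR=VALUE state file. The names STATE_DIR, STATE_FILE, LAST_PREF and
CRM_MASTER_CMD are made up for illustration, and using HA_RSCTMP as the
directory is an assumption about the RA's environment. Only the
"crm_master -v" call comes from this thread.

```shell
#!/bin/sh
# Sketch only: cache the last master preference in a VAR=VALUE state file
# so that a monitor can skip redundant crm_master calls without forking.
# STATE_DIR, STATE_FILE, LAST_PREF and CRM_MASTER_CMD are illustrative names.
STATE_DIR="${STATE_DIR:-${HA_RSCTMP:-/tmp}}"
STATE_FILE="${STATE_FILE:-$STATE_DIR/example-ra.master_pref}"
CRM_MASTER_CMD="${CRM_MASTER_CMD:-crm_master}"

set_master_pref_if_changed() {
    new_pref="$1"
    LAST_PREF=""
    # "." (source) is a shell builtin, so reading the cache costs no fork.
    if [ -r "$STATE_FILE" ]; then
        . "$STATE_FILE"
    fi
    if [ "$LAST_PREF" = "$new_pref" ]; then
        return 0    # unchanged: no CIB update needed
    fi
    # Only this branch runs an external command: the preference changed.
    if $CRM_MASTER_CMD -v "$new_pref"; then
        # echo with a redirect is also a builtin operation.
        echo "LAST_PREF=$new_pref" > "$STATE_FILE"
    else
        return 1
    fi
}
```

Calling set_master_pref_if_changed 100 from every monitor would then
reach the CIB only when the cached value differs. The state file must
live somewhere only root can write, since it is sourced as shell code.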

Anyway, Andrew fixed it in the CIB by now, so this part of the
discussion is moot ;-)

  Or even better, monitoring drbd via a daemon which sends an async
  notification (either a crm_master change or a async failure
  notification) when something happens, instead of polling via the LRM.
  
  I'd wish to have a fast LRM interface, where the instantiation of a
  resource starts a sub-daemon to control it and then manage it via IPC
  (or maybe stdin/stdout) with that process - and, if the daemon support
  it, have it do async monitoring as well. 
 
 Invoking the LRM to update the CIB isn't more efficient than just
 updating the CIB.

That is not what this suggestion is about. Try reparsing it ;-)

Such a daemon would obviously easily keep internal state and only issue
CIB updates or async (failure) notifications as needed. It'd avoid a lot
of forking for various operations.
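
To illustrate the idea, here is a rough sketch of the state-keeping
part of such a helper. It is not an existing LRM interface: the
one-value-per-line stdin protocol, the function name and NOTIFY_CMD are
all assumptions made up for this example. Only "crm_master -v" is taken
from this thread.

```shell
#!/bin/sh
# Sketch only: a long-running helper that keeps the last reported master
# preference in memory and issues an update only when the value changes.
# NOTIFY_CMD and the one-value-per-line stdin protocol are illustrative
# assumptions, not an existing LRM interface.
NOTIFY_CMD="${NOTIFY_CMD:-crm_master}"

watch_master_pref() {
    last_sent=""
    # read is a builtin: each incoming event is handled without a fork.
    while read -r pref; do
        if [ -z "$pref" ]; then
            continue                    # ignore empty lines
        fi
        if [ "$pref" = "$last_sent" ]; then
            continue                    # unchanged: stay quiet
        fi
        if $NOTIFY_CMD -v "$pref"; then
            last_sent="$pref"           # remember what was last reported
        fi
    done
    return 0
}
```

Fed the values 100, 100, 50, 50, 100, this runs the update command three
times rather than five. The state lives in the process itself, so no tmp
file is needed.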



Sincerely,
Lars

-- 
Teamlead Kernel, SuSE Labs, Research and Development
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
Experience is the name everyone gives to their mistakes. -- Oscar Wilde



Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-04-10 Thread Alan Robertson
Lars Marowsky-Bree wrote:
 On 2007-04-10T07:09:44, Alan Robertson [EMAIL PROTECTED] wrote:
 
 As even calling crm_master and having it do a compare and
 update-if-modified, or filtering it in the CIB directly requires to at
 least contact and query the CIB, I'd probably still track the state in
 the RA somewhere. (As to avoid forking and IPC.)
 Keeping track of it in the RA would typically involve extra forking to
 do the query, and comparison, and also to manage the state in tmp
 files, etc.
 
 Uhm. Forking?
 
 echo, read, and even sourcing the file (if it's written in VAR=VALUE
 style) doesn't incur that overhead.
 
 Anyway, Andrew fixed it in the CIB by now, so this part of the
 discussion is moot ;-)
 
 Or even better, monitoring drbd via a daemon which sends an async
 notification (either a crm_master change or a async failure
 notification) when something happens, instead of polling via the LRM.

 I'd wish to have a fast LRM interface, where the instantiation of a
 resource starts a sub-daemon to control it and then manage it via IPC
 (or maybe stdin/stdout) with that process - and, if the daemon support
 it, have it do async monitoring as well. 
 Invoking the LRM to update the CIB isn't more efficient than just
 updating the CIB.
 
 That is not what this suggestion is about. Try reparsing it ;-)
 
 Such a daemon would obviously easily keep internal state and only issue
 CIB updates or async (failure) notifications as needed. It'd avoid a lot
 of forking for various operations.

You can already do that.  Resources often have processes associated with
them.  They can update the CIB any time they like.  What those processes
do, we don't know, and we don't much care.

Async LRM failure notifications (as failed async monitors) are not there
yet, but there was something about this in the previous discussion that
I've forgotten :-(.

-- 
Alan Robertson [EMAIL PROTECTED]

Openness is the foundation and preservative of friendship...  Let me
claim from you at all times your undisguised opinions. - William
Wilberforce


Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-04-07 Thread Lars Marowsky-Bree
On 2007-04-05T07:40:34, Alan Robertson [EMAIL PROTECTED] wrote:

  That is why I'd suggest to only call it in start or post-notify; calling
  it in post-notify basically implies it'll be called after every state
  change.
 But, for DRBD for example, the ability to become master can change
 without a heartbeat state change.

I didn't say it was perfect ;-)

As even calling crm_master and having it do a compare and
update-if-modified, or filtering it in the CIB directly, requires at
least contacting and querying the CIB, I'd probably still track the state
in the RA somewhere (so as to avoid forking and IPC).

Or even better, monitoring drbd via a daemon which sends an async
notification (either a crm_master change or a async failure
notification) when something happens, instead of polling via the LRM.

I'd wish to have a fast LRM interface, where the instantiation of a
resource starts a sub-daemon to control it and then manage it via IPC
(or maybe stdin/stdout) with that process - and, if the daemon support
it, have it do async monitoring as well. 

Hmmm. I think we've had a bugzilla for that for ages, and now we have a
full-time LRM maintainer again as well ;-) Along with
LRM-needs-to-track-timestamps, this is probably one of the most
important features I'm missing in the LRM ...

(The CIB modification to filter out unnecessary changes is of course
good in any case.)


Sincerely,
Lars

-- 
Teamlead Kernel, SuSE Labs, Research and Development
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
Experience is the name everyone gives to their mistakes. -- Oscar Wilde



Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-04-06 Thread Doug Knight
Hi Alan,
Interesting. This entry in the cib was completely created from the GUI.
I modified the Transition Timeout from the GUI to get around the timeout
of my notify function call. So if I understand what you're saying, I
should remove the tags from the op definition and create nvpairs to
replace them. 

By the way, I checked out the link you sent with the GUI information.
Unfortunately the system I use for the web and email is also the server
on which we run our apps, and it has no sound ability at all (I'm
probably lucky it's not headless ;)  I'll check your page out over the
weekend from my laptop.

Doug

On Thu, 2007-04-05 at 12:36 -0600, Alan Robertson wrote:

 Doug Knight wrote:
  Here's another thing I'm seeing with the notify function. It keeps
  timing out on my slave startup (post-promote-notify and
  post-start-notify). I'm triggering the start by using the cleanup
  resources button on the GUI for the slave's resource:
  
  crmd[13681]: 2007/04/05_13:16:47 ERROR: process_lrm_event: LRM operation
  rsc_pgsql_wal_5556:0_notify_0 (74) Timed Out (timeout=5000ms)
  
  Yet I have the timeout value for the notify function bumped up to 120:
  
  <op id="1e9ebbb8-a370-4d1e-8815-43d6f0fdcfd6" name="notify"
  timeout="120" start_delay="0" disabled="false" role="Started"/>
 
 I believe this is incorrect.
 The timeout, start_delay and role are meta-attributes which need
 nvpairs.  (The gui is confused about role - it thinks its a parameter).
 
 
 


Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-04-06 Thread Alan Robertson
Doug Knight wrote:
 Hi Alan,
 Interesting. This entry in the cib was completely created from the GUI.
 I modified the Transition Timeout from the GUI to get around the timeout
 of my notify function call. So if I understand what you're saying, I
 should remove the tags from the op definition and create nvpairs to
 replace them. 

This is described in my tutorial and also in the DTD.

 By the way, I checked out the link you sent with the GUI information.
 Unfortunately the system I use for the web and email is also the server
 on which we run our apps, and it has no sound ability at all (I'm
 probably lucky its not headless ;)  I'll check your page out over the
 weekend from my laptop.

What version of the GUI created it that way?

I know some versions create it as a parameter with nvpairs, but I didn't
know any created it as tags in the way you showed...




-- 
Alan Robertson [EMAIL PROTECTED]

Openness is the foundation and preservative of friendship...  Let me
claim from you at all times your undisguised opinions. - William
Wilberforce


Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-04-06 Thread Alan Robertson
Alan Robertson wrote:
 Doug Knight wrote:
 Hi Alan,
 Interesting. This entry in the cib was completely created from the GUI.
 I modified the Transition Timeout from the GUI to get around the timeout
 of my notify function call. So if I understand what you're saying, I
 should remove the tags from the op definition and create nvpairs to
 replace them. 
 
 This is described in my tutorial and also in the DTD.

I meant the first tutorial on this page:
http://linux-ha.org/HeartbeatTutorials


-- 
Alan Robertson [EMAIL PROTECTED]

Openness is the foundation and preservative of friendship...  Let me
claim from you at all times your undisguised opinions. - William
Wilberforce


Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-04-06 Thread Doug Knight
I've asked this question before, but have not gotten an answer:

Can you have a location constraint to set the resource to prefer running
the master on one node vs the other (and still have the slave running
too)?


On Fri, 2007-04-06 at 09:46 -0400, Doug Knight wrote:

 Not sure how to determine the version of the GUI. It came with the
 2.0.8 release. The prolog says copyright 2005. I've looked at the DTD
 and its not always clear what its saying, but it helps. What's really
 helped is stealing the do_cmd function from the drbd OCF script, and
 placing a few targeted env commands in some of my functions. Though
 all the info I gather this way is also on the web site, for me I
 understand it better when I see it in action. 
 
 I did see and review the tutorial you referred to in your other email.
 I checked it out prior to getting started with Linux-HA. Maybe its
 time I revisited it ;)
 
 Doug
 
 On Fri, 2007-04-06 at 07:32 -0600, Alan Robertson wrote: 
 
  Doug Knight wrote:
   Hi Alan,
   Interesting. This entry in the cib was completely created from the GUI.
   I modified the Transition Timeout from the GUI to get around the timeout
   of my notify function call. So if I understand what you're saying, I
   should remove the tags from the op definition and create nvpairs to
   replace them. 
  
  This is described in my tutorial and also in the DTD.
  
   By the way, I checked out the link you sent with the GUI information.
   Unfortunately the system I use for the web and email is also the server
   on which we run our apps, and it has no sound ability at all (I'm
   probably lucky its not headless ;)  I'll check your page out over the
   weekend from my laptop.
  
  What version of the GUI created it that way?
  
  I know some versions create it as a parameter with nvpairs, but I didn't
  know any created it as tags in the way you showed...
  
  
  
  
 


Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-04-06 Thread Knight, Doug
OK, here's another strange happening, I had an IPaddr resource with a
separate (non-WAL file forwarding instance) database co-located to it. I
created this early on to get up to speed on the HA basics. It uses the
heartbeat ocf script, as does IPaddr. I used a places constraint with a
uname eq to move it manually between nodes in the cluster. As I was
testing my multistate OCF script for the WAL file forwarding version, I
decided to try moving the IPaddr resource from one node to another (I've
been having some issues in my fail-over testing, so I thought I'd verify
that the IPaddr handling still worked). When the IPaddr moved, it
affected the WAL file forwarding resource as well. I haven't dug into
the logs yet to see what happened, but the question becomes why would
other resources on a node be affected when a change is made to a
completely independent resource? Does it have something to do with
managing everything from the GUI? Or making changes to the cib.xml
(which I only do through the GUI right now)? I've also noticed that the
monitoring of the resources does not appear to be happening. I killed
the master database using the original LSB script and waited for the HA
resource monitoring to pick it up and restart, but it didn't. I'm
probably missing something simple here...

Doug

On Fri, 2007-04-06 at 10:24 -0400, Doug Knight wrote:

 I've asked this question before, but have not gotten an answer:
 
 Can you have a location constraint to set the resource to prefer
 running the master on one node vs the other (and still have the slave
 running too)?
 
 
 On Fri, 2007-04-06 at 09:46 -0400, Doug Knight wrote:
 
  Not sure how to determine the version of the GUI. It came with the
  2.0.8 release. The prolog says copyright 2005. I've looked at the
  DTD and its not always clear what its saying, but it helps. What's
  really helped is stealing the do_cmd function from the drbd OCF
  script, and placing a few targeted env commands in some of my
  functions. Though all the info I gather this way is also on the web
  site, for me I understand it better when I see it in action. 
  
  I did see and review the tutorial you referred to in your other
  email. I checked it out prior to getting started with Linux-HA.
  Maybe its time I revisited it ;)
  
  Doug
  
  On Fri, 2007-04-06 at 07:32 -0600, Alan Robertson wrote:  
  
   Doug Knight wrote:
    Hi Alan,
    Interesting. This entry in the cib was completely created from the GUI.
    I modified the Transition Timeout from the GUI to get around the timeout
    of my notify function call. So if I understand what you're saying, I
    should remove the tags from the op definition and create nvpairs to
    replace them. 
   
   This is described in my tutorial and also in the DTD.
   
    By the way, I checked out the link you sent with the GUI information.
    Unfortunately the system I use for the web and email is also the server
    on which we run our apps, and it has no sound ability at all (I'm
    probably lucky its not headless ;)  I'll check your page out over the
    weekend from my laptop.
   
   What version of the GUI created it that way?
   
   I know some versions create it as a parameter with nvpairs, but I didn't
   know any created it as tags in the way you showed...
   
   
   
   
  


Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-04-06 Thread Alan Robertson
Doug Knight wrote:
 Not sure how to determine the version of the GUI. It came with the 2.0.8
 release. The prolog says copyright 2005. I've looked at the DTD and its
 not always clear what its saying, but it helps. What's really helped is
 stealing the do_cmd function from the drbd OCF script, and placing a
 few targeted env commands in some of my functions. Though all the info
 I gather this way is also on the web site, for me I understand it better
 when I see it in action.
 
 I did see and review the tutorial you referred to in your other email. I
 checked it out prior to getting started with Linux-HA. Maybe its time I
 revisited it ;)

I'm sure you're fine ;-)

I didn't think the 2.0.8 GUI did that.  (telling the version of
heartbeat is enough).


-- 
Alan Robertson [EMAIL PROTECTED]

Openness is the foundation and preservative of friendship...  Let me
claim from you at all times your undisguised opinions. - William
Wilberforce


Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-04-05 Thread Andrew Beekhof

On 4/4/07, Alan Robertson [EMAIL PROTECTED] wrote:

Doug Knight wrote:
 I think I ran into this exact issue. I was calling crm_master -v 100 to
 upgrade to master status too frequently. The postgres database needs to
 stay in master state once its there, and not transition frequently. Are
 there any other command line tools that could be used to retrieve what
 state CRM thinks the resource is in?

 Doug

 On Tue, 2007-04-03 at 22:05 -0600, Alan Robertson wrote:

 Lars Marowsky-Bree wrote:
 On 2007-03-29T21:31:32, Alan Robertson [EMAIL PROTECTED] wrote:

 The docs are more specific:
 They state that it is mandatory to call it in the start action.

 That doesn't seem to be true.
 That's a correct observation.

 You could call it in a monitor action quite nicely, or so it seems to me...
 Yes. I'd avoid needless calls to it though as every call triggers a
 transition - so don't just call it in _every_ monitor operation ;-)
 That's a good point.  Unfortunately that means the RA has to keep state
 somewhere, which is a bit of a pain.

There is an area where you can create tmp files for your RA. It gets
cleaned out when heartbeat starts.

It's there for this exact reason.

I think crm_master can read the preference when given the right flag.


-G


Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-04-05 Thread Lars Marowsky-Bree
On 2007-04-03T22:05:24, Alan Robertson [EMAIL PROTECTED] wrote:

  Yes. I'd avoid needless calls to it though as every call triggers a
  transition - so don't just call it in _every_ monitor operation ;-)
 That's a good point.  Unfortunately that means the RA has to keep state
 somewhere, which is a bit of a pain.

That is why I'd suggest to only call it in start or post-notify; calling
it in post-notify basically implies it'll be called after every state
change.


Sincerely,
Lars 

-- 
Teamlead Kernel, SuSE Labs, Research and Development
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
Experience is the name everyone gives to their mistakes. -- Oscar Wilde



Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-04-05 Thread Lars Marowsky-Bree
On 2007-04-04T11:41:44, Doug Knight [EMAIL PROTECTED] wrote:

 The key word in my question was thinks. It would be useful to the RA
 if it could know what state the CRM thought it was in, so in case the RA
 determines on its own that its already in that state, it doesn't have to
 do anything. But, if the RA finds that the CRM thinks its in a different
 state, then the RA could set the CRM straight by calling the crm_master
 with the appropriate value. Make sense?

No. The state the resource is in is not set via crm_master, but using
the exit code of the monitor operation.

You should only call crm_master when you wish to change the _preference_
for master-state.
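
As a rough illustration of that split, a multistate monitor could
report the role purely through its exit code. The three return codes
are the standard OCF values. probe_role, EXAMPLE_ROLE and
example_monitor are hypothetical stand-ins for whatever the RA really
does to inspect the resource.

```shell
#!/bin/sh
# Sketch only: a multistate monitor reports its role via the exit code.
OCF_SUCCESS=0           # running (as slave, for a multistate resource)
OCF_NOT_RUNNING=7       # cleanly stopped
OCF_RUNNING_MASTER=8    # running as master

# Hypothetical probe: a real RA would inspect the resource itself.
# Here the role is simply taken from EXAMPLE_ROLE for illustration.
probe_role() {
    ROLE="${EXAMPLE_ROLE:-stopped}"
}

example_monitor() {
    probe_role
    case "$ROLE" in
        master) return $OCF_RUNNING_MASTER ;;
        slave)  return $OCF_SUCCESS ;;
        *)      return $OCF_NOT_RUNNING ;;
    esac
}
```

Nothing in this path touches crm_master: the monitor result tells the
CRM which state the resource is in, while crm_master only expresses how
much this node would like to be master.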


Sincerely,
Lars

-- 
Teamlead Kernel, SuSE Labs, Research and Development
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
Experience is the name everyone gives to their mistakes. -- Oscar Wilde



Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-04-05 Thread Alan Robertson
Lars Marowsky-Bree wrote:
 On 2007-04-04T11:41:44, Doug Knight [EMAIL PROTECTED] wrote:
 
 The key word in my question was thinks. It would be useful to the RA
 if it could know what state the CRM thought it was in, so in case the RA
 determines on its own that its already in that state, it doesn't have to
 do anything. But, if the RA finds that the CRM thinks its in a different
 state, then the RA could set the CRM straight by calling the crm_master
 with the appropriate value. Make sense?
 
 No. The state the resource is in is not set via crm_master, but using
 the exit code of the monitor operation.
 
 You should only call crm_master when you wish to change the _preference_
 for master-state.

But, I think you can use crm_master to retrieve your current preference,
and thereby eliminate unnecessary CIB updates.

Or maybe crm_master should do that filtering on its own??

I like that thought...

-- 
Alan Robertson [EMAIL PROTECTED]

Openness is the foundation and preservative of friendship...  Let me
claim from you at all times your undisguised opinions. - William
Wilberforce


Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-04-05 Thread Alan Robertson
Lars Marowsky-Bree wrote:
 On 2007-04-03T22:05:24, Alan Robertson [EMAIL PROTECTED] wrote:
 
 Yes. I'd avoid needless calls to it though as every call triggers a
 transition - so don't just call it in _every_ monitor operation ;-)
 That's a good point.  Unfortunately that means the RA has to keep state
 somewhere, which is a bit of a pain.
 
 That is why I'd suggest to only call it in start or post-notify; calling
 it in post-notify basically implies it'll be called after every state
 change.

But, for DRBD for example, the ability to become master can change
without a heartbeat state change.

This is not at all surprising, nor is it probably uncommon.

-- 
Alan Robertson [EMAIL PROTECTED]

Openness is the foundation and preservative of friendship...  Let me
claim from you at all times your undisguised opinions. - William
Wilberforce


Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-04-04 Thread Doug Knight
I think I ran into this exact issue. I was calling crm_master -v 100 to
upgrade to master status too frequently. The postgres database needs to
stay in master state once it's there, and not transition frequently. Are
there any other command line tools that could be used to retrieve what
state the CRM thinks the resource is in?

Doug

On Tue, 2007-04-03 at 22:05 -0600, Alan Robertson wrote:

 Lars Marowsky-Bree wrote:
  On 2007-03-29T21:31:32, Alan Robertson [EMAIL PROTECTED] wrote:
  
  The docs are more specific:
  They state that it is mandatory to call it in the start action.
 
  That doesn't seem to be true.
  
  That's a correct observation.
  
  You could call it in a monitor action quite nicely, or so it seems to me...
  
  Yes. I'd avoid needless calls to it though as every call triggers a
  transition - so don't just call it in _every_ monitor operation ;-)
 
 That's a good point.  Unfortunately that means the RA has to keep state
 somewhere, which is a bit of a pain.
 
 


Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-04-04 Thread Andrew Beekhof

On 4/3/07, Doug Knight [EMAIL PROTECTED] wrote:


 Thanks. Yes, I'm concerned that I'm seeing the spinning and you're not.
 However, I need to push through to getting my OCF script working, and using
 my version of crm_master will allow me to do that. I'll have to take the
 added demote into consideration, as when executing a straight stop this will
 interfere with an intentional shutdown of the master. I found part of my
 problem with the slave not starting. I had to change clone_max from 1 to 2.
 Now at least it's trying to start the slave, but I'm getting a strange error
 in the log when trying to start the slave:

 ERROR: find_attr_details: Multiple attributes match name=master-pgsql_wal_5556:0 in nodes:


this is because you inverted the searching
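
For anyone wanting to check by hand whether a CIB really does carry more than one attribute with the same name, a rough sketch follows. It assumes the dump from cibadmin -Q puts each nvpair element on its own line; count_nvpairs is a made-up helper name, not an existing tool.

```shell
#!/bin/sh
# Sketch only: count_nvpairs is a hypothetical helper.
# Reads CIB XML on stdin and prints how many lines carry name="NAME".
# Assumes one nvpair element per line in the dump.
count_nvpairs() {
    # grep -c prints 0 but exits non-zero when nothing matches, so the
    # exit status is masked to keep the helper usable under "set -e".
    grep -c -F "name=\"$1\"" || true
}
```

Usage would be along the lines of: cibadmin -Q | count_nvpairs master-pgsql_wal_5556:0. A result above 1 only says that several entries share the name; which of them are spurious still has to be judged by looking at the XML.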


 I've attached the cib.xml  from the slave node as well. It looks like my
experimenting with my scripts manually has placed some spurious nvpairs in
the xml? Suggestions? Also, can you have a location constraint to set the
resource to prefer running the master on one node vs the other?

 Doug



 On Tue, 2007-04-03 at 17:50 +0200, Andrew Beekhof wrote:
 On 4/3/07, Doug Knight [EMAIL PROTECTED] wrote:

 I've never done a bugzilla report, maybe you can point me in the right
 direction.

http://old.linux-foundation.org/developer_bugzilla/enter_bug.cgi

 In the mean time I'll continue to test with my modified version of
 crm_master that doesn't spin.

my concerns are that a) it works for us as-is, and b) it now works for
you but for unknown reasons.

 I've managed to get the master side of the
 process configuration up, but am having trouble getting the slave side up. I
 did notice that when I select Stop from the GUI to completely stop the
 master, it demotes it first then stop it? The demote is what I use to
 reconfigure a master going down to become the slave after the new master
 comes up. Is there a way in demote to determine its been triggered by a Stop
 request vs a Demote?

if you're a master, you'll always get a demote before a stop.


 Doug


 On Tue, 2007-04-03 at 12:28 +0200, Andrew Beekhof wrote:
 Hi Doug,

 I just tried to reproduce this (loaded your cib and ran crm_master the
 same way you did) but it worked fine.

 Can you open a bug in bugzilla and include the result of cibadmin -Q
 please?

 On 4/2/07, Doug Knight [EMAIL PROTECTED] wrote:
 
  Andrew, Alan, Lars, et al,
  Any updates on the spinning crm_master? I and an associate of mine here
 are
  looking into compiler flags and settings to see if we can find anything
  there. If there is more info you need, just let me know.
 
  Doug
 
 
 
  On Fri, 2007-03-30 at 11:15 -0400, Doug Knight wrote:
 
  I've done some more looking at the cib_attrs.c module find_attr_details
  function, and it seems that the call to find_xml_children filtering by
 node
  eventually finds a match (match_found = 1 in xml.c/find_xml_children)
  terminating the recursive search. Later in the module, the call to
  find_xml_children filtering by name gets a match but continues to
search.
 I
  never see the search by set: in the debug log, nor the printf I've
 placed
  in my version right after the call to find_xml_children. In fact, it
seems
  that somehow find_xml_children loops within itself, though I must really
 be
  missing something there.
 
  Doug
 
  On Fri, 2007-03-30 at 08:20 -0400, Doug Knight wrote:
 
  Andrew, FYI, the one I ran has some debug statements in it that I put in
  there. Let me know if you want cleaner output...
 
  Doug
  On Fri, 2007-03-30 at 07:44 -0400, Doug Knight wrote:
 
  Hi Andrew,
  I'll get to this today. I've been digging through some of the source,
  putting in some additional logging, etc, just to see where the problem
  occurs.
 
  Thanks,
  Doug
 
  On Fri, 2007-03-30 at 10:32 +0200, Andrew Beekhof wrote:
  On 3/29/07, Doug Knight [EMAIL PROTECTED] wrote:
  
   Results: Using the dev version or crm_master I still spin at the top
of
  the
   CPU stack. What's next?
 
  probably time to log a bug...
 
  can you include the logs of the crm_master command when you add
  -
  (an insane amount of logging) to your normal command line
 
  and your current CIB please
 
  
   Doug
  
  
   On Thu, 2007-03-29 at 10:06 -0400, Doug Knight wrote:
  
   OK, ConfigureMe make is complete. As when I built the baseline 2.0.8,
I
  had
   two compile errors, and had to make some minor source code mods (see
   attached error message and file diffs). Not sure if this is related or
  not.
   I've done a make install in the crm/admin directory to replace the
   crm_master used by the system. I'll test that out in a bit.
  
   Doug
  
   On Thu, 2007-03-29 at 15:38 +0200, Andrew Beekhof wrote:
   On 3/29/07, Doug Knight [EMAIL PROTECTED] wrote:
   
I went ahead and installed via RPM automake, autoconf, and libtool,
 even
though they were not needed for the 2.0.8 baseline build (I believe
   libtool
at least may have been installed previously). ConfigureMe bootstrap
 ran
fine. From the 

Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-04-04 Thread Doug Knight
The key word in my question was "thinks". It would be useful to the RA
if it could know what state the CRM thought it was in, so in case the RA
determines on its own that it's already in that state, it doesn't have to
do anything. But, if the RA finds that the CRM thinks it's in a different
state, then the RA could set the CRM straight by calling crm_master
with the appropriate value. Make sense?

Doug

On Wed, 2007-04-04 at 16:02 +0200, Andrew Beekhof wrote:

 On 4/4/07, Doug Knight [EMAIL PROTECTED] wrote:
 
   I think I ran into this exact issue. I was calling crm_master -v 100 to
  upgrade to master status too frequently. The postgres database needs to stay
  in master state once its there, and not transition frequently. Are there any
  other command line tools that could be used to retrieve what state CRM
  thinks the resource is in?
 
 definitely not - because that completely defeats the point of having a
 monitor operation
 
 the RA needs to tell us what state the resource is in, never, ever the
 other way around.
 
 
   Doug
 
   On Tue, 2007-04-03 at 22:05 -0600, Alan Robertson wrote:
   Lars Marowsky-Bree wrote:
   On 2007-03-29T21:31:32, Alan Robertson [EMAIL PROTECTED] wrote:
  
   The docs are more specific:
   They state that it is mandatory to call it in the start action.
  
   That doesn't seem to be true.
  
   That's a correct observation.
  
   You could call it in a monitor action quite nicely, or so it seems to
  me...
  
   Yes. I'd avoid needless calls to it though as every call triggers a
   transition - so don't just call it in _every_ monitor operation ;-)
 
  That's a good point. Unfortunately that means the RA has to keep state
  somewhere, which is a bit of a pain.
 
 
 
 


Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-04-04 Thread Alan Robertson
Doug Knight wrote:
 I think I ran into this exact issue. I was calling crm_master -v 100 to
 upgrade to master status too frequently. The postgres database needs to
 stay in master state once its there, and not transition frequently. Are
 there any other command line tools that could be used to retrieve what
 state CRM thinks the resource is in?
 
 Doug
 
 On Tue, 2007-04-03 at 22:05 -0600, Alan Robertson wrote:
 
 Lars Marowsky-Bree wrote:
 On 2007-03-29T21:31:32, Alan Robertson [EMAIL PROTECTED] wrote:

 The docs are more specific:
 They state that it is mandatory to call it in the start action.

 That doesn't seem to be true.
 That's a correct observation.

 You could call it in a monitor action quite nicely, or so it seems to me...
 Yes. I'd avoid needless calls to it though as every call triggers a
 transition - so don't just call it in _every_ monitor operation ;-)
 That's a good point.  Unfortunately that means the RA has to keep state
 somewhere, which is a bit of a pain.

There is an area where you can create tmp files for your RA. It gets
cleaned out when heartbeat starts.

It's there for this exact reason.

I think crm_master can read the preference when given the right flag.
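
As a rough sketch of the tmp-file approach: remember the last preference written and call crm_master only when it changes. STATE_DIR, CRM_MASTER_CMD and set_master_pref are illustrative names I made up here; a real RA would point STATE_DIR at the heartbeat-provided tmp area, and the only crm_master option used is -v, as in Doug's command.

```shell
#!/bin/sh
# Sketch only. STATE_DIR and CRM_MASTER_CMD are hypothetical knobs.
STATE_DIR="${STATE_DIR:-/tmp}"
CRM_MASTER_CMD="${CRM_MASTER_CMD:-crm_master}"

# set_master_pref RESOURCE VALUE
# Calls crm_master only when VALUE differs from the last value recorded,
# so a recurring monitor does not trigger a transition on every run.
set_master_pref() {
    rsc=$1
    new=$2
    state_file="$STATE_DIR/master-pref-$rsc"
    old=""
    # read is a shell builtin, so checking the cached value does not fork.
    if [ -r "$state_file" ]; then
        read -r old < "$state_file" || true
    fi
    if [ "$old" = "$new" ]; then
        return 0
    fi
    # Record the new value only after crm_master has accepted it.
    $CRM_MASTER_CMD -v "$new" || return $?
    echo "$new" > "$state_file"
}
```

Because the tmp area is cleaned out when heartbeat starts, the first call after a restart always goes through to crm_master, which seems like the right default.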

-- 
Alan Robertson [EMAIL PROTECTED]

Openness is the foundation and preservative of friendship...  Let me
claim from you at all times your undisguised opinions. - William
Wilberforce


Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-04-03 Thread Andrew Beekhof

Hi Doug,

I just tried to reproduce this (loaded your cib and ran crm_master the
same way you did) but it worked fine.

Can you open a bug in bugzilla and include the result of cibadmin -Q please?

On 4/2/07, Doug Knight [EMAIL PROTECTED] wrote:


 Andrew, Alan, Lars, et al,
 Any updates on the spinning crm_master? I and an associate of mine here are
looking into compiler flags and settings to see if we can find anything
there. If there is more info you need, just let me know.

 Doug



 On Fri, 2007-03-30 at 11:15 -0400, Doug Knight wrote:

 I've done some more looking at the cib_attrs.c module find_attr_details
function, and it seems that the call to find_xml_children filtering by node
eventually finds a match (match_found = 1 in xml.c/find_xml_children)
terminating the recursive search. Later in the module, the call to
find_xml_children filtering by name gets a match but continues to search. I
never see the search by set: in the debug log, nor the printf I've placed
in my version right after the call to find_xml_children. In fact, it seems
that somehow find_xml_children loops within itself, though I must really be
missing something there.

 Doug

 On Fri, 2007-03-30 at 08:20 -0400, Doug Knight wrote:

 Andrew, FYI, the one I ran has some debug statements in it that I put in
there. Let me know if you want cleaner output...

 Doug
 On Fri, 2007-03-30 at 07:44 -0400, Doug Knight wrote:

 Hi Andrew,
 I'll get to this today. I've been digging through some of the source,
putting in some additional logging, etc, just to see where the problem
occurs.

 Thanks,
 Doug

 On Fri, 2007-03-30 at 10:32 +0200, Andrew Beekhof wrote:
 On 3/29/07, Doug Knight [EMAIL PROTECTED] wrote:

 Results: Using the dev version or crm_master I still spin at the top of
the
 CPU stack. What's next?

probably time to log a bug...

can you include the logs of the crm_master command when you add
 -
(an insane amount of logging) to your normal command line

and your current CIB please


 Doug


 On Thu, 2007-03-29 at 10:06 -0400, Doug Knight wrote:

 OK, ConfigureMe make is complete. As when I built the baseline 2.0.8, I
had
 two compile errors, and had to make some minor source code mods (see
 attached error message and file diffs). Not sure if this is related or
not.
 I've done a make install in the crm/admin directory to replace the
 crm_master used by the system. I'll test that out in a bit.

 Doug

 On Thu, 2007-03-29 at 15:38 +0200, Andrew Beekhof wrote:
 On 3/29/07, Doug Knight [EMAIL PROTECTED] wrote:
 
  I went ahead and installed via RPM automake, autoconf, and libtool, even
  though they were not needed for the 2.0.8 baseline build (I believe
 libtool
  at least may have been installed previously). ConfigureMe bootstrap ran
  fine. From the output I gather it includes the equivalent of the
 ConfigureMe
  configure run, and now I just need to do the make?

 right

 
  Doug
 
  On Thu, 2007-03-29 at 09:23 -0400, Doug Knight wrote:
 
  OK, Tried that, no luck. It still complains about libtool, autoconf, and
  automake. When I copy over the same basic files from the 2.0.8
directory,
  bootstrap still does not work, but ConfigureMe configure does (at least
to
  the point where it starts looking for the Makefile.in files).
 
  Doug
 
  On Thu, 2007-03-29 at 09:50 +0200, Andrew Beekhof wrote:
  pretty sure you need:
  ./ConfigureMe bootstrap
 
  On 3/28/07, Doug Knight [EMAIL PROTECTED] wrote:
  
   Alan and Lars,
   After much effort, I have had no success in building from the
mercurial
   version. Here's what I tried, and since this is the first time I've
 tried
  to
   build a dev version, maybe you can see where I'm going wrong:
  
   Downloaded tar from hg.linux-ha.org/dev
   Unpacked it in a subdirectory to my root account, HA
   Attempted a quick ConfigureMe configure, got errors that it couldn't
 find
   libtool, automake, autoconf
   Did a side-by-side comparison to the HA 2.0.8 I built successfully and
 am
   running with, found no libltdl.tar or libltdl directory under the dev
   version. Copied these from my original stable release tar into the
  directory
   structure for the dev version.
   Ran ConfigureMe configure, which then complained about all the
 Makefile.in
   files missing. Copied those over as well from the side-by-side. Also
  pulled
   include/ha_config.h.in and linux-ha/config.h.in since it complained
 about
   those missing too.
   ConfigureMe configure runs to completion
   ConfigureMe make exits with the following:
  
   In file included from base64.c:18:
   ../../include/heartbeat.h:38:23: error: hb_config.h:
No
   such file or directory
   ../../include/heartbeat.h:98:1: error: HB_RC_DIR redefined
   In file included from
../../include/lha_internal.h:37,
   from base64.c:17:
   ../../linux-ha/config.h:504:1: error: this is the location of the
 previous
   definition
   In file included from base64.c:18:
   ../../include/heartbeat.h:102:1: error: HALIB
 redefined
   command line:1:1: error: 

Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-04-03 Thread Doug Knight
I've never done a bugzilla report, maybe you can point me in the right
direction.

In the meantime I'll continue to test with my modified version of
crm_master that doesn't spin. I've managed to get the master side of the
process configuration up, but am having trouble getting the slave side
up. I did notice that when I select Stop from the GUI to completely stop
the master, it demotes it first and then stops it? The demote is what I use
to reconfigure a master going down to become the slave after the new
master comes up. Is there a way in demote to determine whether it's been
triggered by a Stop request vs a Demote?

Doug

On Tue, 2007-04-03 at 12:28 +0200, Andrew Beekhof wrote:

 Hi Doug,
 
 I just tried to reproduce this (loaded your cib and ran crm_master the
 same way you did) but it worked fine.
 
 Can you open a bug in bugzilla and include the result of cibadmin -Q please?
 
 On 4/2/07, Doug Knight [EMAIL PROTECTED] wrote:
 
   Andrew, Alan, Lars, et al,
   Any updates on the spinning crm_master? I and an associate of mine here are
  looking into compiler flags and settings to see if we can find anything
  there. If there is more info you need, just let me know.
 
   Doug
 
 
 
   On Fri, 2007-03-30 at 11:15 -0400, Doug Knight wrote:
 
   I've done some more looking at the cib_attrs.c module find_attr_details
  function, and it seems that the call to find_xml_children filtering by node
  eventually finds a match (match_found = 1 in xml.c/find_xml_children)
  terminating the recursive search. Later in the module, the call to
  find_xml_children filtering by name gets a match but continues to search. I
  never see the search by set: in the debug log, nor the printf I've placed
  in my version right after the call to find_xml_children. In fact, it seems
  that somehow find_xml_children loops within itself, though I must really be
  missing something there.
 
   Doug
 
   On Fri, 2007-03-30 at 08:20 -0400, Doug Knight wrote:
 
   Andrew, FYI, the one I ran has some debug statements in it that I put in
  there. Let me know if you want cleaner output...
 
   Doug
   On Fri, 2007-03-30 at 07:44 -0400, Doug Knight wrote:
 
   Hi Andrew,
   I'll get to this today. I've been digging through some of the source,
  putting in some additional logging, etc, just to see where the problem
  occurs.
 
   Thanks,
   Doug
 
   On Fri, 2007-03-30 at 10:32 +0200, Andrew Beekhof wrote:
   On 3/29/07, Doug Knight [EMAIL PROTECTED] wrote:
  
   Results: Using the dev version or crm_master I still spin at the top of
  the
   CPU stack. What's next?
 
  probably time to log a bug...
 
  can you include the logs of the crm_master command when you add
   -
  (an insane amount of logging) to your normal command line
 
  and your current CIB please
 
  
   Doug
  
  
   On Thu, 2007-03-29 at 10:06 -0400, Doug Knight wrote:
  
   OK, ConfigureMe make is complete. As when I built the baseline 2.0.8, I
  had
   two compile errors, and had to make some minor source code mods (see
   attached error message and file diffs). Not sure if this is related or
  not.
   I've done a make install in the crm/admin directory to replace the
   crm_master used by the system. I'll test that out in a bit.
  
   Doug
  
   On Thu, 2007-03-29 at 15:38 +0200, Andrew Beekhof wrote:
   On 3/29/07, Doug Knight [EMAIL PROTECTED] wrote:
   
I went ahead and installed via RPM automake, autoconf, and libtool, even
though they were not needed for the 2.0.8 baseline build (I believe
   libtool
at least may have been installed previously). ConfigureMe bootstrap ran
fine. From the output I gather it includes the equivalent of the
   ConfigureMe
configure run, and now I just need to do the make?
  
   right
  
   
Doug
   
On Thu, 2007-03-29 at 09:23 -0400, Doug Knight wrote:
   
OK, Tried that, no luck. It still complains about libtool, autoconf, and
automake. When I copy over the same basic files from the 2.0.8
  directory,
bootstrap still does not work, but ConfigureMe configure does (at least
  to
the point where it starts looking for the Makefile.in files).
   
Doug
   
On Thu, 2007-03-29 at 09:50 +0200, Andrew Beekhof wrote:
pretty sure you need:
./ConfigureMe bootstrap
   
On 3/28/07, Doug Knight [EMAIL PROTECTED] wrote:

 Alan and Lars,
 After much effort, I have had no success in building from the
  mercurial
 version. Here's what I tried, and since this is the first time I've
   tried
to
 build a dev version, maybe you can see where I'm going wrong:

 Downloaded tar from hg.linux-ha.org/dev
 Unpacked it in a subdirectory to my root account, HA
 Attempted a quick ConfigureMe configure, got errors that it couldn't
   find
 libtool, automake, autoconf
 Did a side-by-side comparison to the HA 2.0.8 I built successfully and
   am
 running with, found no libltdl.tar or libltdl directory under the dev
 version. Copied these from my original 

Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-04-03 Thread Andrew Beekhof

On 4/3/07, Doug Knight [EMAIL PROTECTED] wrote:


 I've never done a bugzilla report, maybe you can point me in the right
direction.


http://old.linux-foundation.org/developer_bugzilla/enter_bug.cgi


 In the mean time I'll continue to test with my modified version of
crm_master that doesn't spin.


my concerns are that a) it works for us as-is, and b) it now works for
you but for unknown reasons.


I've managed to get the master side of the
process configuration up, but am having trouble getting the slave side up. I
did notice that when I select Stop from the GUI to completely stop the
master, it demotes it first then stop it? The demote is what I use to
reconfigure a master going down to become the slave after the new master
comes up. Is there a way in demote to determine its been triggered by a Stop
request vs a Demote?


if you're a master, you'll always get a demote before a stop.
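
Since demote cannot see whether a stop is coming next, one way to structure the RA is to keep demote limited to the master-to-slave step and put everything shutdown-specific in stop. A minimal sketch, where ROLE, ACTIONS, do_demote and do_stop_slave are all hypothetical stand-ins for the real role detection and real work:

```shell
#!/bin/sh
# Sketch only: ROLE and the do_* helpers are hypothetical placeholders.
ROLE="${ROLE:-master}"
ACTIONS=""

do_demote()     { ACTIONS="$ACTIONS demote"; ROLE=slave; }
do_stop_slave() { ACTIONS="$ACTIONS stop"; ROLE=stopped; }

# demote only ever performs the master -> slave step.
ra_demote() {
    if [ "$ROLE" = master ]; then
        do_demote
    fi
    return 0
}

# stop is written to work from any role, in case it is ever invoked
# without a preceding demote (for example when run by hand).
ra_stop() {
    if [ "$ROLE" = master ]; then
        do_demote
    fi
    if [ "$ROLE" = slave ]; then
        do_stop_slave
    fi
    return 0
}
```

With this layout the sequence demote, stop and a bare stop both end in the same stopped state, so the RA does not need to know which request triggered the demote.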



 Doug


 On Tue, 2007-04-03 at 12:28 +0200, Andrew Beekhof wrote:
 Hi Doug,

I just tried to reproduce this (loaded your cib and ran crm_master the
same way you did) but it worked fine.

Can you open a bug in bugzilla and include the result of cibadmin -Q
please?

On 4/2/07, Doug Knight [EMAIL PROTECTED] wrote:

 Andrew, Alan, Lars, et al,
 Any updates on the spinning crm_master? I and an associate of mine here
are
 looking into compiler flags and settings to see if we can find anything
 there. If there is more info you need, just let me know.

 Doug



 On Fri, 2007-03-30 at 11:15 -0400, Doug Knight wrote:

 I've done some more looking at the cib_attrs.c module find_attr_details
 function, and it seems that the call to find_xml_children filtering by
node
 eventually finds a match (match_found = 1 in xml.c/find_xml_children)
 terminating the recursive search. Later in the module, the call to
 find_xml_children filtering by name gets a match but continues to search.
I
 never see the search by set: in the debug log, nor the printf I've
placed
 in my version right after the call to find_xml_children. In fact, it seems
 that somehow find_xml_children loops within itself, though I must really
be
 missing something there.

 Doug

 On Fri, 2007-03-30 at 08:20 -0400, Doug Knight wrote:

 Andrew, FYI, the one I ran has some debug statements in it that I put in
 there. Let me know if you want cleaner output...

 Doug
 On Fri, 2007-03-30 at 07:44 -0400, Doug Knight wrote:

 Hi Andrew,
 I'll get to this today. I've been digging through some of the source,
 putting in some additional logging, etc, just to see where the problem
 occurs.

 Thanks,
 Doug

 On Fri, 2007-03-30 at 10:32 +0200, Andrew Beekhof wrote:
 On 3/29/07, Doug Knight [EMAIL PROTECTED] wrote:
 
  Results: Using the dev version or crm_master I still spin at the top of
 the
  CPU stack. What's next?

 probably time to log a bug...

 can you include the logs of the crm_master command when you add
 -
 (an insane amount of logging) to your normal command line

 and your current CIB please

 
  Doug
 
 
  On Thu, 2007-03-29 at 10:06 -0400, Doug Knight wrote:
 
  OK, ConfigureMe make is complete. As when I built the baseline 2.0.8, I
 had
  two compile errors, and had to make some minor source code mods (see
  attached error message and file diffs). Not sure if this is related or
 not.
  I've done a make install in the crm/admin directory to replace the
  crm_master used by the system. I'll test that out in a bit.
 
  Doug
 
  On Thu, 2007-03-29 at 15:38 +0200, Andrew Beekhof wrote:
  On 3/29/07, Doug Knight [EMAIL PROTECTED] wrote:
  
   I went ahead and installed via RPM automake, autoconf, and libtool,
even
   though they were not needed for the 2.0.8 baseline build (I believe
  libtool
   at least may have been installed previously). ConfigureMe bootstrap
ran
   fine. From the output I gather it includes the equivalent of the
  ConfigureMe
   configure run, and now I just need to do the make?
 
  right
 
  
   Doug
  
   On Thu, 2007-03-29 at 09:23 -0400, Doug Knight wrote:
  
   OK, Tried that, no luck. It still complains about libtool, autoconf,
and
   automake. When I copy over the same basic files from the 2.0.8
 directory,
   bootstrap still does not work, but ConfigureMe configure does (at
least
 to
   the point where it starts looking for the Makefile.in files).
  
   Doug
  
   On Thu, 2007-03-29 at 09:50 +0200, Andrew Beekhof wrote:
   pretty sure you need:
   ./ConfigureMe bootstrap
  
   On 3/28/07, Doug Knight [EMAIL PROTECTED] wrote:
   
Alan and Lars,
After much effort, I have had no success in building from the
 mercurial
version. Here's what I tried, and since this is the first time I've
  tried
   to
build a dev version, maybe you can see where I'm going wrong:
   
Downloaded tar from hg.linux-ha.org/dev
Unpacked it in a subdirectory to my root account, HA
Attempted a quick ConfigureMe configure, got errors that it couldn't
  find
libtool, automake, autoconf
Did a side-by-side comparison to the HA 2.0.8 I 

Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-04-03 Thread Alan Robertson
Lars Marowsky-Bree wrote:
 On 2007-03-29T21:31:32, Alan Robertson [EMAIL PROTECTED] wrote:
 
 The docs are more specific:
 They state that it is mandatory to call it in the start action.

 That doesn't seem to be true.
 
 That's a correct observation.
 
 You could call it in a monitor action quite nicely, or so it seems to me...
 
 Yes. I'd avoid needless calls to it though as every call triggers a
 transition - so don't just call it in _every_ monitor operation ;-)

That's a good point.  Unfortunately that means the RA has to keep state
somewhere, which is a bit of a pain.


-- 
Alan Robertson [EMAIL PROTECTED]

Openness is the foundation and preservative of friendship...  Let me
claim from you at all times your undisguised opinions. - William
Wilberforce


Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-04-02 Thread Doug Knight
Andrew, Alan, Lars, et al,
Any updates on the spinning crm_master? I and an associate of mine here
are looking into compiler flags and settings to see if we can find
anything there. If there is more info you need, just let me know.

Doug


On Fri, 2007-03-30 at 11:15 -0400, Doug Knight wrote:

 I've done some more looking at the cib_attrs.c module
 find_attr_details function, and it seems that the call to
 find_xml_children filtering by node eventually finds a match
 (match_found = 1 in xml.c/find_xml_children) terminating the recursive
 search. Later in the module, the call to find_xml_children filtering
 by name gets a match but continues to search. I never see the search
 by set: in the debug log, nor the printf I've placed in my version
 right after the call to find_xml_children. In fact, it seems that
 somehow find_xml_children loops within itself, though I must really be
 missing something there.
 
 Doug
 
 On Fri, 2007-03-30 at 08:20 -0400, Doug Knight wrote:
 
  Andrew, FYI, the one I ran has some debug statements in it that I
  put in there. Let me know if you want cleaner output...
  
  Doug
  On Fri, 2007-03-30 at 07:44 -0400, Doug Knight wrote:
  
   Hi Andrew,
   I'll get to this today. I've been digging through some of the
   source, putting in some additional logging, etc, just to see where
   the problem occurs. 
   
   Thanks,
   Doug
   
   On Fri, 2007-03-30 at 10:32 +0200, Andrew Beekhof wrote:  
   
On 3/29/07, Doug Knight [EMAIL PROTECTED] wrote:

  Results: Using the dev version or crm_master I still spin at the top 
 of the
 CPU stack. What's next?

probably time to log a bug...

can you include the logs of the crm_master command when you add
   -
(an insane amount of logging) to your normal command line

and your current CIB please


  Doug


  On Thu, 2007-03-29 at 10:06 -0400, Doug Knight wrote:

  OK, ConfigureMe make is complete. As when I built the baseline 
 2.0.8, I had
 two compile errors, and had to make some minor source code mods (see
 attached error message and file diffs). Not sure if this is related 
 or not.
 I've done a make install in the crm/admin directory to replace the
 crm_master used by the system. I'll test that out in a bit.

  Doug

  On Thu, 2007-03-29 at 15:38 +0200, Andrew Beekhof wrote:
  On 3/29/07, Doug Knight [EMAIL PROTECTED] wrote:
 
  I went ahead and installed via RPM automake, autoconf, and libtool, 
  even
  though they were not needed for the 2.0.8 baseline build (I believe
 libtool
  at least may have been installed previously). ConfigureMe bootstrap 
  ran
  fine. From the output I gather it includes the equivalent of the
 ConfigureMe
  configure run, and now I just need to do the make?

 right

 
  Doug
 
  On Thu, 2007-03-29 at 09:23 -0400, Doug Knight wrote:
 
  OK, Tried that, no luck. It still complains about libtool, 
  autoconf, and
  automake. When I copy over the same basic files from the 2.0.8 
  directory,
  bootstrap still does not work, but ConfigureMe configure does (at 
  least to
  the point where it starts looking for the Makefile.in files).
 
  Doug
 
  On Thu, 2007-03-29 at 09:50 +0200, Andrew Beekhof wrote:
  pretty sure you need:
  ./ConfigureMe bootstrap
 
  On 3/28/07, Doug Knight [EMAIL PROTECTED] wrote:
  
   Alan and Lars,
   After much effort, I have had no success in building from the 
   mercurial
   version. Here's what I tried, and since this is the first time 
   I've
 tried
  to
   build a dev version, maybe you can see where I'm going wrong:
  
   Downloaded tar from hg.linux-ha.org/dev
   Unpacked it in a subdirectory to my root account, HA
   Attempted a quick ConfigureMe configure, got errors that it 
   couldn't
 find
   libtool, automake, autoconf
   Did a side-by-side comparison to the HA 2.0.8 I built 
   successfully and
 am
   running with, found no libltdl.tar or libltdl directory under the 
   dev
   version. Copied these from my original stable release tar into the
  directory
   structure for the dev version.
   Ran ConfigureMe configure, which then complained about all the
 Makefile.in
   files missing. Copied those over as well from the side-by-side. 
   Also
  pulled
   include/ha_config.h.in and linux-ha/config.h.in since it 
   complained
 about
   those missing too.
   ConfigureMe configure runs to completion
   ConfigureMe make exits with the following:
  
   In file included from base64.c:18:
   ../../include/heartbeat.h:38:23: error: hb_config.h: No
   such file or directory
   ../../include/heartbeat.h:98:1: error: HB_RC_DIR redefined
   In file included from 

Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-03-30 Thread Doug Knight
Hi Andrew,
I'll get to this today. I've been digging through some of the source,
putting in some additional logging, etc, just to see where the problem
occurs. 

Thanks,
Doug

On Fri, 2007-03-30 at 10:32 +0200, Andrew Beekhof wrote:

 On 3/29/07, Doug Knight [EMAIL PROTECTED] wrote:
 
   Results: Using the dev version of crm_master I still spin at the top of the
  CPU stack. What's next?
 
 probably time to log a bug...
 
 can you include the logs of the crm_master command when you add
-
 (an insane amount of logging) to your normal command line
 
 and your current CIB please
 
 
   Doug
 
 
   On Thu, 2007-03-29 at 10:06 -0400, Doug Knight wrote:
 
   OK, ConfigureMe make is complete. As when I built the baseline 2.0.8, I had
  two compile errors, and had to make some minor source code mods (see
  attached error message and file diffs). Not sure if this is related or not.
  I've done a make install in the crm/admin directory to replace the
  crm_master used by the system. I'll test that out in a bit.
 
   Doug
 
   On Thu, 2007-03-29 at 15:38 +0200, Andrew Beekhof wrote:
   On 3/29/07, Doug Knight [EMAIL PROTECTED] wrote:
  
   I went ahead and installed via RPM automake, autoconf, and libtool, even
   though they were not needed for the 2.0.8 baseline build (I believe
  libtool
   at least may have been installed previously). ConfigureMe bootstrap ran
   fine. From the output I gather it includes the equivalent of the
  ConfigureMe
   configure run, and now I just need to do the make?
 
  right
 
  
   Doug
  
   On Thu, 2007-03-29 at 09:23 -0400, Doug Knight wrote:
  
   OK, Tried that, no luck. It still complains about libtool, autoconf, and
   automake. When I copy over the same basic files from the 2.0.8 directory,
   bootstrap still does not work, but ConfigureMe configure does (at least to
   the point where it starts looking for the Makefile.in files).
  
   Doug
  
   On Thu, 2007-03-29 at 09:50 +0200, Andrew Beekhof wrote:
   pretty sure you need:
   ./ConfigureMe bootstrap
  
   On 3/28/07, Doug Knight [EMAIL PROTECTED] wrote:
   
Alan and Lars,
After much effort, I have had no success in building from the mercurial
version. Here's what I tried, and since this is the first time I've
  tried
   to
build a dev version, maybe you can see where I'm going wrong:
   
Downloaded tar from hg.linux-ha.org/dev
Unpacked it in a subdirectory to my root account, HA
Attempted a quick ConfigureMe configure, got errors that it couldn't
  find
libtool, automake, autoconf
Did a side-by-side comparison to the HA 2.0.8 I built successfully and
  am
running with, found no libltdl.tar or libltdl directory under the dev
version. Copied these from my original stable release tar into the
   directory
structure for the dev version.
Ran ConfigureMe configure, which then complained about all the
  Makefile.in
files missing. Copied those over as well from the side-by-side. Also
   pulled
include/ha_config.h.in and linux-ha/config.h.in since it complained
  about
those missing too.
ConfigureMe configure runs to completion
ConfigureMe make exits with the following:
   
In file included from base64.c:18:
../../include/heartbeat.h:38:23: error: hb_config.h: No
such file or directory
../../include/heartbeat.h:98:1: error: HB_RC_DIR redefined
In file included from ../../include/lha_internal.h:37,
from base64.c:17:
../../linux-ha/config.h:504:1: error: this is the location of the
  previous
definition
In file included from base64.c:18:
../../include/heartbeat.h:102:1: error: HALIB
  redefined
command line:1:1: error: this is the location of the previous
  definition
gmake[2]: *** [base64.lo] Error 1
gmake[2]: Leaving directory
`/root/HA/Heartbeat-Dev-829e377e00bd/lib/clplumbing'
gmake[1]: *** [all-recursive] Error 1
gmake[1]: Leaving directory
`/root/HA/Heartbeat-Dev-829e377e00bd/lib'
gmake: *** [all-recursive] Error 1
   
   
Any ideas?
   
Doug
   
   
   
On Sun, 2007-03-25 at 16:06 -0600, Alan Robertson wrote:
Doug Knight wrote:
 Got it. The attached file contains the strace from the second attempt
  by
 heartbeat to start the resource up as master, right up until it was
 killed. The resource already showed failed on the gui. I zipped it up
 using gzip.
   
Lars asked a good question as well...
   
Could you kindly reproduce this with the current Mercurial tip version?
   
Thanks!
   
   
   
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
   
   
  
  
   

Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-03-30 Thread Doug Knight
OK, let me try this again. My last email was too big by a little and got
held up for review. I've bzip2'ed this one.

Doug

On Fri, 2007-03-30 at 08:20 -0400, Doug Knight wrote:

 Andrew, FYI, the one I ran has some debug statements in it that I put
 in there. Let me know if you want cleaner output...
 
 Doug
 On Fri, 2007-03-30 at 07:44 -0400, Doug Knight wrote:
 
  Hi Andrew,
  I'll get to this today. I've been digging through some of the
  source, putting in some additional logging, etc, just to see where
  the problem occurs. 
  
  Thanks,
  Doug
  
  On Fri, 2007-03-30 at 10:32 +0200, Andrew Beekhof wrote:  
  
   On 3/29/07, Doug Knight [EMAIL PROTECTED] wrote:
   
 Results: Using the dev version of crm_master I still spin at the top of the
CPU stack. What's next?
   
   probably time to log a bug...
   
   can you include the logs of the crm_master command when you add
  -
   (an insane amount of logging) to your normal command line
   
   and your current CIB please
   
   
 Doug
   
   
 On Thu, 2007-03-29 at 10:06 -0400, Doug Knight wrote:
   
 OK, ConfigureMe make is complete. As when I built the baseline 2.0.8, I had
two compile errors, and had to make some minor source code mods (see
attached error message and file diffs). Not sure if this is related or not.
I've done a make install in the crm/admin directory to replace the
crm_master used by the system. I'll test that out in a bit.
   
 Doug
   
 On Thu, 2007-03-29 at 15:38 +0200, Andrew Beekhof wrote:
 On 3/29/07, Doug Knight [EMAIL PROTECTED] wrote:

 I went ahead and installed via RPM automake, autoconf, and libtool, 
 even
 though they were not needed for the 2.0.8 baseline build (I believe
libtool
 at least may have been installed previously). ConfigureMe bootstrap 
 ran
 fine. From the output I gather it includes the equivalent of the
ConfigureMe
 configure run, and now I just need to do the make?
   
right
   

 Doug

 On Thu, 2007-03-29 at 09:23 -0400, Doug Knight wrote:

 OK, Tried that, no luck. It still complains about libtool, autoconf, 
 and
 automake. When I copy over the same basic files from the 2.0.8 
 directory,
 bootstrap still does not work, but ConfigureMe configure does (at 
 least to
 the point where it starts looking for the Makefile.in files).

 Doug

 On Thu, 2007-03-29 at 09:50 +0200, Andrew Beekhof wrote:
 pretty sure you need:
 ./ConfigureMe bootstrap

 On 3/28/07, Doug Knight [EMAIL PROTECTED] wrote:
 
  Alan and Lars,
  After much effort, I have had no success in building from the 
  mercurial
  version. Here's what I tried, and since this is the first time I've
tried
 to
  build a dev version, maybe you can see where I'm going wrong:
 
  Downloaded tar from hg.linux-ha.org/dev
  Unpacked it in a subdirectory to my root account, HA
  Attempted a quick ConfigureMe configure, got errors that it couldn't
find
  libtool, automake, autoconf
  Did a side-by-side comparison to the HA 2.0.8 I built successfully 
  and
am
  running with, found no libltdl.tar or libltdl directory under the 
  dev
  version. Copied these from my original stable release tar into the
 directory
  structure for the dev version.
  Ran ConfigureMe configure, which then complained about all the
Makefile.in
  files missing. Copied those over as well from the side-by-side. Also
 pulled
  include/ha_config.h.in and linux-ha/config.h.in since it complained
about
  those missing too.
  ConfigureMe configure runs to completion
  ConfigureMe make exits with the following:
 
  In file included from base64.c:18:
  ../../include/heartbeat.h:38:23: error: hb_config.h: No
  such file or directory
  ../../include/heartbeat.h:98:1: error: HB_RC_DIR redefined
  In file included from ../../include/lha_internal.h:37,
  from base64.c:17:
  ../../linux-ha/config.h:504:1: error: this is the location of the
previous
  definition
  In file included from base64.c:18:
  ../../include/heartbeat.h:102:1: error: HALIB
redefined
  command line:1:1: error: this is the location of the previous
definition
  gmake[2]: *** [base64.lo] Error 1
  gmake[2]: Leaving directory
  `/root/HA/Heartbeat-Dev-829e377e00bd/lib/clplumbing'
  gmake[1]: *** [all-recursive] Error 1
  gmake[1]: Leaving directory
  `/root/HA/Heartbeat-Dev-829e377e00bd/lib'
  gmake: *** [all-recursive] Error 1
 
 
  Any ideas?
 
  Doug
 
 
 
  On Sun, 2007-03-25 at 16:06 -0600, Alan Robertson wrote:
  Doug Knight wrote:
   Got it. The attached file contains the strace from the second 
   attempt
by
   heartbeat to start the resource up as master, right 

Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-03-29 Thread Andrew Beekhof

On 3/28/07, Doug Knight [EMAIL PROTECTED] wrote:


 One additional question that has come up as I've developed my notify
function: When the promote of the slave completes, and the post-promote
notify is sent out, is this post-promote notify sent to both the master and
slave nodes, or just to the slave?


both



 Doug


 On Wed, 2007-03-28 at 12:34 -0400, Doug Knight wrote:

 Hi Lars,
 I've gone through your comments below, and I think I understand this a bit
better. Let me state what I think I need to do, and see if I've got it now
(starting with Node A master and Node B slave):

 Node A: PostgreSQL master gets demoted - completes (basically just stops
PostgreSQL)
 Node B: PostgreSQL slave gets promoted - starts it up as new master server
 Node A: PostgreSQL demoted master gets a post notify call with promote
notify operation

 At this point I know heartbeat has brought the new master server up (node
B), and I can safely rsync and restart the new slave server (node A). Nice
and clean... Also, I need to remember that notify will get called for all
combinations, and if I'm only handling the post-promote combination, all
others I need to simply return OCF_SUCCESS, right?

 The remaining question I have is, does adding notify to the actions
meta-data and configuring the notify operation via the Add Operation on the
GUI activate heartbeat's usage of notify? Or are there any other
parameters/flags I need to set to enable using notify? I seem to remember
there was mention of some configuration items for using notify.

 Thanks,
 Doug

 On Tue, 2007-03-27 at 23:14 +0200, Lars Marowsky-Bree wrote:
 On 2007-03-22T13:24:02, Doug Knight [EMAIL PROTECTED] wrote:

Hi,

I've been out a bit myself but now want to answer this.

 Hi Alan,
 I took a look at the drbd OCF script's notify function, and the online
 documentation. I believe there is one circumstance where I need to make
 use of the pre/post notify.

The reason why drbd calls update_prefs (ie, crm_master) in the
post(-start) notification, and not within start itself, is that by that
time, start will have been completed on all (one or both) nodes.

That means that by that time, it's safe to figure out which side is
preferable for becoming master.


 The last step in my development/testing has
 to do with several steps I take to prepare the server that was primary
 and is now becoming standby. First, the primary gets demoted, right?

Yes.

 Then the secondary gets promoted. The problem I have is that part of the
 process of preparing the new standby requires that the new active server
 process is up and accessible. If the demote has to complete before the
 promote can begin, I cannot do the rsync in the demote, because the
 promote hasn't started and placed the new primary in an accessible
 state.

That seems to be true for your scenario, yes.

 So, if I understand the notify function, then I need a post process
 section that looks for the master going active and accessible, so I
 can do the rsync and start up the new standby, right?

That you could do. The instances will get a post-promote notification,
which could do what you want.

 Can you expand a little on the notify processing? The web page just
 lists the variables involved, and the drbd OCF script only makes use
 of a few of them, and I need a more detailed explanation of how and
 when they are used.

Well, you get a pre-notification before start/stop/promote/demote happen
anywhere and a post-notification after they have completed everywhere.
That's basically the gist of it.

Does that make it clearer, or do you have a specific question?


Sincerely,
 Lars




Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-03-29 Thread Doug Knight
I went ahead and installed via RPM automake, autoconf, and libtool, even
though they were not needed for the 2.0.8 baseline build (I believe
libtool at least may have been installed previously). ConfigureMe
bootstrap ran fine. From the output I gather it includes the equivalent
of the ConfigureMe configure run, and now I just need to do the make?
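
The sequence that worked here (install the autotools, run ./ConfigureMe bootstrap, then make) can be guarded by a quick prerequisite check. This is only a minimal sketch for a POSIX shell: missing_tools is a hypothetical helper name, and the tool list is simply the three packages named above.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every requested tool that is
# not found on PATH. Prints nothing when all of them are present.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# The dev tarball complained about these three on a plain configure run,
# so check for them before attempting the bootstrap step.
missing=$(missing_tools autoconf automake libtool)
if [ -n "$missing" ]; then
    echo "install before running ./ConfigureMe bootstrap:" $missing
else
    echo "prerequisites found; next: ./ConfigureMe bootstrap, then ./ConfigureMe make"
fi
```

Running the check first avoids copying generated files such as Makefile.in over from a release tarball, which is what led to the header conflicts quoted below.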

Doug
On Thu, 2007-03-29 at 09:23 -0400, Doug Knight wrote:

 OK, Tried that, no luck. It still complains about libtool, autoconf,
 and automake. When I copy over the same basic files from the 2.0.8
 directory, bootstrap still does not work, but ConfigureMe configure
 does (at least to the point where it starts looking for the
 Makefile.in files). 
 
 Doug
 
 On Thu, 2007-03-29 at 09:50 +0200, Andrew Beekhof wrote: 
 
  pretty sure you need:
 ./ConfigureMe bootstrap
  
  On 3/28/07, Doug Knight [EMAIL PROTECTED] wrote:
  
Alan and Lars,
After much effort, I have had no success in building from the mercurial
version. Here's what I tried, and since this is the first time I've tried to
build a dev version, maybe you can see where I'm going wrong:

Downloaded tar from hg.linux-ha.org/dev
Unpacked it in a subdirectory to my root account, HA
Attempted a quick ConfigureMe configure, got errors that it couldn't find
   libtool, automake, autoconf
Did a side-by-side comparison to the HA 2.0.8 I built successfully and am
   running with, found no libltdl.tar or libltdl directory under the dev
   version. Copied these from my original stable release tar into the
   directory structure for the dev version.
Ran ConfigureMe configure, which then complained about all the Makefile.in
   files missing. Copied those over as well from the side-by-side. Also
   pulled include/ha_config.h.in and linux-ha/config.h.in since it
   complained about those missing too.
ConfigureMe configure runs to completion
ConfigureMe make exits with the following:
  
In file included from base64.c:18:
../../include/heartbeat.h:38:23: error: hb_config.h: No such file or directory
../../include/heartbeat.h:98:1: error: HB_RC_DIR redefined
In file included from ../../include/lha_internal.h:37,
                 from base64.c:17:
../../linux-ha/config.h:504:1: error: this is the location of the previous definition
In file included from base64.c:18:
../../include/heartbeat.h:102:1: error: HALIB redefined
command line:1:1: error: this is the location of the previous definition
gmake[2]: *** [base64.lo] Error 1
gmake[2]: Leaving directory `/root/HA/Heartbeat-Dev-829e377e00bd/lib/clplumbing'
gmake[1]: *** [all-recursive] Error 1
gmake[1]: Leaving directory `/root/HA/Heartbeat-Dev-829e377e00bd/lib'
gmake: *** [all-recursive] Error 1
  
  
Any ideas?
  
Doug
  
  
  
On Sun, 2007-03-25 at 16:06 -0600, Alan Robertson wrote:
Doug Knight wrote:
Got it. The attached file contains the strace from the second attempt by
heartbeat to start the resource up as master, right up until it was
killed. The resource already showed failed on the gui. I zipped it up
using gzip.
  
   Lars asked a good question as well...
  
   Could you kindly reproduce this with the current Mercurial tip version?
  
Thanks!
  
  
  


Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-03-29 Thread Doug Knight
OK, ConfigureMe make is complete. As when I built the baseline 2.0.8, I
had two compile errors, and had to make some minor source code mods (see
attached error message and file diffs). Not sure if this is related or
not. I've done a make install in the crm/admin directory to replace the
crm_master used by the system. I'll test that out in a bit. 

Doug

On Thu, 2007-03-29 at 15:38 +0200, Andrew Beekhof wrote:

 On 3/29/07, Doug Knight [EMAIL PROTECTED] wrote:
 
   I went ahead and installed via RPM automake, autoconf, and libtool, even
  though they were not needed for the 2.0.8 baseline build (I believe libtool
  at least may have been installed previously). ConfigureMe bootstrap ran
  fine. From the output I gather it includes the equivalent of the ConfigureMe
  configure run, and now I just need to do the make?
 
 right
 
 
   Doug
 
   On Thu, 2007-03-29 at 09:23 -0400, Doug Knight wrote:
 
   OK, Tried that, no luck. It still complains about libtool, autoconf, and
  automake. When I copy over the same basic files from the 2.0.8 directory,
  bootstrap still does not work, but ConfigureMe configure does (at least to
  the point where it starts looking for the Makefile.in files).
 
   Doug
 
   On Thu, 2007-03-29 at 09:50 +0200, Andrew Beekhof wrote:
   pretty sure you need:
   ./ConfigureMe bootstrap
 
  On 3/28/07, Doug Knight [EMAIL PROTECTED] wrote:
  
   Alan and Lars,
   After much effort, I have had no success in building from the mercurial
   version. Here's what I tried, and since this is the first time I've tried
  to
   build a dev version, maybe you can see where I'm going wrong:
  
   Downloaded tar from hg.linux-ha.org/dev
   Unpacked it in a subdirectory to my root account, HA
   Attempted a quick ConfigureMe configure, got errors that it couldn't find
   libtool, automake, autoconf
   Did a side-by-side comparison to the HA 2.0.8 I built successfully and am
   running with, found no libltdl.tar or libltdl directory under the dev
   version. Copied these from my original stable release tar into the
  directory
   structure for the dev version.
   Ran ConfigureMe configure, which then complained about all the Makefile.in
   files missing. Copied those over as well from the side-by-side. Also
  pulled
   include/ha_config.h.in and linux-ha/config.h.in since it complained about
   those missing too.
   ConfigureMe configure runs to completion
   ConfigureMe make exits with the following:
  
   In file included from base64.c:18:
   ../../include/heartbeat.h:38:23: error: hb_config.h: No
   such file or directory
   ../../include/heartbeat.h:98:1: error: HB_RC_DIR redefined
   In file included from ../../include/lha_internal.h:37,
   from base64.c:17:
   ../../linux-ha/config.h:504:1: error: this is the location of the previous
   definition
   In file included from base64.c:18:
   ../../include/heartbeat.h:102:1: error: HALIB redefined
   command line:1:1: error: this is the location of the previous definition
   gmake[2]: *** [base64.lo] Error 1
   gmake[2]: Leaving directory
   `/root/HA/Heartbeat-Dev-829e377e00bd/lib/clplumbing'
   gmake[1]: *** [all-recursive] Error 1
   gmake[1]: Leaving directory
   `/root/HA/Heartbeat-Dev-829e377e00bd/lib'
   gmake: *** [all-recursive] Error 1
  
  
   Any ideas?
  
   Doug
  
  
  
   On Sun, 2007-03-25 at 16:06 -0600, Alan Robertson wrote:
   Doug Knight wrote:
Got it. The attached file contains the strace from the second attempt by
heartbeat to start the resource up as master, right up until it was
killed. The resource already showed failed on the gui. I zipped it up
using gzip.
  
   Lars asked a good question as well...
  
   Could you kindly reproduce this with the current Mercurial tip version?
  
   Thanks!
  
  
  
 
send_arp.c:


Making all in libnet_util
gmake[2]: Entering directory 
`/root/HA/Heartbeat-Dev-829e377e00bd/heartbeat/libnet_util'
if gcc -DHAVE_CONFIG_H -I. -I. -I../../include -I../../include -I../../include 
-I../../include 

Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-03-29 Thread Alan Robertson
Lars Marowsky-Bree wrote:
 On 2007-03-27T22:38:50, Alan Robertson [EMAIL PROTECTED] wrote:
 
 mandatory in the sense that nothing will get promoted until someone,
 somewhere runs it.
 but the exact timing is completely up to the user/admin/RA... it is even
 possible to run it manually if you have to
 I originally assumed what you said, but the docs contradict that by
 calling it mandatory (and not qualifying the term).
 
 Well, it's mandatory in the sense that without calling it, you don't get
 m/s, just a clone ;-)


The docs are more specific:
They state that it is mandatory to call it in the start action.

That doesn't seem to be true.

You could call it in a monitor action quite nicely, or so it seems to me...
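
To illustrate what calling it from monitor might look like, here is a rough sketch, not taken from any shipped RA. The function names, the health strings and the score values are all made up for illustration, and the -v (value) and -l (lifetime) options are assumed to match the crm_master installed on the system.

```shell
#!/bin/sh
# Assumption: crm_master is on PATH. The variable lets a dry run
# substitute a stub instead of touching a live CIB.
: "${CRM_MASTER:=crm_master}"

# Hypothetical mapping from a health string to a promotion preference.
master_score() {
    case "$1" in
        healthy)  echo 100 ;;
        degraded) echo 10 ;;
        *)        echo 0 ;;
    esac
}

# Hypothetical fragment for the monitor action: refresh the preference
# each time monitor runs, so promotion tracks the current health of
# this instance rather than its state at start time.
update_master_pref() {
    "$CRM_MASTER" -v "$(master_score "$1")" -l reboot
}
```

A monitor function would end with something like update_master_pref healthy once its own checks have passed. Setting CRM_MASTER to a stub makes it possible to see the command line without a running cluster.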


-- 
Alan Robertson [EMAIL PROTECTED]

Openness is the foundation and preservative of friendship...  Let me
claim from you at all times your undisguised opinions. - William
Wilberforce


Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-03-28 Thread Doug Knight
I'm working through the various emails I've received on what to try for
the issue I'm having with crm_master. Here's my first response. I pulled
down the bz2 version of the Mercurial version per the URL given in a
previous email (hg.linux-ha.org/dev), and attempted to build on my
server. Using the ./ConfigureMe configure command I get errors that I
don't have autoconf, automake, and libtool, and some recursive symlinks
appear for these in the same directory that I'm running ConfigureMe in.
This is the same system I've built the 2.0.8 version on successfully. Do
the dev versions require additional packages to be installed?

Doug

On Sun, 2007-03-25 at 16:06 -0600, Alan Robertson wrote:
 Doug Knight wrote:
  Got it. The attached file contains the strace from the second attempt by
  heartbeat to start the resource up as master, right up until it was
  killed. The resource already showed failed on the gui. I zipped it up
  using gzip.
 
 Lars asked a good question as well...
 
 Could you kindly reproduce this with the current Mercurial tip version?
 
   Thanks!
 


Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-03-28 Thread Doug Knight
On Wed, 2007-03-28 at 09:14 +0200, Andrew Beekhof wrote:
 On 3/28/07, Alan Robertson [EMAIL PROTECTED] wrote:
  Andrew Beekhof wrote:
  
   On Mar 22, 2007, at 2:13 AM, Alan Robertson wrote:
  
   Doug Knight wrote:
   Hi Andrew,
   I had just started reviewing both of these scripts, and reviewed the
   Multistate and clone resource pages on the web site. It looks like
   multistate is how I need to handle it, but a couple of questions first.
  
   1. I noticed that the write-up says the resource must come up on each of
   the servers in shadow mode first, then one gets promoted. Does this
   imply a start on both servers, and the OCF start function determining
   which server is active vs shadow (I'm picturing a check in the OCF
   script to determine postgresql standby mode = shadow/crm_master value
   low, and postgresql active mode = active/crm_master value high), then a
   promote to the active server?
  
   2. I noticed that the drbd OCF script contains a notify function,
   where the Stateful OCF script does not. The notify function looks to be
   where the important actions are taken (calling drbd_start_phase_2,
   pre/post, etc). Is the notify function necessary, or is it sufficient in
   my case to handle it through the start|stop|promote|demote functions?
  
   Thanks for your help,
   Doug
  
   Andrew's out for a while.
  
   The start function starts you up in slave/secondary mode.  All resources
   initially start up in slave mode.
  
   A set of servers is chosen to run the resources on (it might be one,
   two, the whole set, etc. depending on clone_max and clone_node_max and
   the usual constraints).
  
   They are started on the selected nodes using start
  
   During the start operation, you are given the chance to declare yourself
   ready to become master or not by using the crm_master command line tool.
  
   I believe that your resource can run that command any time they like -
   for example at a monitor operation...  But, it is mandatory that they
   run it when they first start up.
  
   mandatory in the sense that nothing will get promoted until someone,
   somewhere runs it.
   but the exact timing is completely up to the user/admin/RA... it is even
   possible to run it manually if you have to
 
  I originally assumed what you said, but the docs contradict that by
  calling it mandatory (and not qualifying the term).  And the code seems
  to indicate that you can ONLY run it from an RA.
 
 if you know which OCF environment variables to set, then you can
 potentially run it from anywhere... but most people won't need to run
 it outside of the RA

What I did was to set up the defaults within the OCF script to point to
the locations and values I needed for this specific instance I'm
testing. That way I can manually execute the script pretty well. That's
when I ran into the crm_master spinning. I did need to manually set the
OCF_RESOURCE_INSTANCE so that during the start function I could attempt
to find if the resource already was running in the cluster somewhere.
Definitely made it possible to test all the associated functions like
stop, usage, methods, meta-data, monitor, etc.
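
Roughly, the defaults look like the sketch below. The instance name and the pgdata parameter are placeholders rather than my real values; the only name that comes from the cluster itself is OCF_RESOURCE_INSTANCE.

```shell
#!/bin/sh
# Fall back to test values only when the cluster has not already
# supplied them, so the same script still works when heartbeat runs it.
# Both values, and the OCF_RESKEY_pgdata parameter name, are placeholders.
: "${OCF_RESOURCE_INSTANCE:=pgsql_test}"
: "${OCF_RESKEY_pgdata:=/var/lib/pgsql/data}"

echo "instance=$OCF_RESOURCE_INSTANCE pgdata=$OCF_RESKEY_pgdata"
```

With that at the top of the script, running it by hand with an action such as monitor or meta-data picks up the test values, while heartbeat's own environment still takes precedence.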




Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-03-28 Thread Doug Knight
Hi Lars,
I've gone through your comments below, and I think I understand this a
bit better. Let me state what I think I need to do, and see if I've got
it now (starting with Node A master and Node B slave):

Node A: PostgreSQL master gets demoted - completes (basically just stops
PostgreSQL)
Node B: PostgreSQL slave gets promoted - starts it up as new master
server
Node A: PostgreSQL demoted master gets a post notify call with promote
notify operation

At this point I know heartbeat has brought the new master server up
(node B), and I can safely rsync and restart the new slave server (node
A). Nice and clean... Also, I need to remember that notify will get
called for all combinations, and if I'm only handling the post-promote
combination, all others I need to simply return OCF_SUCCESS, right?
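
As a sketch of what I have in mind for the notify function (untested against a live cluster): the notify environment variable names below are my assumption and may differ between heartbeat versions, and local_role and resync_from_new_master are placeholders for the real PostgreSQL checks and the rsync step. As far as I understand it, every instance receives the notification, so the function checks its own role before acting.

```shell
#!/bin/sh
OCF_SUCCESS=0   # standard OCF exit code for success

# Placeholder: report whether this node currently runs as master or slave.
# A real RA would query PostgreSQL; PGSQL_ROLE is a made-up test knob.
local_role() {
    echo "${PGSQL_ROLE:-slave}"
}

# Placeholder for the rsync-and-restart step described above.
resync_from_new_master() {
    echo "resync from new master, then restart as slave"
}

# Notify dispatcher. Type and operation may be passed as arguments for
# manual testing; otherwise they are read from the (assumed) environment
# variables set by the cluster.
pgsql_notify() {
    n_type=${1:-${OCF_RESKEY_CRM_meta_notify_type:-}}
    n_op=${2:-${OCF_RESKEY_CRM_meta_notify_operation:-}}
    case "$n_type-$n_op" in
        post-promote)
            # Only the demoted side should resync from the new master.
            if [ "$(local_role)" = "slave" ]; then
                resync_from_new_master
            fi
            ;;
        *)
            : ;;   # every other combination: nothing to do
    esac
    return $OCF_SUCCESS
}
```

Calling pgsql_notify post promote by hand exercises the one branch I care about; every other combination falls through and returns OCF_SUCCESS.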

The remaining question I have is, does adding notify to the actions
meta-data and configuring the notify operation via the Add Operation on
the GUI activate heartbeat's usage of notify? Or are there any other
parameters/flags I need to set to enable using notify? I seem to
remember there was mention of some configuration items for using notify.

Thanks,
Doug

On Tue, 2007-03-27 at 23:14 +0200, Lars Marowsky-Bree wrote:

 On 2007-03-22T13:24:02, Doug Knight [EMAIL PROTECTED] wrote:
 
 Hi,
 
 I've been out a bit myself but now want to answer this.
 
  Hi Alan,
  I took a look at the drbd OCF script's notify function, and the online
  documentation. I believe there is one circumstance where I need to make
  use of the pre/post notify. 
 
 The reason why drbd calls update_prefs (ie, crm_master) in the
 post(-start) notification, and not within start itself, is that by that
 time, start will have been completed on all (one or both) nodes.
 
 That means that by that time, it's safe to figure out which side is
 preferable for becoming master.
 
 
  The last step in my development/testing has
  to do with several steps I take to prepare the server that was primary
  and is now becoming standby. First, the primary gets demoted, right?
 
 Yes.
 
  Then the secondary gets promoted. The problem I have is that part of the
  process of preparing the new standby requires that the new active server
  process is up and accessible. If the demote has to complete before the
  promote can begin, I cannot do the rsync in the demote, because the
  promote hasn't started and placed the new primary in an accessible
  state. 
 
 That seems to be true for your scenario, yes.
 
  So, if I understand the notify function, then I need a post process
  section that looks for the master going active and accessible, so I
  can do the rsync and start up the new standby, right?
 
 That you could do. The instances will get a post-promote notification,
 which could do what you want.
 
  Can you expand a little on the notify processing? The web page just
  lists the variables involved, and the drbd OCF script only makes use
  of a few of them, and I need a more detailed explanation of how and
  when they are used. 
 
 Well, you get a pre-notification before start/stop/promote/demote happen
 anywhere and a post-notification after they have completed everywhere.
 That's basically the gist of it.
 
 Does that make it clearer, or do you have a specific question?
 
 
 Sincerely,
 Lars
 
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-03-28 Thread Doug Knight
One additional question that has come up as I've developed my notify
function: When the promote of the slave completes, and the post-promote
notify is sent out, is this post-promote notify sent to both the master
and slave nodes, or just to the slave?

Doug

On Wed, 2007-03-28 at 12:34 -0400, Doug Knight wrote:

 Hi Lars,
 I've gone through your comments below, and I think I understand this a
 bit better. Let me state what I think I need to do, and see if I've
 got it now (starting with Node A master and Node B slave):
 
 Node A: PostgreSQL master gets demoted - completes (basically just
 stops PostgreSQL)
 Node B: PostgreSQL slave gets promoted - starts it up as new master
 server
 Node A: PostgreSQL demoted master gets a post notify call with promote
 notify operation
 
 At this point I know heartbeat has brought the new master server up
 (node B), and I can safely rsync and restart the new slave server
 (node A). Nice and clean... Also, I need to remember that notify will
 get called for all combinations, and if I'm only handling the
 post-promote combination, all others I need to simply return
 OCF_SUCCESS, right?
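
A minimal sketch of the notify handling described above, not a tested agent. The variable names OCF_RESKEY_notify_type and OCF_RESKEY_notify_operation are assumptions (releases differ, and some use an OCF_RESKEY_CRM_meta_ prefix), and resync_as_standby is a hypothetical placeholder for the rsync and restart steps:

```shell
#!/bin/sh
# Sketch only. The notify variable names below are assumptions; check what
# your heartbeat version actually exports before relying on them.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1

# Hypothetical helper: resync the demoted node from the new master and
# restart it as a standby. A real agent would first confirm that this
# node is not itself the newly promoted master.
resync_as_standby() {
    echo "resync from new master, then restart as standby"
}

pgsql_notify() {
    # Combine type and operation into one key, e.g. "post-promote".
    key="${OCF_RESKEY_notify_type:-}-${OCF_RESKEY_notify_operation:-}"
    case "$key" in
        post-promote)
            # The promote has completed, so the new master should be up.
            resync_as_standby || return $OCF_ERR_GENERIC
            ;;
        *)
            # Every other pre/post combination is deliberately a no-op.
            ;;
    esac
    return $OCF_SUCCESS
}
```

Only the post-promote combination does any work here; every other notification falls through and reports success, which matches the plan stated above.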
 
 The remaining question I have is, does adding notify to the actions
 meta-data and configuring the notify operation via the Add Operation
 on the GUI activate heartbeat's usage of notify? Or are there any
 other parameters/flags I need to set to enable using notify? I seem to
 remember there was mention of some configuration items for using
 notify.
 
 Thanks,
 Doug
 
 On Tue, 2007-03-27 at 23:14 +0200, Lars Marowsky-Bree wrote: 
 
  On 2007-03-22T13:24:02, Doug Knight [EMAIL PROTECTED] wrote:
  
  Hi,
  
  I've been out a bit myself but now want to answer this.
  
   Hi Alan,
   I took a look at the drbd OCF script's notify function, and the online
   documentation. I believe there is one circumstance where I need to make
   use of the pre/post notify. 
  
  The reason why drbd calls update_prefs (ie, crm_master) in the
  post(-start) notification, and not within start itself, is that by that
  time, start will have been completed on all (one or both) nodes.
  
  That means that by that time, it's safe to figure out which side is
  preferable for becoming master.
  
  
   The last step in my development/testing has
   to do with several steps I take to prepare the server that was primary
   and is now becoming standby. First, the primary gets demoted, right?
  
  Yes.
  
   Then the secondary gets promoted. The problem I have is that part of the
   process of preparing the new standby requires that the new active server
   process is up and accessible. If the demote has to complete before the
   promote can begin, I cannot do the rsync in the demote, because the
   promote hasn't started and placed the new primary in an accessible
   state. 
  
  That seems to be true for your scenario, yes.
  
   So, if I understand the notify function, then I need a post process
   section that looks for the master going active and accessible, so I
   can do the rsync and start up the new standby, right?
  
  That you could do. The instances will get a post-promote notification,
  which could do what you want.
  
   Can you expand a little on the notify processing? The web page just
   lists the variables involved, and the drbd OCF script only makes use
   of a few of them, and I need a more detailed explanation of how and
   when they are used. 
  
  Well, you get a pre-notification before start/stop/promote/demote happen
  anywhere and a post-notification after they have completed everywhere.
  That's basically the gist of it.
  
  Does that make it clearer, or do you have a specific question?
  
  
  Sincerely,
  Lars
  
 


Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-03-27 Thread Alan Robertson
Andrew Beekhof wrote:
 
 On Mar 22, 2007, at 2:13 AM, Alan Robertson wrote:
 
 Doug Knight wrote:
 Hi Andrew,
 I had just started reviewing both of these scripts, and reviewed the
 Multistate and clone resource pages on the web site. It looks like
 multistate is how I need to handle it, but a couple of questions first.

 1. I noticed that the write-up says the resource must come up on each of
 the servers in shadow mode first, then one gets promoted. Does this
 imply a start on both servers, and the OCF start function determining
 which server is active vs shadow (I'm picturing a check in the OCF
 script to determine postgresql standby mode = shadow/crm_master value
 low, and postgresql active mode = active/crm_master value high), then a
 promote to the active server?

 2. I noticed that the drbd OCF script contains a notify function,
 where the Stateful OCF script does not. The notify function looks to be
 where the important actions are taken (calling drbd_start_phase_2,
 pre/post, etc). Is the notify function necessary, or is it sufficient in
 my case to handle it through the start|stop|promote|demote functions?

 Thanks for your help,
 Doug

 Andrew's out for a while.

 The start function starts you up in slave/secondary mode.  All resources
 initially start up in slave mode.

 A set of servers is chosen to run the resources on (it might be one,
 two, the whole set, etc. depending on clone_max and clone_node_max and
 the usual constraints).

 They are started on the selected nodes using start

 During the start operation, you are given the chance to declare yourself
 ready to become master or not by using the crm_master command line tool.

 I believe that your resource can run that command any time they like -
 for example at a monitor operation...  But, it is mandatory that they
 run it when they first start up.
 
 mandatory in the sense that nothing will get promoted until someone,
 somewhere runs it.
 but the exact timing is completely up to the user/admin/RA... it is even
 possible to run it manually if you have to

I originally assumed what you said, but the docs contradict that by
calling it mandatory (and not qualifying the term).  And the code seems
to indicate that you can ONLY run it from an RA.

-- 
Alan Robertson [EMAIL PROTECTED]

Openness is the foundation and preservative of friendship...  Let me
claim from you at all times your undisguised opinions. - William
Wilberforce


Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-03-25 Thread Alan Robertson
Doug Knight wrote:
 Got it. The attached file contains the strace from the second attempt by
 heartbeat to start the resource up as master, right up until it was
 killed. The resource already showed failed on the gui. I zipped it up
 using gzip.

Lars asked a good question as well...

Could you kindly reproduce this with the current Mercurial tip version?

Thanks!

-- 
Alan Robertson [EMAIL PROTECTED]

Openness is the foundation and preservative of friendship...  Let me
claim from you at all times your undisguised opinions. - William
Wilberforce


Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-03-25 Thread Alan Robertson
Doug Knight wrote:
 Got it. The attached file contains the strace from the second attempt by
 heartbeat to start the resource up as master, right up until it was
 killed. The resource already showed failed on the gui. I zipped it up
 using gzip.

By the way, from the system call perspective, what it's doing is
mallocing again and again and again...

I presume it's in this function (from the top level)
   rc = update_attr(the_cib, cib_opts, type, dest_node, set_name,
   attr_id, attr_name, attr_value);


And I further presume (with somewhat more risk) that it's in this
function from the next level down:

rc = the_cib->cmds->modify(the_cib, section, xml_top, NULL,
   call_options|cib_quorum_override);

cib_client_modify(CIB_OP_MODIFY...)

cib_native_perform_op()

Which sends the request over to the CIB, where it should do this...

cib_process_modify()

update_xml_child(obj_root, input)

However, from cib_process_modify on, all the work takes place in the
CIB, not in the crm_master command.  So, I presume that it doesn't get
that far.  [Other theories are also possible, of course ;-)]

Here is my initial conclusion:
1)  No one else has reported this problem
2)  The code in question is common and is used for many things
3)  Therefore it's more likely that something is amiss with your
CIB and causing the CIB code to loop looking for the
subtree to modify.  If this theory is correct, there are
two problems: one with your CIB, and one in the code.

So, could you please send the current output from cibadmin -Q to the
list as an attachment?

Could you also please run crm_verify on your CIB and see if it complains
about anything.  If it does, please fix its complaints, and try again.
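
The two checks requested above could be collected in one step, roughly as follows. This is a sketch under assumptions: `cibadmin -Q` is the command named in this message, while the `-x` (check a file) and `-V` (verbose) options of crm_verify are assumed to be available in your version, and collect_cib_report is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch only: dump the live CIB to a file suitable for attaching to a
# list post, then run crm_verify against that same file.
collect_cib_report() {
    out_dir="$1"
    mkdir -p "$out_dir" || return 1

    # Query the running CIB (the command requested in this message).
    cibadmin -Q > "$out_dir/cib.xml" || return 1

    # Assumed options: -x <file> to check a saved CIB, -V for verbosity.
    # Complaints land in crm_verify.txt next to the dump.
    crm_verify -V -x "$out_dir/cib.xml" > "$out_dir/crm_verify.txt" 2>&1
}
```

Calling `collect_cib_report /tmp/cib-report` would leave both files in one directory; the function's exit status is that of crm_verify.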

And, could you also please tell us how you installed the system.  If you
didn't install a package, then did you make the required user ID and
group ID?


Thanks!


-- 
Alan Robertson [EMAIL PROTECTED]

Openness is the foundation and preservative of friendship...  Let me
claim from you at all times your undisguised opinions. - William
Wilberforce


Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-03-24 Thread Lars Marowsky-Bree
On 2007-03-23T12:56:12, Doug Knight [EMAIL PROTECTED] wrote:

 I figured this one out, please ignore, it's because I didn't give it a
 value. If I run crm_master -v 100 at the command line, it spins right up
 to 100% cpu with no error.

Can you retest that with the hg.linux-ha.org/dev version please?

We've made significant improvements, and as Andrew is currently quite
occupied, I'd hate to look for a fixed bug ;-)

If you check out the drbd RA, you'll find that this one actually uses
the crm_master command and does not have this problem; so it must be
either something particular to your setup or something already fixed
...


Sincerely,
Lars

-- 
Teamlead Kernel, SuSE Labs, Research and Development
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
Experience is the name everyone gives to their mistakes. -- Oscar Wilde



Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-03-23 Thread Doug Knight
Hi Alan,
I've started testing my OCF script, and I'm seeing something unusual
during initial startup. I've placed a crm_master call in my
stateful_start function, after the function has determined that it is
running on what should be the master, and postgresql has successfully
started:

crm_master -v 100

When this command gets executed, it starts using nearly 100% CPU, memory
usage continuously increases up to about 68%, then it dies (killed via
timeout?), followed by a second attempt to go master (with the same
characteristics, after the function timeout is exceeded), then a demote is
sent (again, after timeout) and it switches to try to become the slave
(crm_master -v 10 is what I use, though I'm not sure this is correct
usage to say I want to change to a slave). Eventually, I wind up with
the resource in failed mode.

First question, any idea why the straight line running of a crm_master
-v 100 (not within any loops in my script) would spin up to 100%?

Second question, is using the crm_master -v with different values the
way to say on which node I prefer the master to run (higher number =
preferred node)?

Doug

On Thu, 2007-03-22 at 07:46 -0400, Doug Knight wrote:
 Thank you Alan, that explanation really helped. Would it be useful for
 me to post my OCF script once its done and tested?
 
 Doug
 
 On Wed, 2007-03-21 at 19:13 -0600, Alan Robertson wrote: 
  Doug Knight wrote:
   Hi Andrew,
   I had just started reviewing both of these scripts, and reviewed the
   Multistate and clone resource pages on the web site. It looks like
   multistate is how I need to handle it, but a couple of questions first.
   
   1. I noticed that the write-up says the resource must come up on each of
   the servers in shadow mode first, then one gets promoted. Does this
   imply a start on both servers, and the OCF start function determining
   which server is active vs shadow (I'm picturing a check in the OCF
   script to determine postgresql standby mode = shadow/crm_master value
   low, and postgresql active mode = active/crm_master value high), then a
   promote to the active server?
   
   2. I noticed that the drbd OCF script contains a notify function,
   where the Stateful OCF script does not. The notify function looks to be
   where the important actions are taken (calling drbd_start_phase_2,
   pre/post, etc). Is the notify function necessary, or is it sufficient in
   my case to handle it through the start|stop|promote|demote functions?
   
   Thanks for your help,
   Doug
  
  Andrew's out for a while.
  
  The start function starts you up in slave/secondary mode.  All resources
  initially start up in slave mode.
  
  A set of servers is chosen to run the resources on (it might be one,
  two, the whole set, etc. depending on clone_max and clone_node_max and
  the usual constraints).
  
  They are started on the selected nodes using start
  
  During the start operation, you are given the chance to declare yourself
  ready to become master or not by using the crm_master command line tool.
  
  I believe that your resource can run that command any time they like -
  for example at a monitor operation...  But, it is mandatory that they
  run it when they first start up.
  
  After this, heartbeat will try and promote as many of these resources as
  is consistent with its configured properties, and the crm_master
  commands that were run.
  
  The notify command tells you when your peers come and go.  Do you need
  to take actions if you know this?
  
  If so, then you need to implement the notify actions...
  
  
  


Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-03-23 Thread Alan Robertson
Doug Knight wrote:
 Hi Alan,
 I've started testing my OCF script, and I'm seeing something unusual
 during initial startup. I've placed a crm_master call in my
 stateful_start function, after the function has determined that it is
 running on what should be the master, and postgresql has successfully
 started:
 
 crm_master -v 100
 
 When this command gets executed, it starts using nearly 100% CPU, memory
 usage continuously increases up to about 68%, then it dies (killed via
 timeout?), followed by a second attempt to go master (with the same
 characteristics, after the function timeout is exceeded), then a demote is
 sent (again, after timeout) and it switches to try to become the slave
 (crm_master -v 10 is what I use, though I'm not sure this is correct
 usage to say I want to change to a slave). Eventually, I wind up with
 the resource in failed mode.
 
 First question, any idea why the straight line running of a crm_master
 -v 100 (not within any loops in my script) would spin up to 100%?

Bugs maybe?  What version of heartbeat are you running?  Which processes
are running up to 100%?  For how long?

 Second question, is using the crm_master -v with different values the
 way to say on which node I prefer the master to run (higher number =
 preferred node)?

Yes.  I believe that these are added into the values that come from
other constraints in your configuration file to come up with a best
configuration.
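
As a rough illustration of the preference mechanism, a start function might publish its score like this. This is a sketch, not the drbd or Stateful agent's code: CRM_MASTER is a hypothetical override hook for dry runs, preference_for_role is a hypothetical helper, and the values 100 and 10 are simply the ones used in this thread:

```shell
#!/bin/sh
# CRM_MASTER is a hypothetical override hook so the sketch can be dry-run
# without a cluster; by default the real crm_master tool is called.
CRM_MASTER="${CRM_MASTER:-crm_master}"

# Publish this node's preference for the master role (higher = preferred).
# The manual run elsewhere in this thread suggests crm_master takes the
# resource ID from OCF_RESOURCE_INSTANCE in the environment.
set_master_preference() {
    $CRM_MASTER -v "$1"
}

# Hypothetical scoring rule: 100 for the node that should become master,
# 10 for the node that should stay slave.
preference_for_role() {
    case "$1" in
        master) echo 100 ;;
        *)      echo 10 ;;
    esac
}

stateful_start() {
    # ... start the service in slave/secondary mode here ...
    set_master_preference "$(preference_for_role "$1")"
}
```

The start itself still brings the resource up as a slave; the score only tells the cluster which node it should prefer when it later issues a promote.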

--
Alan Robertson [EMAIL PROTECTED]

Openness is the foundation and preservative of friendship...  Let me
claim from you at all times your undisguised opinions. - William
Wilberforce


Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-03-23 Thread Alan Robertson
Doug Knight wrote:
 Current 2.0.8 tarball from 1/18/07. Process in top looks like:
 
    PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM  TIME+   COMMAND
  24591 root  18   0 1663m 1.5g 1028 R   83 77.8  1:19.42 /usr/sbin/crm_master -v 100
 
 It dies and restarts about every 120 seconds, which happens to be the
 timeout I have specified for the stop and start methods.
 
 Doug
 
 On Fri, 2007-03-23 at 08:20 -0600, Alan Robertson wrote:
 Doug Knight wrote:
  Hi Alan,
  I've started testing my OCF script, and I'm seeing something unusual
  during initial startup. I've placed a crm_master call in my
  stateful_start function, after the function has determined that it is
  running on what should be the master, and postgresql has successfully
  started:
  
  crm_master -v 100
  
  When this command gets executed, it starts using nearly 100% CPU, memory
  usage continuously increases up to about 68%, then it dies (killed via
  timeout?), followed by a second attempt to go master (with the same
  characteristics, after the function timeout is exceeded), then a demote is
  sent (again, after timeout) and it switches to try to become the slave
  (crm_master -v 10 is what I use, though I'm not sure this is correct
  usage to say I want to change to a slave). Eventually, I wind up with
  the resource in failed mode.
  
  First question, any idea why the straight line running of a crm_master
  -v 100 (not within any loops in my script) would spin up to 100%?

 Bugs maybe?  What version of heartbeat are you running?  Which processes
 are running up to 100%?  For how long?

  Second question, is using the crm_master -v with different values the
  way to say on which node I prefer the master to run (higher number =
  preferred node)?

 Yes.  I believe that these are added into the values that come from
 other constraints in your configuration file to come up with a best
 configuration.

Good info.

Could you provide a few hundred lines of strace output to show us what
it's doing?

Thanks!


-- 
Alan Robertson [EMAIL PROTECTED]

Openness is the foundation and preservative of friendship...  Let me
claim from you at all times your undisguised opinions. - William
Wilberforce


Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-03-23 Thread Doug Knight
On Fri, 2007-03-23 at 09:25 -0600, Alan Robertson wrote:
 Doug Knight wrote:
  Current 2.0.8 tarball from 1/18/07. Process in top looks like:
  
     PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM  TIME+   COMMAND
   24591 root  18   0 1663m 1.5g 1028 R   83 77.8  1:19.42 /usr/sbin/crm_master -v 100
  
  It dies and restarts about every 120 seconds, which happens to be the
  timeout I have specified for the stop and start methods.
  
  Doug
  
  On Fri, 2007-03-23 at 08:20 -0600, Alan Robertson wrote:
  Doug Knight wrote:
   Hi Alan,
   I've started testing my OCF script, and I'm seeing something unusual
   during initial startup. I've placed a crm_master call in my
   stateful_start function, after the function has determined that it is
   running on what should be the master, and postgresql has successfully
   started:
   
   crm_master -v 100
   
   When this command gets executed, it starts using nearly 100% CPU, memory
   usage continuously increases up to about 68%, then it dies (killed via
   timeout?), followed by a second attempt to go master (with the same
   characteristics, after the function timeout is exceeded), then a demote is
   sent (again, after timeout) and it switches to try to become the slave
   (crm_master -v 10 is what I use, though I'm not sure this is correct
   usage to say I want to change to a slave). Eventually, I wind up with
   the resource in failed mode.
   
   First question, any idea why the straight line running of a crm_master
   -v 100 (not within any loops in my script) would spin up to 100%?
 
  Bugs maybe?  What version of heartbeat are you running?  Which processes
  are running up to 100%?  For how long?
 
   Second question, is using the crm_master -v with different values the
   way to say on which node I prefer the master to run (higher number =
   preferred node)?
 
  Yes.  I believe that these are added into the values that come from
  other constraints in your configuration file to come up with a best
  configuration.
 
 Good info.
 
 Could you provide a few hundred lines of strace output to show us what
 it's doing?
 

Do you mean the last few hundred lines from ha.log? Just the primary
where I'm trying to start?

   Thanks!
 
 


Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-03-23 Thread Doug Knight
I figured this one out, please ignore, it's because I didn't give it a
value. If I run crm_master -v 100 at the command line, it spins right up
to 100% cpu with no error.

Doug

On Fri, 2007-03-23 at 12:52 -0400, Doug Knight wrote:
 This might help. With the resource in a failed mode, but target_role =
 started, I manually ran crm_master, exporting the proper resource ID,
 with the following results:
 
 [EMAIL PROTECTED] wsi]# vi stateful_pgsql 
 [EMAIL PROTECTED] wsi]# OCF_RESOURCE_INSTANCE=pgsql_wal_5556:0
 [EMAIL PROTECTED] wsi]# export OCF_RESOURCE_INSTANCE
 [EMAIL PROTECTED] wsi]# crm_master -V
 crm_master[7588]: 2007/03/23_12:50:45 ERROR: crm_abort: main:
 Triggered non-fatal assert at crm_attribute.c:353 : attr_value != NULL
 
 
 Doug
 
 On Fri, 2007-03-23 at 12:21 -0400, Doug Knight wrote:
  Got it. The attached file contains the strace from the second
  attempt by heartbeat to start the resource up as master, right up
  until it was killed. The resource already showed failed on the gui.
  I zipped it up using gzip.
  
  Doug
  
  On Fri, 2007-03-23 at 10:11 -0600, Alan Robertson wrote:  
   Doug Knight wrote:
On Fri, 2007-03-23 at 09:25 -0600, Alan Robertson wrote:
Doug Knight wrote:
 Current 2.0.8 tarball from 1/18/07. Process in top looks like:
 
    PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM  TIME+   COMMAND
  24591 root  18   0 1663m 1.5g 1028 R   83 77.8  1:19.42 /usr/sbin/crm_master -v 100
 
 It dies and restarts about every 120 seconds, which happens to be the
 timeout I have specified for the stop and start methods.
 
 Doug
 
 On Fri, 2007-03-23 at 08:20 -0600, Alan Robertson wrote:
 Doug Knight wrote:
  Hi Alan,
  I've started testing my OCF script, and I'm seeing something unusual
  during initial startup. I've placed a crm_master call in my
  stateful_start function, after the function has determined that it is
  running on what should be the master, and postgresql has successfully
  started:
  
  crm_master -v 100
  
  When this command gets executed, it starts using nearly 100% CPU, memory
  usage continuously increases up to about 68%, then it dies (killed via
  timeout?), followed by a second attempt to go master (with the same
  characteristics, after the function timeout is exceeded), then a demote is
  sent (again, after timeout) and it switches to try to become the slave
  (crm_master -v 10 is what I use, though I'm not sure this is correct
  usage to say I want to change to a slave). Eventually, I wind up with
  the resource in failed mode.
  
  First question, any idea why the straight line running of a crm_master
  -v 100 (not within any loops in my script) would spin up to 100%?

 Bugs maybe?  What version of heartbeat are you running?  Which processes
 are running up to 100%?  For how long?

  Second question, is using the crm_master -v with different values the
  way to say on which node I prefer the master to run (higher number =
  preferred node)?

 Yes.  I believe that these are added into the values that come from
 other constraints in your configuration file to come up with a best
 configuration.
   
Good info.
   
Could you provide a few hundred lines of strace output to show us what
it's doing?
   

Do you mean the last few hundred lines from ha.log? Just the primary
where I'm trying to start?
   
   No, I mean output from the strace command.  From your reply, I'd guess
   you've never used it:
   
     strace -tt -p <process-id-of-hung-process> -o /some/file
   
   Do that for a few seconds, and attach the file to an email to the list.
   
   Does that help?
   
   
   


Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-03-22 Thread Doug Knight
Hi Alan,
I took a look at the drbd OCF script's notify function, and the online
documentation. I believe there is one circumstance where I need to make
use of the pre/post notify. The last step in my development/testing has
to do with several steps I take to prepare the server that was primary
and is now becoming standby. First, the primary gets demoted, right?
Then the secondary gets promoted. The problem I have is that part of the
process of preparing the new standby requires that the new active server
process is up and accessible. If the demote has to complete before the
promote can begin, I cannot do the rsync in the demote, because the
promote hasn't started and placed the new primary in an accessible
state. 

So, if I understand the notify function, then I need a post process
section that looks for the master going active and accessible, so I
can do the rsync and start up the new standby, right? Can you expand a
little on the notify processing? The web page just lists the variables
involved, and the drbd OCF script only makes use of a few of them, and I
need a more detailed explanation of how and when they are used. 

Thanks,
Doug
 
On Wed, 2007-03-21 at 19:13 -0600, Alan Robertson wrote:
 Doug Knight wrote:
  Hi Andrew,
  I had just started reviewing both of these scripts, and reviewed the
  Multistate and clone resource pages on the web site. It looks like
  multistate is how I need to handle it, but a couple of questions first.
  
  1. I noticed that the write-up says the resource must come up on each of
  the servers in shadow mode first, then one gets promoted. Does this
  imply a start on both servers, and the OCF start function determining
  which server is active vs shadow (I'm picturing a check in the OCF
  script to determine postgresql standby mode = shadow/crm_master value
  low, and postgresql active mode = active/crm_master value high), then a
  promote to the active server?
  
  2. I noticed that the drbd OCF script contains a notify function,
  where the Stateful OCF script does not. The notify function looks to be
  where the important actions are taken (calling drbd_start_phase_2,
  pre/post, etc). Is the notify function necessary, or is it sufficient in
  my case to handle it through the start|stop|promote|demote functions?
  
  Thanks for your help,
  Doug
 
 Andrew's out for a while.
 
 The start function starts you up in slave/secondary mode.  All resources
 initially start up in slave mode.
 
 A set of servers is chosen to run the resources on (it might be one,
 two, the whole set, etc. depending on clone_max and clone_node_max and
 the usual constraints).
 
 They are started on the selected nodes using start
 
 During the start operation, you are given the chance to declare yourself
 ready to become master or not by using the crm_master command line tool.
 
 I believe that your resource can run that command any time they like -
 for example at a monitor operation...  But, it is mandatory that they
 run it when they first start up.
 
 After this, heartbeat will try and promote as many of these resources as
 is consistent with its configured properties, and the crm_master
 commands that were run.
 
 The notify command tells you when your peers come and go.  Do you need
 to take actions if you know this?
 
 If so, then you need to implement the notify actions...
 
 
 


Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-03-21 Thread Andrew Beekhof

On 3/20/07, Doug Knight [EMAIL PROTECTED] wrote:


 Hi Alan,
 I've had some time to try to implement the OCF script for the PostgreSQL
WAL file forwarding configuration, and I still am having some issues. The
main issue is how do I set up an OCF script that allows the process to run
on both servers, one process in a starting mode ingesting transferred WAL
files, the other in an accessible mode allowing database access and
forwarding WAL files to the standby (right now the different states are
determined within the OCF script by looking for the existence of a
recovery.conf file, which indicates the starting, standby side)? What I'm
trying to figure out is how to tell HA that it needs to start the standby
process and monitor it.
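
The recovery.conf check described above might look roughly like this. It is a sketch under assumptions: pgsql_role is a hypothetical helper name, and the default data directory path is a guess that should be replaced with the real PGDATA:

```shell
#!/bin/sh
# Sketch only: report which side this node is currently configured as,
# based on whether recovery.conf exists in the PostgreSQL data directory.
# The default path below is an assumption; pass the real one as $1 or
# set PGDATA.
pgsql_role() {
    data_dir="${1:-${PGDATA:-/var/lib/pgsql/data}}"
    if [ -f "$data_dir/recovery.conf" ]; then
        # recovery.conf present: this side ingests forwarded WAL files.
        echo standby
    else
        # No recovery.conf: this side is the accessible database.
        echo primary
    fi
}
```

Note that this only reports how the node is configured on disk; it does not tell the cluster anything by itself.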

 I've started looking at the multi-state configurations, but the
documentation isn't completely clear. Can you expand on it? I'd like
something similar to the answers you gave for the monitor, stop, and start
functions (when do they/can they get invoked, on which node, etc). One
specific area of the multi-state config I have a question about is how to
tell HA which server has the master role when first starting up the
resource. Any additional info would be great.


have a look at the drbd and Stateful agents:
  http://hg.beekhof.net/lha/crm-dev/file/be220f2e9b40/resources/OCF/Stateful.in

but first read:
  http://linux-ha.org/v2/Concepts/MultiState

in particular, section 6.1



 Thanks,
 Doug




 On Wed, 2007-03-14 at 19:59 -0600, Alan Robertson wrote:
 Doug Knight wrote:
 Yes, Thanks Alan. Let me digest it, and walk through my OCF script. I'll
 see if I have any other questions after that.

 Thanks for getting back to me.

 Doug

 On Wed, 2007-03-14 at 11:41 -0600, Alan Robertson wrote:
 Doug Knight wrote:
  Hi All,
  I currently am running a two node cluster (host1 and host2) with version
  2.0.8. I have a resource defined with a place constraint of #uname eq
  host1, so that it will start on host1 (using an OCF RA script,
  including all of the required methods). The resource itself has
  target_role set to stopped.
 
  Question 1: Is the monitor method called regularly on both nodes to make
  sure the resource is not running?

 No. It is called on every node when we first start up (we call it a
 probe operation). If you ask us to, we will run it periodically to
 ensure that a running copy continues to run.

 You can also manually request to run this initial probe again to catch
 errors made by system administrators (but I don't know of anyone who
 does that).


  Next, I change the target_role to started (i.e. I use the GUI and
  click the start button).
 [Better yet, use the outline start button]

  Question 2: What is the order of OCF methods called to bring up the
  resource? Is Monitor called before Start on host1? Does Stop and/or
  Monitor ever get called on host2?

 Monitor gets called when we first start up on every node.
 It also gets called repeatedly on any node that we think is running it --
 if you ask us to monitor the resource.

  Resource is up and running on host1, and I decide to move the resource
  to host2. I click the constraint and change it to #uname eq host2. The
  resource stops on host1 and starts on host2.
 
  Question 3: Same idea, what is the sequence of method calls to migrate
  the resource from host1 to host2?

 In the past...
 monitor every resource on every node once to see what's already
 running
 start on some node
 monitor periodically on some node (if requested)

 Request to move resource arrives...
 stop monitoring on some node (if it had been requested)
 stop resource on some node
 start resource on some other node
 monitor periodically on some other node (if requested)

  I'm trying to thoroughly understand the sequence of events that occurs
  for each phase in support of the Postgres WAL file forwarding
  configuration I posted last week ([Linux-HA] Two node cluster with
  Postgres in WAL file fwding mode, started March 1).


Let me offer this caveat:

Monitor might be called at any time.

Stop should work at any time,
 and succeed harmlessly if it's already stopped.

Start should work at any time,
 and succeed harmlessly if it's already started.

http://www.linux-ha.org/OCFResourceAgent gives more
details.



___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/





Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-03-21 Thread Doug Knight
Hi Andrew,
I had just started reviewing both of these scripts, and reviewed the
Multistate and clone resource pages on the web site. It looks like
multistate is how I need to handle it, but a couple of questions first.

1. I noticed that the write-up says the resource must come up on each of
the servers in shadow mode first, then one gets promoted. Does this
imply a start on both servers, and the OCF start function determining
which server is active vs shadow (I'm picturing a check in the OCF
script to determine postgresql standby mode = shadow/crm_master value
low, and postgresql active mode = active/crm_master value high), then a
promote to the active server?

2. I noticed that the drbd OCF script contains a notify function,
where the Stateful OCF script does not. The notify function looks to be
where the important actions are taken (calling drbd_start_phase_2,
pre/post, etc). Is the notify function necessary, or is it sufficient in
my case to handle it through the start|stop|promote|demote functions? 

Thanks for your help,
Doug

On Wed, 2007-03-21 at 10:09 +0100, Andrew Beekhof wrote:
 On 3/20/07, Doug Knight [EMAIL PROTECTED] wrote:
 
   Hi Alan,
   I've had some time to try to implement the OCF script for the PostgreSQL
  WAL file forwarding configuration, and I still am having some issues. The
  main issue is how do I set up an OCF script that allows the process to run
  on both servers, one process in a starting mode ingesting transferred WAL
  files, the other in an accessible mode allowing database access and
  forwarding WAL files to the standby (right now the different states are
  determined within the OCF script by looking for the existence of a
  recovery.conf file, which indicates the starting, standby side)? What I'm
  trying to figure out is how to tell HA that it needs to start the standby
  process and monitor it.
 
   I've started looking at the multi-state configurations, but the
  documentation isn't completely clear. Can you expand on it? I'd like
  something similar to the answers you gave for the monitor, stop, and start
  functions (when do they/can they get invoked, on which node, etc). One
  specific area of the multi-state config I have a question about is how to
  tell HA which server has the master role when first starting up the
  resource. Any additional info would be great.
 
 have a look at the drbd and Stateful agents:

 http://hg.beekhof.net/lha/crm-dev/file/be220f2e9b40/resources/OCF/Stateful.in
 
 but first read:
http://linux-ha.org/v2/Concepts/MultiState
 
 in particular, section 6.1
 
 
   Thanks,
   Doug
 
 
 
 
   On Wed, 2007-03-14 at 19:59 -0600, Alan Robertson wrote:
   Doug Knight wrote:
   Yes, Thanks Alan. Let me digest it, and walk through my OCF script. I'll
   see if I have any other questions after that.
  
   Thanks for getting back to me.
  
   Doug
  
   On Wed, 2007-03-14 at 11:41 -0600, Alan Robertson wrote:
   Doug Knight wrote:
Hi All,
I currently am running a two node cluster (host1 and host2) with version
2.0.8. I have a resource defined with a place constraint of #uname eq
host1, so that it will start on host1 (using an OCF RA script,
including all of the required methods). The resource itself has
target_role set to stopped.
   
Question 1: Is the monitor method called regularly on both nodes to make
sure the resource is not running?
  
   No. It is called on every node when we first start up (we call it a
   probe operation). If you ask us to, we will run it periodically to
   ensure that a running copy continues to run.
  
   You can also manually request to run this initial probe again to catch
   errors made by system administrators (but I don't know of anyone who
   does that).
  
  
Next, I change the target_role to started (i.e. I use the GUI and
click the start button).
   [Better yet, use the outline start button]
  
Question 2: What is the order of OCF methods called to bring up the
resource? Is Monitor called before Start on host1? Does Stop and/or
Monitor ever get called on host2?
  
   Monitor gets called when we first start up on every node.
   It also gets called repeatedly on any node that we think is running it --
   if you ask us to monitor the resource.
  
Resource is up and running on host1, and I decide to move the resource
to host2. I click the constraint and change it to #uname eq host2. The
resource stops on host1 and starts on host2.
   
Question 3: Same idea, what is the sequence of method calls to migrate
the resource from host1 to host2?
  
   In the past...
   monitor every resource on every node once to see what's already
   running
   start on some node
   monitor periodically on some node (if requested)
  
   Request to move resource arrives...
   stop monitoring on some node (if it had been requested)
   stop resource on some node
   start resource on some other node
   monitor periodically on some other node (if requested)
  

Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-03-21 Thread Alan Robertson
Doug Knight wrote:
 Hi Andrew,
 I had just started reviewing both of these scripts, and reviewed the
 Multistate and clone resource pages on the web site. It looks like
 multistate is how I need to handle it, but a couple of questions first.
 
 1. I noticed that the write-up says the resource must come up on each of
 the servers in shadow mode first, then one gets promoted. Does this
 imply a start on both servers, and the OCF start function determining
 which server is active vs shadow (I'm picturing a check in the OCF
 script to determine postgresql standby mode = shadow/crm_master value
 low, and postgresql active mode = active/crm_master value high), then a
 promote to the active server?
 
 2. I noticed that the drbd OCF script contains a notify function,
 where the Stateful OCF script does not. The notify function looks to be
 where the important actions are taken (calling drbd_start_phase_2,
 pre/post, etc). Is the notify function necessary, or is it sufficient in
 my case to handle it through the start|stop|promote|demote functions?
 
 Thanks for your help,
 Doug

Andrew's out for a while.

The start function starts you up in slave/secondary mode.  All resources
initially start up in slave mode.

A set of servers is chosen to run the resources on (it might be one,
two, the whole set, etc. depending on clone_max and clone_node_max and
the usual constraints).

They are started on the selected nodes using the start operation.

During the start operation, you are given the chance to declare yourself
ready to become master or not by using the crm_master command line tool.

I believe that your resource can run that command any time it likes -
for example at a monitor operation...  But, it is mandatory that it
runs it when it first starts up.

After this, heartbeat will try and promote as many of these resources as
is consistent with its configured properties, and the crm_master
commands that were run.
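
As a rough illustration of the point above, a start function might declare
its promotion preference like this. This is only a sketch: myres_start,
myres_can_be_master, MYRES_DATA_OK and the score of 100 are made-up names
and values, and the -v (set value) and -D (delete) options should be
checked against the crm_master shipped with your release.

```shell
#!/bin/sh
OCF_SUCCESS=0

# CRM_MASTER can be overridden for testing; by default it is the
# crm_master command line tool.
CRM_MASTER="${CRM_MASTER:-crm_master}"

# Hypothetical check: is this node's data good enough to be promoted?
myres_can_be_master() { [ "${MYRES_DATA_OK:-yes}" = "yes" ]; }

myres_start() {
    # ... bring the service up in slave/secondary mode here ...
    if myres_can_be_master; then
        # Advertise a promotion preference for this node.
        $CRM_MASTER -v 100
    else
        # Withdraw any preference so this node is not promoted.
        $CRM_MASTER -D
    fi
    return $OCF_SUCCESS
}
```

The same call could be repeated from monitor if the preference changes
over time.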

The notify command tells you when your peers come and go.  Do you need
to take actions if you know this?

If so, then you need to implement the notify actions...
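
A notify action could be sketched roughly as follows. The environment
variable names are the ones used by later Pacemaker releases and are an
assumption here; the names in 2.0.8 may differ, so compare with the drbd
agent you have installed. myres_notify and the messages it prints are
invented for the example.

```shell
#!/bin/sh
OCF_SUCCESS=0

myres_notify() {
    # Assumed variable names (newer Pacemaker style); verify for your release.
    type="${OCF_RESKEY_CRM_meta_notify_type:-}"
    op="${OCF_RESKEY_CRM_meta_notify_operation:-}"
    case "$type-$op" in
        post-promote) echo "an instance has been promoted" ;;
        post-stop)    echo "an instance has been stopped" ;;
        *)            : ;;   # ignore notifications we do not care about
    esac
    return $OCF_SUCCESS
}
```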



-- 
Alan Robertson [EMAIL PROTECTED]

Openness is the foundation and preservative of friendship...  Let me
claim from you at all times your undisguised opinions. - William
Wilberforce


Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-03-20 Thread Doug Knight
Hi Alan,
I've had some time to try to implement the OCF script for the PostgreSQL
WAL file forwarding configuration, and I still am having some issues.
The main issue is how do I set up an OCF script that allows the process
to run on both servers, one process in a starting mode ingesting
transferred WAL files, the other in an accessible mode allowing
database access and forwarding WAL files to the standby (right now the
different states are determined within the OCF script by looking for the
existence of a recovery.conf file, which indicates the starting,
standby side)? What I'm trying to figure out is how to tell HA that it
needs to start the standby process and monitor it. 
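
For reference, the recovery.conf check described above amounts to
something like the following sketch. pg_role and the default PGDATA path
are placeholders for this example, not names taken from the actual script.

```shell
#!/bin/sh
# PGDATA is assumed to point at the PostgreSQL data directory.
PGDATA="${PGDATA:-/var/lib/pgsql/data}"

# Print the role implied by the presence of recovery.conf.
pg_role() {
    if [ -f "$PGDATA/recovery.conf" ]; then
        echo standby
    else
        echo primary
    fi
}
```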

I've started looking at the multi-state configurations, but the
documentation isn't completely clear. Can you expand on it? I'd like
something similar to the answers you gave for the monitor, stop, and
start functions (when do they/can they get invoked, on which node, etc).
One specific area of the multi-state config I have a question about is
how to tell HA which server has the master role when first starting up
the resource. Any additional info would be great.

Thanks,
Doug



On Wed, 2007-03-14 at 19:59 -0600, Alan Robertson wrote:
 Doug Knight wrote:
  Yes, Thanks Alan. Let me digest it, and walk through my OCF script. I'll
  see if I have any other questions after that.
  
  Thanks for getting back to me.
  
  Doug
  
  On Wed, 2007-03-14 at 11:41 -0600, Alan Robertson wrote:
  Doug Knight wrote:
   Hi All,
   I currently am running a two node cluster (host1 and host2) with version
   2.0.8. I have a resource defined with a place constraint of #uname eq
   host1, so that it will start on host1 (using an OCF RA script,
   including all of the required methods). The resource itself has
   target_role set to stopped.
   
   Question 1: Is the monitor method called regularly on both nodes to make
   sure the resource is not running?
 
  No.  It is called on every node when we first start up (we call it a
  probe operation).  If you ask us to, we will run it periodically to
  ensure that a running copy continues to run.
 
  You can also manually request to run this initial probe again to catch
  errors made by system administrators (but I don't know of anyone who
  does that).
 
 
   Next, I change the target_role to started (i.e. I use the GUI and
   click the start button).
  [Better yet, use the outline start button]
 
   Question 2: What is the order of OCF methods called to bring up the
   resource? Is Monitor called before Start on host1? Does Stop and/or
   Monitor ever get called on host2?
 
  Monitor gets called when we first start up on every node.
  It also gets called repeatedly on any node that we think is running it --
  if you ask us to monitor the resource.
 
   Resource is up and running on host1, and I decide to move the resource
   to host2. I click the constraint and change it to #uname eq host2. The
   resource stops on host1 and starts on host2.
   
   Question 3: Same idea, what is the sequence of method calls to migrate
   the resource from host1 to host2?
 
  In the past...
 monitor every resource on every node once to see what's already
 running
 start on some node
 monitor periodically on some node (if requested)
 
  Request to move resource arrives...
 stop monitoring on some node (if it had been requested)
 stop resource on some node
 start resource on some other node
 monitor periodically on some other node (if requested)
 
   I'm trying to thoroughly understand the sequence of events that occurs
   for each phase in support of the Postgres WAL file forwarding
   configuration I posted last week ([Linux-HA] Two node cluster with
   Postgres in WAL file fwding mode, started March 1).
 
 
 Let me offer this caveat:
 
 Monitor might be called at any time.
 
 Stop should work at any time,
   and succeed harmlessly if it's already stopped.
 
 Start should work at any time,
   and succeed harmlessly if it's already started.
 
 http://www.linux-ha.org/OCFResourceAgent gives more details.
 


Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-03-20 Thread Serge Dubrouski

On 3/20/07, Doug Knight [EMAIL PROTECTED] wrote:


 Hi Alan,
 I've had some time to try to implement the OCF script for the PostgreSQL
WAL file forwarding configuration, and I still am having some issues. The
main issue is how do I set up an OCF script that allows the process to run
on both servers, one process in a starting mode ingesting transferred WAL
files, the other in an accessible mode allowing database access and
forwarding WAL files to the standby (right now the different states are
determined within the OCF script by looking for the existence of a
recovery.conf file, which indicates the starting, standby side)? What I'm
trying to figure out is how to tell HA that it needs to start the standby
process and monitor it.


You can have an additional OCF parameter that controls state of the
resource. But you'll need to have something external to set that
parameter into the right state depending on the state of the cluster.
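
A minimal sketch of what reading such a parameter could look like inside
the agent. The parameter name "pgmode" and the function myres_mode are
hypothetical; the only fixed convention is that instance parameters reach
the agent as OCF_RESKEY_<name> environment variables.

```shell
#!/bin/sh
OCF_ERR_CONFIGURED=6

# "pgmode" is a hypothetical parameter name; the cluster passes it to the
# agent as the environment variable OCF_RESKEY_pgmode.
myres_mode() {
    mode="${OCF_RESKEY_pgmode:-standby}"
    case "$mode" in
        primary|standby) echo "$mode" ;;
        *) echo "invalid pgmode: $mode" >&2; return $OCF_ERR_CONFIGURED ;;
    esac
}
```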



 I've started looking at the multi-state configurations, but the
documentation isn't completely clear. Can you expand on it? I'd like
something similar to the answers you gave for the monitor, stop, and start
functions (when do they/can they get invoked, on which node, etc). One
specific area of the multi-state config I have a question about is how to
tell HA which server has the master role when first starting up the
resource. Any additional info would be great.


On this, check this thread:
http://lists.linux-ha.org/pipermail/linux-ha/2007-March/023646.html


 Thanks,
 Doug




 On Wed, 2007-03-14 at 19:59 -0600, Alan Robertson wrote:
 Doug Knight wrote:
 Yes, Thanks Alan. Let me digest it, and walk through my OCF script. I'll
 see if I have any other questions after that.

 Thanks for getting back to me.

 Doug

 On Wed, 2007-03-14 at 11:41 -0600, Alan Robertson wrote:
 Doug Knight wrote:
  Hi All,
  I currently am running a two node cluster (host1 and host2) with version
  2.0.8. I have a resource defined with a place constraint of #uname eq
  host1, so that it will start on host1 (using an OCF RA script,
  including all of the required methods). The resource itself has
  target_role set to stopped.
 
  Question 1: Is the monitor method called regularly on both nodes to make
  sure the resource is not running?

 No. It is called on every node when we first start up (we call it a
 probe operation). If you ask us to, we will run it periodically to
 ensure that a running copy continues to run.

 You can also manually request to run this initial probe again to catch
 errors made by system administrators (but I don't know of anyone who
 does that).


  Next, I change the target_role to started (i.e. I use the GUI and
  click the start button).
 [Better yet, use the outline start button]

  Question 2: What is the order of OCF methods called to bring up the
  resource? Is Monitor called before Start on host1? Does Stop and/or
  Monitor ever get called on host2?

 Monitor gets called when we first start up on every node.
 It also gets called repeatedly on any node that we think is running it --
 if you ask us to monitor the resource.

  Resource is up and running on host1, and I decide to move the resource
  to host2. I click the constraint and change it to #uname eq host2. The
  resource stops on host1 and starts on host2.
 
  Question 3: Same idea, what is the sequence of method calls to migrate
  the resource from host1 to host2?

 In the past...
 monitor every resource on every node once to see what's already
 running
 start on some node
 monitor periodically on some node (if requested)

 Request to move resource arrives...
 stop monitoring on some node (if it had been requested)
 stop resource on some node
 start resource on some other node
 monitor periodically on some other node (if requested)

  I'm trying to thoroughly understand the sequence of events that occurs
  for each phase in support of the Postgres WAL file forwarding
  configuration I posted last week ([Linux-HA] Two node cluster with
  Postgres in WAL file fwding mode, started March 1).


Let me offer this caveat:

Monitor might be called at any time.

Stop should work at any time,
 and succeed harmlessly if it's already stopped.

Start should work at any time,
 and succeed harmlessly if it's already started.

http://www.linux-ha.org/OCFResourceAgent gives more
details.








[Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-03-14 Thread Doug Knight
Hi All,
I currently am running a two node cluster (host1 and host2) with version
2.0.8. I have a resource defined with a place constraint of #uname eq
host1, so that it will start on host1 (using an OCF RA script,
including all of the required methods). The resource itself has
target_role set to stopped. 

Question 1: Is the monitor method called regularly on both nodes to make
sure the resource is not running?

Next, I change the target_role to started (i.e. I use the GUI and
click the start button).

Question 2: What is the order of OCF methods called to bring up the
resource? Is Monitor called before Start on host1? Does Stop and/or
Monitor ever get called on host2? 

Resource is up and running on host1, and I decide to move the resource
to host2. I click the constraint and change it to #uname eq host2. The
resource stops on host1 and starts on host2.

Question 3: Same idea, what is the sequence of method calls to migrate
the resource from host1 to host2?

I'm trying to thoroughly understand the sequence of events that occurs
for each phase in support of the Postgres WAL file forwarding
configuration I posted last week ([Linux-HA] Two node cluster with
Postgres in WAL file fwding mode, started March 1).

Thanks,
Doug Knight
WSI Inc.


Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-03-14 Thread Alan Robertson
Doug Knight wrote:
 Hi All,
 I currently am running a two node cluster (host1 and host2) with version
 2.0.8. I have a resource defined with a place constraint of #uname eq
 host1, so that it will start on host1 (using an OCF RA script,
 including all of the required methods). The resource itself has
 target_role set to stopped.
 
 Question 1: Is the monitor method called regularly on both nodes to make
 sure the resource is not running?

No.  It is called on every node when we first start up (we call it a
probe operation).  If you ask us to, we will run it periodically to
ensure that a running copy continues to run.

You can also manually request to run this initial probe again to catch
errors made by system administrators (but I don't know of anyone who
does that).
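
To make the probe concrete: on a node where the resource is not running,
monitor is expected to report "not running" rather than an error. A
minimal sketch, assuming a pid-file based service (PIDFILE and
myres_monitor are invented for the example):

```shell
#!/bin/sh
# Standard OCF exit codes used by monitor.
OCF_SUCCESS=0
OCF_NOT_RUNNING=7

# PIDFILE is a hypothetical location, chosen only for this sketch.
PIDFILE="${PIDFILE:-/var/run/myresource.pid}"

myres_monitor() {
    # No pid file: report "cleanly stopped", which is what a probe expects
    # on a node where the resource was never started.
    [ -f "$PIDFILE" ] || return $OCF_NOT_RUNNING
    pid=$(cat "$PIDFILE")
    # kill -0 sends no signal; it only checks that the process exists.
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        return $OCF_SUCCESS
    fi
    return $OCF_NOT_RUNNING
}
```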


 Next, I change the target_role to started (i.e. I use the GUI and
 click the start button).
[Better yet, use the outline start button]

 Question 2: What is the order of OCF methods called to bring up the
 resource? Is Monitor called before Start on host1? Does Stop and/or
 Monitor ever get called on host2?

Monitor gets called when we first start up on every node.
It also gets called repeatedly on any node that we think is running it --
if you ask us to monitor the resource.

 Resource is up and running on host1, and I decide to move the resource
 to host2. I click the constraint and change it to #uname eq host2. The
 resource stops on host1 and starts on host2.
 
 Question 3: Same idea, what is the sequence of method calls to migrate
 the resource from host1 to host2?

In the past...
monitor every resource on every node once to see what's already
running
start on some node
monitor periodically on some node (if requested)

Request to move resource arrives...
stop monitoring on some node (if it had been requested)
stop resource on some node
start resource on some other node
monitor periodically on some other node (if requested)
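
The two lists above, written out for host1 and host2 as a small trace.
This is purely illustrative: trace_move is a made-up helper that prints
the order of calls, it does not drive a cluster.

```shell
#!/bin/sh
# Print the order of agent calls for a start on $1 followed by a move to $2.
trace_move() {
    from=$1; to=$2
    # One-time probe on every node to see what is already running.
    for node in "$from" "$to"; do
        echo "$node: monitor (one-time probe)"
    done
    echo "$from: start"
    echo "$from: monitor (recurring, if requested)"
    # Request to move the resource arrives.
    echo "$from: cancel recurring monitor"
    echo "$from: stop"
    echo "$to: start"
    echo "$to: monitor (recurring, if requested)"
}
```

Calling trace_move host1 host2 prints the eight steps in order.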

 I'm trying to thoroughly understand the sequence of events that occurs
 for each phase in support of the Postgres WAL file forwarding
 configuration I posted last week ([Linux-HA] Two node cluster with
 Postgres in WAL file fwding mode, started March 1).


Does that help?


-- 
Alan Robertson [EMAIL PROTECTED]

Openness is the foundation and preservative of friendship...  Let me
claim from you at all times your undisguised opinions. - William
Wilberforce


Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-03-14 Thread Doug Knight
Yes, Thanks Alan. Let me digest it, and walk through my OCF script. I'll
see if I have any other questions after that.

Thanks for getting back to me.

Doug

On Wed, 2007-03-14 at 11:41 -0600, Alan Robertson wrote:
 Doug Knight wrote:
  Hi All,
  I currently am running a two node cluster (host1 and host2) with version
  2.0.8. I have a resource defined with a place constraint of #uname eq
  host1, so that it will start on host1 (using an OCF RA script,
  including all of the required methods). The resource itself has
  target_role set to stopped.
  
  Question 1: Is the monitor method called regularly on both nodes to make
  sure the resource is not running?
 
 No.  It is called on every node when we first start up (we call it a
 probe operation).  If you ask us to, we will run it periodically to
 ensure that a running copy continues to run.
 
 You can also manually request to run this initial probe again to catch
 errors made by system administrators (but I don't know of anyone who
 does that).
 
 
  Next, I change the target_role to started (i.e. I use the GUI and
  click the start button).
 [Better yet, use the outline start button]
 
  Question 2: What is the order of OCF methods called to bring up the
  resource? Is Monitor called before Start on host1? Does Stop and/or
  Monitor ever get called on host2?
 
 Monitor gets called when we first start up on every node.
 It also gets called repeatedly on any node that we think is running it --
 if you ask us to monitor the resource.
 
  Resource is up and running on host1, and I decide to move the resource
  to host2. I click the constraint and change it to #uname eq host2. The
  resource stops on host1 and starts on host2.
  
  Question 3: Same idea, what is the sequence of method calls to migrate
  the resource from host1 to host2?
 
 In the past...
   monitor every resource on every node once to see what's already
   running
   start on some node
   monitor periodically on some node (if requested)
 
 Request to move resource arrives...
   stop monitoring on some node (if it had been requested)
   stop resource on some node
   start resource on some other node
   monitor periodically on some other node (if requested)
 
  I'm trying to thoroughly understand the sequence of events that occurs
  for each phase in support of the Postgres WAL file forwarding
  configuration I posted last week ([Linux-HA] Two node cluster with
  Postgres in WAL file fwding mode, started March 1).
 
 
 Does that help?
 
 


Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

2007-03-14 Thread Alan Robertson
Doug Knight wrote:
 Yes, Thanks Alan. Let me digest it, and walk through my OCF script. I'll
 see if I have any other questions after that.
 
 Thanks for getting back to me.
 
 Doug
 
 On Wed, 2007-03-14 at 11:41 -0600, Alan Robertson wrote:
 Doug Knight wrote:
  Hi All,
  I currently am running a two node cluster (host1 and host2) with version
  2.0.8. I have a resource defined with a place constraint of #uname eq
  host1, so that it will start on host1 (using an OCF RA script,
  including all of the required methods). The resource itself has
  target_role set to stopped.
  
  Question 1: Is the monitor method called regularly on both nodes to make
  sure the resource is not running?

 No.  It is called on every node when we first start up (we call it a
 probe operation).  If you ask us to, we will run it periodically to
 ensure that a running copy continues to run.

 You can also manually request to run this initial probe again to catch
 errors made by system administrators (but I don't know of anyone who
 does that).


  Next, I change the target_role to started (i.e. I use the GUI and
  click the start button).
 [Better yet, use the outline start button]

  Question 2: What is the order of OCF methods called to bring up the
  resource? Is Monitor called before Start on host1? Does Stop and/or
  Monitor ever get called on host2?

 Monitor gets called when we first start up on every node.
 It also gets called repeatedly on any node that we think is running it --
 if you ask us to monitor the resource.

  Resource is up and running on host1, and I decide to move the resource
  to host2. I click the constraint and change it to #uname eq host2. The
  resource stops on host1 and starts on host2.
  
  Question 3: Same idea, what is the sequence of method calls to migrate
  the resource from host1 to host2?

 In the past...
  monitor every resource on every node once to see what's already
  running
  start on some node
  monitor periodically on some node (if requested)

 Request to move resource arrives...
  stop monitoring on some node (if it had been requested)
  stop resource on some node
  start resource on some other node
  monitor periodically on some other node (if requested)

  I'm trying to thoroughly understand the sequence of events that occurs
  for each phase in support of the Postgres WAL file forwarding
  configuration I posted last week ([Linux-HA] Two node cluster with
  Postgres in WAL file fwding mode, started March 1).


Let me offer this caveat:

Monitor might be called at any time.

Stop should work at any time,
and succeed harmlessly if it's already stopped.

Start should work at any time,
and succeed harmlessly if it's already started.

http://www.linux-ha.org/OCFResourceAgent gives more details.
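
In script terms, the caveat about start and stop might look like this
sketch. STATEFILE is a stand-in for "the service is running" and all the
names are hypothetical; a real agent would check the actual process.

```shell
#!/bin/sh
OCF_SUCCESS=0
OCF_ERR_GENERIC=1

# STATEFILE stands in for "the service is running"; the path is
# hypothetical and only used by this sketch.
STATEFILE="${STATEFILE:-/var/run/myresource.state}"

myres_is_running() { [ -f "$STATEFILE" ]; }

myres_start() {
    # Already started: succeed without doing anything.
    myres_is_running && return $OCF_SUCCESS
    touch "$STATEFILE" || return $OCF_ERR_GENERIC
    return $OCF_SUCCESS
}

myres_stop() {
    # Already stopped: succeed without doing anything.
    myres_is_running || return $OCF_SUCCESS
    rm -f "$STATEFILE" || return $OCF_ERR_GENERIC
    return $OCF_SUCCESS
}
```

Calling either function twice in a row returns success both times.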

-- 
Alan Robertson [EMAIL PROTECTED]

Openness is the foundation and preservative of friendship...  Let me
claim from you at all times your undisguised opinions. - William
Wilberforce