Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions
On 4/11/07, Alan Robertson [EMAIL PROTECTED] wrote:
> Lars Marowsky-Bree wrote:
>> On 2007-04-10T07:09:44, Alan Robertson [EMAIL PROTECTED] wrote:
>>>> As even calling crm_master and having it do a compare and update-if-modified, or filtering it in the CIB directly, requires at least contacting and querying the CIB, I'd probably still track the state in the RA somewhere. (So as to avoid forking and IPC.)
>>>
>>> Keeping track of it in the RA would typically involve extra forking to do the query and comparison, and also to manage the state in tmp files, etc.
>>
>> Uhm. Forking? echo, read, and even sourcing the file (if it's written in VAR=VALUE style) don't incur that overhead. Anyway, Andrew has fixed it in the CIB by now, so this part of the discussion is moot ;-)
>>
>>>> Or even better, monitoring drbd via a daemon which sends an async notification (either a crm_master change or an async failure notification) when something happens, instead of polling via the LRM. I'd wish to have a fast LRM interface, where the instantiation of a resource starts a sub-daemon to control it and then manage it via IPC (or maybe stdin/stdout) with that process - and, if the daemon supports it, have it do async monitoring as well.
>>>
>>> Invoking the LRM to update the CIB isn't more efficient than just updating the CIB.
>>
>> That is not what this suggestion is about. Try reparsing it ;-) Such a daemon would obviously easily keep internal state and only issue CIB updates or async (failure) notifications as needed. It'd avoid a lot of forking for various operations.
>
> You can already do that. Resources often have processes associated with them. They can update the CIB any time they like. What those processes do, we don't know, and we don't much care.
>
> Async LRM failure notifications (as failed async monitors) are not there yet, but there was something about this in the previous discussion that I've forgotten :-(.

Actually, if you look at CTS, you'll find that ResourceRecover uses async notifications (albeit with a couple of tricks to work around deficiencies in the LRM).

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions
On 2007-04-10T16:47:52, Alan Robertson [EMAIL PROTECTED] wrote:
>> Such a daemon would obviously easily keep internal state and only issue CIB updates or async (failure) notifications as needed. It'd avoid a lot of forking for various operations.
>
> You can already do that. Resources often have processes associated with them. They can update the CIB any time they like. What those processes do, we don't know, and we don't much care.

No, I cannot do what I described. The fast LRM interface isn't there yet - what you describe above is only a subset of the feature I want.

> Async LRM failure notifications (as failed async monitors) are not there yet, but there was something about this in the previous discussion that I've forgotten :-(.

I thought they worked by now?

Sincerely,
    Lars

-- 
Teamlead Kernel, SuSE Labs, Research and Development
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde
Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions
On 2007-04-11T09:40:52, Andrew Beekhof [EMAIL PROTECTED] wrote:
> Actually, if you look at CTS, you'll find that ResourceRecover uses async notifications (albeit with a couple of tricks to work around deficiencies in the LRM).

Have bugs been filed for those deficiencies so that Dejan doesn't have to get bored? ;-)

Regards,
    Lars
Re: crm_master patch to eliminate do-nothing attribute updates - WAS Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions
On Apr 5, 2007, at 4:48 PM, Alan Robertson wrote:
> Alan Robertson wrote:
>> Lars Marowsky-Bree wrote:
>>> On 2007-04-04T11:41:44, Doug Knight [EMAIL PROTECTED] wrote:
>>>> The key word in my question was "thinks". It would be useful to the RA if it could know what state the CRM thought it was in, so that if the RA determines on its own that it's already in that state, it doesn't have to do anything. But if the RA finds that the CRM thinks it's in a different state, then the RA could set the CRM straight by calling crm_master with the appropriate value. Make sense?
>>>
>>> No. The state the resource is in is not set via crm_master, but using the exit code of the monitor operation. You should only call crm_master when you wish to change the _preference_ for master-state.
>>
>> But, I think you can use crm_master to retrieve your current preference, and thereby eliminate unnecessary CIB updates. Or maybe crm_master should do that filtering on its own??
>
> Attached is a straightforward patch to crm_attribute.c which I believe does this... It also eliminates certain other do-nothing attribute changes - because this code is shared by a few different commands.
>
> I recognize that this is slightly less efficient for those cases where the attribute is actually going to be changed, but it is _vastly_ more efficient for those cases where the current value is correct - because it eliminates triggering a computation cycle of the policy engine.

I'm not sure how I missed this previously, but this statement is just not true. If the only change to the CIB is to num_updates, then the PE is never re-invoked. This has been the case for about as long as I can remember.
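The compare-then-update filtering discussed in this message can be sketched roughly as follows. This is only an illustration in Python, not the crm_attribute.c patch or the CIB's actual logic: `AttributeStore`, its `set` method and the `on_change` callback are hypothetical names, and the choice to bump `num_updates` on every request (while only real value changes fire the callback that stands in for a policy engine run) is an assumption drawn from Andrew's description.

```python
class AttributeStore:
    """Toy stand-in for an attribute section of the CIB (hypothetical)."""

    def __init__(self, on_change):
        self._values = {}
        self._on_change = on_change   # stands in for "re-invoke the PE"
        self.num_updates = 0          # bumped on every request (assumption)

    def set(self, name, value):
        """Store value; return True only if the stored value changed."""
        self.num_updates += 1
        if name in self._values and self._values[name] == value:
            # Do-nothing update: only the counter moved, so no callback.
            return False
        self._values[name] = value
        self._on_change(name, value)
        return True


if __name__ == "__main__":
    runs = []
    store = AttributeStore(lambda name, value: runs.append((name, value)))
    store.set("master-preference", "100")   # real change
    store.set("master-preference", "100")   # filtered out
    print(store.num_updates, runs)
```

Under these assumptions a repeated `crm_master -v 100` style request costs a counter increment but no recomputation, which is the behaviour both the patch and the CIB-side filter are aiming for.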
Re: crm_master patch to eliminate do-nothing attribute updates - WAS Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions
Andrew Beekhof wrote:
> On Apr 5, 2007, at 4:48 PM, Alan Robertson wrote:
>> Alan Robertson wrote:
>>> Lars Marowsky-Bree wrote:
>>>> On 2007-04-04T11:41:44, Doug Knight [EMAIL PROTECTED] wrote:
>>>>> The key word in my question was "thinks". It would be useful to the RA if it could know what state the CRM thought it was in, so that if the RA determines on its own that it's already in that state, it doesn't have to do anything. But if the RA finds that the CRM thinks it's in a different state, then the RA could set the CRM straight by calling crm_master with the appropriate value. Make sense?
>>>>
>>>> No. The state the resource is in is not set via crm_master, but using the exit code of the monitor operation. You should only call crm_master when you wish to change the _preference_ for master-state.
>>>
>>> But, I think you can use crm_master to retrieve your current preference, and thereby eliminate unnecessary CIB updates. Or maybe crm_master should do that filtering on its own??
>>
>> Attached is a straightforward patch to crm_attribute.c which I believe does this... It also eliminates certain other do-nothing attribute changes - because this code is shared by a few different commands.
>>
>> I recognize that this is slightly less efficient for those cases where the attribute is actually going to be changed, but it is _vastly_ more efficient for those cases where the current value is correct - because it eliminates triggering a computation cycle of the policy engine.
>
> I'm not sure how I missed this previously, but this statement is just not true. If the only change to the CIB is to num_updates, then the PE is never re-invoked. This has been the case for about as long as I can remember.

This whole discussion started with evidence that this wasn't happening. I didn't know it was _supposed_ to happen, so I didn't think it odd.

-- 
Alan Robertson [EMAIL PROTECTED]
"Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce
Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions
Lars Marowsky-Bree wrote:
> On 2007-04-05T07:40:34, Alan Robertson [EMAIL PROTECTED] wrote:
>>> That is why I'd suggest to only call it in start or post-notify; calling it in post-notify basically implies it'll be called after every state change.
>>
>> But, for DRBD for example, the ability to become master can change without a heartbeat state change.
>
> I didn't say it was perfect ;-)
>
> As even calling crm_master and having it do a compare and update-if-modified, or filtering it in the CIB directly, requires at least contacting and querying the CIB, I'd probably still track the state in the RA somewhere. (So as to avoid forking and IPC.)

Keeping track of it in the RA would typically involve extra forking to do the query and comparison, and also to manage the state in tmp files, etc.

> Or even better, monitoring drbd via a daemon which sends an async notification (either a crm_master change or an async failure notification) when something happens, instead of polling via the LRM. I'd wish to have a fast LRM interface, where the instantiation of a resource starts a sub-daemon to control it and then manage it via IPC (or maybe stdin/stdout) with that process - and, if the daemon supports it, have it do async monitoring as well.

Invoking the LRM to update the CIB isn't more efficient than just updating the CIB.

-- 
Alan Robertson
Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions
On 2007-04-10T07:09:44, Alan Robertson [EMAIL PROTECTED] wrote:
>> As even calling crm_master and having it do a compare and update-if-modified, or filtering it in the CIB directly, requires at least contacting and querying the CIB, I'd probably still track the state in the RA somewhere. (So as to avoid forking and IPC.)
>
> Keeping track of it in the RA would typically involve extra forking to do the query and comparison, and also to manage the state in tmp files, etc.

Uhm. Forking? echo, read, and even sourcing the file (if it's written in VAR=VALUE style) don't incur that overhead. Anyway, Andrew has fixed it in the CIB by now, so this part of the discussion is moot ;-)

>> Or even better, monitoring drbd via a daemon which sends an async notification (either a crm_master change or an async failure notification) when something happens, instead of polling via the LRM. I'd wish to have a fast LRM interface, where the instantiation of a resource starts a sub-daemon to control it and then manage it via IPC (or maybe stdin/stdout) with that process - and, if the daemon supports it, have it do async monitoring as well.
>
> Invoking the LRM to update the CIB isn't more efficient than just updating the CIB.

That is not what this suggestion is about. Try reparsing it ;-) Such a daemon would obviously easily keep internal state and only issue CIB updates or async (failure) notifications as needed. It'd avoid a lot of forking for various operations.

Sincerely,
    Lars
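The state-file idea from this exchange (remember the last preference in a VAR=VALUE file and only push an update when it changes) might look roughly like the sketch below. It is written in Python purely for illustration; a real RA of this era would be a shell script using echo, read or sourcing, as Lars notes. The function names, the `MASTER_PREFERENCE` key and the `push` callback (which stands in for whatever actually invokes crm_master) are all hypothetical.

```python
import os
import tempfile


def read_state(path):
    """Parse a VAR=VALUE file into a dict; a missing file yields {}."""
    state = {}
    try:
        with open(path) as fh:
            for line in fh:
                line = line.strip()
                if line and not line.startswith("#") and "=" in line:
                    key, _, value = line.partition("=")
                    state[key] = value
    except FileNotFoundError:
        pass
    return state


def write_state(path, state):
    """Write via a temp file and rename so readers never see a partial file."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as fh:
        for key, value in sorted(state.items()):
            fh.write(f"{key}={value}\n")
    os.replace(tmp, path)


def set_preference_if_changed(path, value, push):
    """Call push(value) only when value differs from the cached one."""
    state = read_state(path)
    if state.get("MASTER_PREFERENCE") == str(value):
        return False              # nothing to do, no external call made
    push(value)                   # hypothetical hook, e.g. runs crm_master
    state["MASTER_PREFERENCE"] = str(value)
    write_state(path, state)
    return True
```

The cache is written only after `push` returns, so a failed update is retried on the next call rather than being recorded as done.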
Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions
Lars Marowsky-Bree wrote:
> On 2007-04-10T07:09:44, Alan Robertson [EMAIL PROTECTED] wrote:
>>> As even calling crm_master and having it do a compare and update-if-modified, or filtering it in the CIB directly, requires at least contacting and querying the CIB, I'd probably still track the state in the RA somewhere. (So as to avoid forking and IPC.)
>>
>> Keeping track of it in the RA would typically involve extra forking to do the query and comparison, and also to manage the state in tmp files, etc.
>
> Uhm. Forking? echo, read, and even sourcing the file (if it's written in VAR=VALUE style) don't incur that overhead. Anyway, Andrew has fixed it in the CIB by now, so this part of the discussion is moot ;-)
>
>>> Or even better, monitoring drbd via a daemon which sends an async notification (either a crm_master change or an async failure notification) when something happens, instead of polling via the LRM. I'd wish to have a fast LRM interface, where the instantiation of a resource starts a sub-daemon to control it and then manage it via IPC (or maybe stdin/stdout) with that process - and, if the daemon supports it, have it do async monitoring as well.
>>
>> Invoking the LRM to update the CIB isn't more efficient than just updating the CIB.
>
> That is not what this suggestion is about. Try reparsing it ;-) Such a daemon would obviously easily keep internal state and only issue CIB updates or async (failure) notifications as needed. It'd avoid a lot of forking for various operations.

You can already do that. Resources often have processes associated with them. They can update the CIB any time they like. What those processes do, we don't know, and we don't much care.

Async LRM failure notifications (as failed async monitors) are not there yet, but there was something about this in the previous discussion that I've forgotten :-(.

-- 
Alan Robertson
Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions
On 2007-04-05T07:40:34, Alan Robertson [EMAIL PROTECTED] wrote:
>> That is why I'd suggest to only call it in start or post-notify; calling it in post-notify basically implies it'll be called after every state change.
>
> But, for DRBD for example, the ability to become master can change without a heartbeat state change.

I didn't say it was perfect ;-)

As even calling crm_master and having it do a compare and update-if-modified, or filtering it in the CIB directly, requires at least contacting and querying the CIB, I'd probably still track the state in the RA somewhere. (So as to avoid forking and IPC.)

Or even better, monitoring drbd via a daemon which sends an async notification (either a crm_master change or an async failure notification) when something happens, instead of polling via the LRM. I'd wish to have a fast LRM interface, where the instantiation of a resource starts a sub-daemon to control it and then manage it via IPC (or maybe stdin/stdout) with that process - and, if the daemon supports it, have it do async monitoring as well.

Hmmm. I think we have had a bugzilla for that for ages, and now we have a full-time LRM maintainer again as well ;-) Along with the LRM-needs-to-track-timestamps issue, this is probably one of the most important features I'm missing in the LRM ...

(The CIB modification to filter out unnecessary changes is of course good in any case.)

Sincerely,
    Lars
Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions
Hi Alan,

Interesting. This entry in the cib was completely created from the GUI. I modified the Transition Timeout from the GUI to get around the timeout of my notify function call. So if I understand what you're saying, I should remove the tags from the op definition and create nvpairs to replace them.

By the way, I checked out the link you sent with the GUI information. Unfortunately the system I use for the web and email is also the server on which we run our apps, and it has no sound ability at all (I'm probably lucky it's not headless ;) I'll check your page out over the weekend from my laptop.

Doug

On Thu, 2007-04-05 at 12:36 -0600, Alan Robertson wrote:
> Doug Knight wrote:
>> Here's another thing I'm seeing with the notify function. It keeps timing out on my slave startup (post-promote-notify and post-start-notify). I'm triggering the start by using the cleanup resources button on the GUI for the slave's resource:
>>
>> crmd[13681]: 2007/04/05_13:16:47 ERROR: process_lrm_event: LRM operation rsc_pgsql_wal_5556:0_notify_0 (74) Timed Out (timeout=5000ms)
>>
>> Yet I have the timeout value for the notify function bumped up to 120:
>>
>> <op id="1e9ebbb8-a370-4d1e-8815-43d6f0fdcfd6" name="notify" timeout="120" start_delay="0" disabled="false" role="Started"/>
>
> I believe this is incorrect. The timeout, start_delay and role are meta-attributes which need nvpairs. (The GUI is confused about role - it thinks it's a parameter.)
Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions
Doug Knight wrote:
> Hi Alan,
>
> Interesting. This entry in the cib was completely created from the GUI. I modified the Transition Timeout from the GUI to get around the timeout of my notify function call. So if I understand what you're saying, I should remove the tags from the op definition and create nvpairs to replace them.

This is described in my tutorial and also in the DTD.

> By the way, I checked out the link you sent with the GUI information. Unfortunately the system I use for the web and email is also the server on which we run our apps, and it has no sound ability at all (I'm probably lucky it's not headless ;) I'll check your page out over the weekend from my laptop.

What version of the GUI created it that way? I know some versions create it as a parameter with nvpairs, but I didn't know any created it as tags in the way you showed...

-- 
Alan Robertson
Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions
Alan Robertson wrote:
> Doug Knight wrote:
>> Hi Alan,
>>
>> Interesting. This entry in the cib was completely created from the GUI. I modified the Transition Timeout from the GUI to get around the timeout of my notify function call. So if I understand what you're saying, I should remove the tags from the op definition and create nvpairs to replace them.
>
> This is described in my tutorial and also in the DTD.

I meant the first tutorial on this page: http://linux-ha.org/HeartbeatTutorials

-- 
Alan Robertson
Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions
I've asked this question before, but have not gotten an answer: Can you have a location constraint to set the resource to prefer running the master on one node vs the other (and still have the slave running too)?

On Fri, 2007-04-06 at 09:46 -0400, Doug Knight wrote:
> Not sure how to determine the version of the GUI. It came with the 2.0.8 release. The prolog says copyright 2005. I've looked at the DTD and it's not always clear what it's saying, but it helps. What's really helped is stealing the do_cmd function from the drbd OCF script, and placing a few targeted env commands in some of my functions. Though all the info I gather this way is also on the web site, I understand it better when I see it in action. I did see and review the tutorial you referred to in your other email. I checked it out prior to getting started with Linux-HA. Maybe it's time I revisited it ;)
>
> Doug
>
> On Fri, 2007-04-06 at 07:32 -0600, Alan Robertson wrote:
>> Doug Knight wrote:
>>> Hi Alan,
>>>
>>> Interesting. This entry in the cib was completely created from the GUI. I modified the Transition Timeout from the GUI to get around the timeout of my notify function call. So if I understand what you're saying, I should remove the tags from the op definition and create nvpairs to replace them.
>>
>> This is described in my tutorial and also in the DTD.
>>
>>> By the way, I checked out the link you sent with the GUI information. Unfortunately the system I use for the web and email is also the server on which we run our apps, and it has no sound ability at all (I'm probably lucky it's not headless ;) I'll check your page out over the weekend from my laptop.
>>
>> What version of the GUI created it that way? I know some versions create it as a parameter with nvpairs, but I didn't know any created it as tags in the way you showed...
Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions
OK, here's another strange happening. I had an IPaddr resource with a separate (non-WAL file forwarding instance) database co-located with it. I created this early on to get up to speed on the HA basics. It uses the heartbeat OCF script, as does IPaddr. I used a places constraint with a uname eq to move it manually between nodes in the cluster. As I was testing my multistate OCF script for the WAL file forwarding version, I decided to try moving the IPaddr resource from one node to another (I've been having some issues in my fail-over testing, so I thought I'd verify that the IPaddr handling still worked). When the IPaddr moved, it affected the WAL file forwarding resource as well. I haven't dug into the logs yet to see what happened, but the question becomes: why would other resources on a node be affected when a change is made to a completely independent resource? Does it have something to do with managing everything from the GUI? Or with making changes to the cib.xml (which I only do through the GUI right now)?

I've also noticed that the monitoring of the resources does not appear to be happening. I killed the master database using the original LSB script and waited for the HA resource monitoring to pick it up and restart it, but it didn't. I'm probably missing something simple here...

Doug

On Fri, 2007-04-06 at 10:24 -0400, Doug Knight wrote:
> I've asked this question before, but have not gotten an answer: Can you have a location constraint to set the resource to prefer running the master on one node vs the other (and still have the slave running too)?
>
> On Fri, 2007-04-06 at 09:46 -0400, Doug Knight wrote:
>> Not sure how to determine the version of the GUI. It came with the 2.0.8 release. The prolog says copyright 2005. I've looked at the DTD and it's not always clear what it's saying, but it helps. What's really helped is stealing the do_cmd function from the drbd OCF script, and placing a few targeted env commands in some of my functions. Though all the info I gather this way is also on the web site, I understand it better when I see it in action. I did see and review the tutorial you referred to in your other email. I checked it out prior to getting started with Linux-HA. Maybe it's time I revisited it ;)
>>
>> Doug
>>
>> On Fri, 2007-04-06 at 07:32 -0600, Alan Robertson wrote:
>>> Doug Knight wrote:
>>>> Hi Alan,
>>>>
>>>> Interesting. This entry in the cib was completely created from the GUI. I modified the Transition Timeout from the GUI to get around the timeout of my notify function call. So if I understand what you're saying, I should remove the tags from the op definition and create nvpairs to replace them.
>>>
>>> This is described in my tutorial and also in the DTD.
>>>
>>>> By the way, I checked out the link you sent with the GUI information. Unfortunately the system I use for the web and email is also the server on which we run our apps, and it has no sound ability at all (I'm probably lucky it's not headless ;) I'll check your page out over the weekend from my laptop.
>>>
>>> What version of the GUI created it that way? I know some versions create it as a parameter with nvpairs, but I didn't know any created it as tags in the way you showed...
Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions
Doug Knight wrote:
> Not sure how to determine the version of the GUI. It came with the 2.0.8 release. The prolog says copyright 2005. I've looked at the DTD and it's not always clear what it's saying, but it helps. What's really helped is stealing the do_cmd function from the drbd OCF script, and placing a few targeted env commands in some of my functions. Though all the info I gather this way is also on the web site, I understand it better when I see it in action. I did see and review the tutorial you referred to in your other email. I checked it out prior to getting started with Linux-HA. Maybe it's time I revisited it ;)

I'm sure you're fine ;-)

I didn't think the 2.0.8 GUI did that. (Telling the version of heartbeat is enough.)

-- 
Alan Robertson
Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions
On 4/4/07, Alan Robertson [EMAIL PROTECTED] wrote:
> Doug Knight wrote:
>> I think I ran into this exact issue. I was calling crm_master -v 100 to upgrade to master status too frequently. The postgres database needs to stay in master state once it's there, and not transition frequently. Are there any other command line tools that could be used to retrieve what state the CRM thinks the resource is in?
>>
>> Doug
>>
>> On Tue, 2007-04-03 at 22:05 -0600, Alan Robertson wrote:
>>> Lars Marowsky-Bree wrote:
>>>> On 2007-03-29T21:31:32, Alan Robertson [EMAIL PROTECTED] wrote:
>>>>> The docs are more specific: They state that it is mandatory to call it in the start action. That doesn't seem to be true.
>>>>
>>>> That's a correct observation.
>>>>
>>>>> You could call it in a monitor action quite nicely, or so it seems to me...
>>>>
>>>> Yes. I'd avoid needless calls to it though, as every call triggers a transition - so don't just call it in _every_ monitor operation ;-)
>>>
>>> That's a good point. Unfortunately that means the RA has to keep state somewhere, which is a bit of a pain.
>
> There is an area where you can create tmp files for your RA. It gets cleaned out when heartbeat starts. It's there for this exact reason.

I think crm_master can read the preference when given the right flag: -G
Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions
On 2007-04-03T22:05:24, Alan Robertson [EMAIL PROTECTED] wrote:
>> Yes. I'd avoid needless calls to it though, as every call triggers a transition - so don't just call it in _every_ monitor operation ;-)
>
> That's a good point. Unfortunately that means the RA has to keep state somewhere, which is a bit of a pain.

That is why I'd suggest to only call it in start or post-notify; calling it in post-notify basically implies it'll be called after every state change.

Sincerely,
    Lars
Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions
On 2007-04-04T11:41:44, Doug Knight [EMAIL PROTECTED] wrote:
> The key word in my question was "thinks". It would be useful to the RA if it could know what state the CRM thought it was in, so that if the RA determines on its own that it's already in that state, it doesn't have to do anything. But if the RA finds that the CRM thinks it's in a different state, then the RA could set the CRM straight by calling crm_master with the appropriate value. Make sense?

No. The state the resource is in is not set via crm_master, but using the exit code of the monitor operation. You should only call crm_master when you wish to change the _preference_ for master-state.

Sincerely,
    Lars
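To make the distinction in this message concrete, here is a small Python sketch of how a monitor action reports the resource's actual role through its exit code, independent of any crm_master preference. The numeric values are the conventional OCF return codes as I understand them (0, 7 and 8); treat them as an assumption and check the ocf-returncodes file shipped with your heartbeat version. The function name `monitor_exit_code` and the role strings are hypothetical.

```python
import sys

# Conventional OCF return codes (assumed; verify against ocf-returncodes).
OCF_SUCCESS = 0          # running (as slave, for a multi-state resource)
OCF_NOT_RUNNING = 7      # cleanly stopped
OCF_RUNNING_MASTER = 8   # running in master role


def monitor_exit_code(role):
    """Map the role the RA actually observes to a monitor exit code."""
    codes = {
        "master": OCF_RUNNING_MASTER,
        "slave": OCF_SUCCESS,
        "stopped": OCF_NOT_RUNNING,
    }
    return codes[role]


if __name__ == "__main__":
    # A real RA would probe the database here; the role is hard-coded
    # only to keep the sketch self-contained.
    sys.exit(monitor_exit_code("slave"))
```

The preference set with crm_master only influences where the CRM would like the master to run; what it believes about the current state comes from these exit codes.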
Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions
Lars Marowsky-Bree wrote:
> On 2007-04-04T11:41:44, Doug Knight [EMAIL PROTECTED] wrote:
>> The key word in my question was "thinks". It would be useful to the RA if it could know what state the CRM thought it was in, so that if the RA determines on its own that it's already in that state, it doesn't have to do anything. But if the RA finds that the CRM thinks it's in a different state, then the RA could set the CRM straight by calling crm_master with the appropriate value. Make sense?
>
> No. The state the resource is in is not set via crm_master, but using the exit code of the monitor operation. You should only call crm_master when you wish to change the _preference_ for master-state.

But, I think you can use crm_master to retrieve your current preference, and thereby eliminate unnecessary CIB updates.

Or maybe crm_master should do that filtering on its own?? I like that thought...

-- 
Alan Robertson
Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions
Lars Marowsky-Bree wrote:
> On 2007-04-03T22:05:24, Alan Robertson [EMAIL PROTECTED] wrote:
>>> Yes. I'd avoid needless calls to it though, as every call triggers a transition - so don't just call it in _every_ monitor operation ;-)
>>
>> That's a good point. Unfortunately that means the RA has to keep state somewhere, which is a bit of a pain.
>
> That is why I'd suggest to only call it in start or post-notify; calling it in post-notify basically implies it'll be called after every state change.

But, for DRBD for example, the ability to become master can change without a heartbeat state change. This is not at all surprising, nor is it probably uncommon.

-- 
Alan Robertson
Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions
I think I ran into this exact issue. I was calling crm_master -v 100 to upgrade to master status too frequently. The postgres database needs to stay in master state once it's there, and not transition frequently. Are there any other command line tools that could be used to retrieve what state CRM thinks the resource is in?

Doug

On Tue, 2007-04-03 at 22:05 -0600, Alan Robertson wrote:

Lars Marowsky-Bree wrote:

On 2007-03-29T21:31:32, Alan Robertson [EMAIL PROTECTED] wrote:

The docs are more specific: They state that it is mandatory to call it in the start action. That doesn't seem to be true.

That's a correct observation.

You could call it in a monitor action quite nicely, or so it seems to me...

Yes. I'd avoid needless calls to it though as every call triggers a transition - so don't just call it in _every_ monitor operation ;-)

That's a good point. Unfortunately that means the RA has to keep state somewhere, which is a bit of a pain.
Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions
On 4/3/07, Doug Knight [EMAIL PROTECTED] wrote:

Thanks. Yes, I'm concerned that I'm seeing the spinning and you're not. However, I need to push through to getting my OCF script working, and using my version of crm_master will allow me to do that. I'll have to take the added demote into consideration, as when executing a straight stop this will interfere with an intentional shutdown of the master. I found part of my problem with the slave not starting. I had to change clone_max from 1 to 2. Now at least it's trying to start the slave, but I'm getting a strange error in the log when trying to start the slave:

ERROR: find_attr_details: Multiple attributes match name=master-pgsql_wal_5556:0 in nodes:

this is because you inverted the searching

I've attached the cib.xml from the slave node as well. It looks like my experimenting with my scripts manually has placed some spurious nvpairs in the xml? Suggestions? Also, can you have a location constraint to set the resource to prefer running the master on one node vs the other?

Doug

On Tue, 2007-04-03 at 17:50 +0200, Andrew Beekhof wrote:

On 4/3/07, Doug Knight [EMAIL PROTECTED] wrote:

I've never done a bugzilla report, maybe you can point me in the right direction.

http://old.linux-foundation.org/developer_bugzilla/enter_bug.cgi

In the mean time I'll continue to test with my modified version of crm_master that doesn't spin.

my concerns are that a) it works for us as-is, and b) it now works for you but for unknown reasons.

I've managed to get the master side of the process configuration up, but am having trouble getting the slave side up. I did notice that when I select Stop from the GUI to completely stop the master, it demotes it first and then stops it? The demote is what I use to reconfigure a master going down to become the slave after the new master comes up. Is there a way in demote to determine it's been triggered by a Stop request vs a Demote?

if you're a master, you'll always get a demote before a stop.
Doug On Tue, 2007-04-03 at 12:28 +0200, Andrew Beekhof wrote: Hi Doug, I just tried to reproduce this (loaded your cib and ran crm_master the same way you did) but it worked fine. Can you open a bug in bugzilla and include the result of cibadmin -Q please? On 4/2/07, Doug Knight [EMAIL PROTECTED] wrote: Andrew, Alan, Lars, et al, Any updates on the spinning crm_master? I and an associate of mine here are looking into compiler flags and settings to see if we can find anything there. If there is more info you need, just let me know. Doug On Fri, 2007-03-30 at 11:15 -0400, Doug Knight wrote: I've done some more looking at the cib_attrs.c module find_attr_details function, and it seems that the call to find_xml_children filtering by node eventually finds a match (match_found = 1 in xml.c/find_xml_children) terminating the recursive search. Later in the module, the call to find_xml_children filtering by name gets a match but continues to search. I never see the search by set: in the debug log, nor the printf I've placed in my version right after the call to find_xml_children. In fact, it seems that somehow find_xml_children loops within itself, though I must really be missing something there. Doug On Fri, 2007-03-30 at 08:20 -0400, Doug Knight wrote: Andrew, FYI, the one I ran has some debug statements in it that I put in there. Let me know if you want cleaner output... Doug On Fri, 2007-03-30 at 07:44 -0400, Doug Knight wrote: Hi Andrew, I'll get to this today. I've been digging through some of the source, putting in some additional logging, etc, just to see where the problem occurs. Thanks, Doug On Fri, 2007-03-30 at 10:32 +0200, Andrew Beekhof wrote: On 3/29/07, Doug Knight [EMAIL PROTECTED] wrote: Results: Using the dev version or crm_master I still spin at the top of the CPU stack. What's next? probably time to log a bug... 
can you include the logs of the crm_master command when you add - (an insance amount of logging) to your normal command line and your current CIB please Doug On Thu, 2007-03-29 at 10:06 -0400, Doug Knight wrote: OK, ConfigureMe make is complete. As when I built the baseline 2.0.8, I had two compile errors, and had to make some minor source code mods (see attached error message and file diffs). Not sure if this is related or not. I've done a make install in the crm/admin directory to replace the crm_master used by the system. I'll test that out in a bit. Doug On Thu, 2007-03-29 at 15:38 +0200, Andrew Beekhof wrote: On 3/29/07, Doug Knight [EMAIL PROTECTED] wrote: I went ahead and installed via RPM automake, autoconf, and libtool, even though they were not needed for the 2.0.8 baseline build (I believe libtool at least may have been installed previously). ConfigureMe bootstrap ran fine. From the
Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions
The key word in my question was "thinks". It would be useful to the RA if it could know what state the CRM thought it was in, so in case the RA determines on its own that it's already in that state, it doesn't have to do anything. But, if the RA finds that the CRM thinks it's in a different state, then the RA could set the CRM straight by calling crm_master with the appropriate value. Make sense?

Doug

On Wed, 2007-04-04 at 16:02 +0200, Andrew Beekhof wrote:

On 4/4/07, Doug Knight [EMAIL PROTECTED] wrote:

I think I ran into this exact issue. I was calling crm_master -v 100 to upgrade to master status too frequently. The postgres database needs to stay in master state once it's there, and not transition frequently. Are there any other command line tools that could be used to retrieve what state CRM thinks the resource is in?

definitely not - because that completely defeats the point of having a monitor operation

the RA needs to tell us what state the resource is in, never, ever the other way around.

Doug

On Tue, 2007-04-03 at 22:05 -0600, Alan Robertson wrote:

Lars Marowsky-Bree wrote:

On 2007-03-29T21:31:32, Alan Robertson [EMAIL PROTECTED] wrote:

The docs are more specific: They state that it is mandatory to call it in the start action. That doesn't seem to be true.

That's a correct observation.

You could call it in a monitor action quite nicely, or so it seems to me...

Yes. I'd avoid needless calls to it though as every call triggers a transition - so don't just call it in _every_ monitor operation ;-)

That's a good point. Unfortunately that means the RA has to keep state somewhere, which is a bit of a pain.
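As a rough illustration of "the RA tells the CRM what state the resource is in", a master/slave monitor action could report state purely through its exit code, along the lines of the sketch below. The exit code values are the usual OCF ones as far as I know them; the two probe functions and the `RESOURCE_STATE` variable are hypothetical stand-ins so the sketch can run on its own, and a real RA would inspect the actual service instead.

```shell
#!/bin/sh
# Sketch of a master/slave monitor action: state is reported to the CRM
# only through the exit code, never read back from the CRM.
OCF_SUCCESS=0          # running (as slave, for a master/slave resource)
OCF_NOT_RUNNING=7      # cleanly stopped
OCF_RUNNING_MASTER=8   # running as master

# HYPOTHETICAL probes: replace with real checks for your resource.
# They read a shell variable here purely for demonstration.
resource_is_running() { [ "${RESOURCE_STATE:-stopped}" != "stopped" ]; }
resource_is_master()  { [ "${RESOURCE_STATE:-stopped}" = "master" ]; }

ra_monitor() {
    if ! resource_is_running; then
        return $OCF_NOT_RUNNING
    fi
    if resource_is_master; then
        return $OCF_RUNNING_MASTER
    fi
    return $OCF_SUCCESS
}
```

With this shape, crm_master is left for expressing the preference for promotion, and the monitor result is the only channel for the actual state.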
Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions
Doug Knight wrote:

I think I ran into this exact issue. I was calling crm_master -v 100 to upgrade to master status too frequently. The postgres database needs to stay in master state once it's there, and not transition frequently. Are there any other command line tools that could be used to retrieve what state CRM thinks the resource is in?

Doug

On Tue, 2007-04-03 at 22:05 -0600, Alan Robertson wrote:

Lars Marowsky-Bree wrote:

On 2007-03-29T21:31:32, Alan Robertson [EMAIL PROTECTED] wrote:

The docs are more specific: They state that it is mandatory to call it in the start action. That doesn't seem to be true.

That's a correct observation.

You could call it in a monitor action quite nicely, or so it seems to me...

Yes. I'd avoid needless calls to it though as every call triggers a transition - so don't just call it in _every_ monitor operation ;-)

That's a good point. Unfortunately that means the RA has to keep state somewhere, which is a bit of a pain.

There is an area where you can create tmp files for your RA. It gets cleaned out when heartbeat starts. It's there for this exact reason. I think crm_master can read the preference when given the right flag.

-- Alan Robertson [EMAIL PROTECTED]
Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions. - William Wilberforce
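A minimal sketch of the "keep state in a tmp file" approach, so that crm_master is only invoked when the preference actually changes, might look like this. The directory is not named in the thread, so `STATE_DIR` is a hypothetical placeholder for the RA tmp area; `CRM_MASTER_CMD` and `set_master_preference` are likewise made-up names. Only `crm_master -v` is taken from the thread.

```shell
#!/bin/sh
# Sketch: remember the last preference written and skip crm_master when
# nothing changed, so that monitor calls do not trigger needless transitions.
# HYPOTHETICAL names: STATE_DIR (stand-in for the RA tmp area),
# CRM_MASTER_CMD (override hook), set_master_preference.
STATE_DIR="${STATE_DIR:-/tmp/ra-state}"
CRM_MASTER_CMD="${CRM_MASTER_CMD:-crm_master}"

set_master_preference() {
    resource="$1"
    wanted="$2"
    state_file="$STATE_DIR/$resource.master_pref"

    [ -d "$STATE_DIR" ] || mkdir -p "$STATE_DIR" || return 1

    # Read the previous value with shell builtins only (no extra process).
    last=""
    if [ -r "$state_file" ]; then
        read -r last < "$state_file" || last=""
    fi

    if [ "$last" = "$wanted" ]; then
        return 0    # unchanged: do not touch the CIB
    fi

    # Record the new value only after the update succeeded.
    "$CRM_MASTER_CMD" -v "$wanted" || return 1
    echo "$wanted" > "$state_file"
}
```

Because the tmp area is cleaned out when heartbeat starts, the first call after a restart finds no state file and always performs the update, which seems like the safe default.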
Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions
Hi Doug, I just tried to reproduce this (loaded your cib and ran crm_master the same way you did) but it worked fine. Can you open a bug in bugzilla and include the result of cibadmin -Q please? On 4/2/07, Doug Knight [EMAIL PROTECTED] wrote: Andrew, Alan, Lars, et al, Any updates on the spinning crm_master? I and an associate of mine here are looking into compiler flags and settings to see if we can find anything there. If there is more info you need, just let me know. Doug On Fri, 2007-03-30 at 11:15 -0400, Doug Knight wrote: I've done some more looking at the cib_attrs.c module find_attr_details function, and it seems that the call to find_xml_children filtering by node eventually finds a match (match_found = 1 in xml.c/find_xml_children) terminating the recursive search. Later in the module, the call to find_xml_children filtering by name gets a match but continues to search. I never see the search by set: in the debug log, nor the printf I've placed in my version right after the call to find_xml_children. In fact, it seems that somehow find_xml_children loops within itself, though I must really be missing something there. Doug On Fri, 2007-03-30 at 08:20 -0400, Doug Knight wrote: Andrew, FYI, the one I ran has some debug statements in it that I put in there. Let me know if you want cleaner output... Doug On Fri, 2007-03-30 at 07:44 -0400, Doug Knight wrote: Hi Andrew, I'll get to this today. I've been digging through some of the source, putting in some additional logging, etc, just to see where the problem occurs. Thanks, Doug On Fri, 2007-03-30 at 10:32 +0200, Andrew Beekhof wrote: On 3/29/07, Doug Knight [EMAIL PROTECTED] wrote: Results: Using the dev version or crm_master I still spin at the top of the CPU stack. What's next? probably time to log a bug... 
can you include the logs of the crm_master command when you add - (an insance amount of logging) to your normal command line and your current CIB please Doug On Thu, 2007-03-29 at 10:06 -0400, Doug Knight wrote: OK, ConfigureMe make is complete. As when I built the baseline 2.0.8, I had two compile errors, and had to make some minor source code mods (see attached error message and file diffs). Not sure if this is related or not. I've done a make install in the crm/admin directory to replace the crm_master used by the system. I'll test that out in a bit. Doug On Thu, 2007-03-29 at 15:38 +0200, Andrew Beekhof wrote: On 3/29/07, Doug Knight [EMAIL PROTECTED] wrote: I went ahead and installed via RPM automake, autoconf, and libtool, even though they were not needed for the 2.0.8 baseline build (I believe libtool at least may have been installed previously). ConfigureMe bootstrap ran fine. From the output I gather it includes the equivalent of the ConfigureMe configure run, and now I just need to do the make? right Doug On Thu, 2007-03-29 at 09:23 -0400, Doug Knight wrote: OK, Tried that, no luck. It still complains about libtool, autoconf, and automake. When I copy over the same basic files from the 2.0.8 directory, bootstrap still does not work, but ConfigureMe configure does (at least to the point where it starts looking for the Makefile.in files). Doug On Thu, 2007-03-29 at 09:50 +0200, Andrew Beekhof wrote: pretty sure you need: ./ConfigureMe bootstrap On 3/28/07, Doug Knight [EMAIL PROTECTED] wrote: Alan and Lars, After much effort, I have had no success in building from the mercurial version. 
Here's what I tried, and since this is the first time I've tried to build a dev version, maybe you can see where I'm going wrong: Downloaded tar from hg.linux-ha.org/dev Unpacked it in a subdirectory to my root account, HA Attempted a quick ConfigureMe configure, got errors that it couldn't find libtool, automake, autoconf Did a side-by-side comparison to the HA 2.0.8 I built successfully and am running with, found no libltdl.tar or libltdl directory under the dev version. Copied these from my original stable release tar into the directory structure for the dev version. Ran ConfigureMe configure, which then complained about all the Makefile.in files missing. Copied those over as well from the side-by-side. Also pulled include/ha_config.h.in and linux-ha/config.h.in since it complained about those missing too. ConfigureMe configure runs to completion ConfigureMe make exits with the following: In file included from base64.c:18: ../../include/heartbeat.h:38:23: error: hb_config.h: No such file or directory ../../include/heartbeat.h:98:1: error: HB_RC_DIR redefined In file included from ../../include/lha_internal.h:37, from base64.c:17: ../../linux-ha/config.h:504:1: error: this is the location of the previous definition In file included from base64.c:18: ../../include/heartbeat.h:102:1: error: HALIB redefined command line:1:1: error:
Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions
I've never done a bugzilla report, maybe you can point me in the right direction. In the mean time I'll continue to test with my modified version of crm_master that doesn't spin. I've managed to get the master side of the process configuration up, but am having trouble getting the slave side up. I did notice that when I select Stop from the GUI to completely stop the master, it demotes it first then stop it? The demote is what I use to reconfigure a master going down to become the slave after the new master comes up. Is there a way in demote to determine its been triggered by a Stop request vs a Demote? Doug On Tue, 2007-04-03 at 12:28 +0200, Andrew Beekhof wrote: Hi Doug, I just tried to reproduce this (loaded your cib and ran crm_master the same way you did) but it worked fine. Can you open a bug in bugzilla and include the result of cibadmin -Q please? On 4/2/07, Doug Knight [EMAIL PROTECTED] wrote: Andrew, Alan, Lars, et al, Any updates on the spinning crm_master? I and an associate of mine here are looking into compiler flags and settings to see if we can find anything there. If there is more info you need, just let me know. Doug On Fri, 2007-03-30 at 11:15 -0400, Doug Knight wrote: I've done some more looking at the cib_attrs.c module find_attr_details function, and it seems that the call to find_xml_children filtering by node eventually finds a match (match_found = 1 in xml.c/find_xml_children) terminating the recursive search. Later in the module, the call to find_xml_children filtering by name gets a match but continues to search. I never see the search by set: in the debug log, nor the printf I've placed in my version right after the call to find_xml_children. In fact, it seems that somehow find_xml_children loops within itself, though I must really be missing something there. Doug On Fri, 2007-03-30 at 08:20 -0400, Doug Knight wrote: Andrew, FYI, the one I ran has some debug statements in it that I put in there. Let me know if you want cleaner output... 
Doug On Fri, 2007-03-30 at 07:44 -0400, Doug Knight wrote: Hi Andrew, I'll get to this today. I've been digging through some of the source, putting in some additional logging, etc, just to see where the problem occurs. Thanks, Doug On Fri, 2007-03-30 at 10:32 +0200, Andrew Beekhof wrote: On 3/29/07, Doug Knight [EMAIL PROTECTED] wrote: Results: Using the dev version or crm_master I still spin at the top of the CPU stack. What's next? probably time to log a bug... can you include the logs of the crm_master command when you add - (an insance amount of logging) to your normal command line and your current CIB please Doug On Thu, 2007-03-29 at 10:06 -0400, Doug Knight wrote: OK, ConfigureMe make is complete. As when I built the baseline 2.0.8, I had two compile errors, and had to make some minor source code mods (see attached error message and file diffs). Not sure if this is related or not. I've done a make install in the crm/admin directory to replace the crm_master used by the system. I'll test that out in a bit. Doug On Thu, 2007-03-29 at 15:38 +0200, Andrew Beekhof wrote: On 3/29/07, Doug Knight [EMAIL PROTECTED] wrote: I went ahead and installed via RPM automake, autoconf, and libtool, even though they were not needed for the 2.0.8 baseline build (I believe libtool at least may have been installed previously). ConfigureMe bootstrap ran fine. From the output I gather it includes the equivalent of the ConfigureMe configure run, and now I just need to do the make? right Doug On Thu, 2007-03-29 at 09:23 -0400, Doug Knight wrote: OK, Tried that, no luck. It still complains about libtool, autoconf, and automake. When I copy over the same basic files from the 2.0.8 directory, bootstrap still does not work, but ConfigureMe configure does (at least to the point where it starts looking for the Makefile.in files). 
Doug On Thu, 2007-03-29 at 09:50 +0200, Andrew Beekhof wrote: pretty sure you need: ./ConfigureMe bootstrap On 3/28/07, Doug Knight [EMAIL PROTECTED] wrote: Alan and Lars, After much effort, I have had no success in building from the mercurial version. Here's what I tried, and since this is the first time I've tried to build a dev version, maybe you can see where I'm going wrong: Downloaded tar from hg.linux-ha.org/dev Unpacked it in a subdirectory to my root account, HA Attempted a quick ConfigureMe configure, got errors that it couldn't find libtool, automake, autoconf Did a side-by-side comparison to the HA 2.0.8 I built successfully and am running with, found no libltdl.tar or libltdl directory under the dev version. Copied these from my original
Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions
On 4/3/07, Doug Knight [EMAIL PROTECTED] wrote: I've never done a bugzilla report, maybe you can point me in the right direction. http://old.linux-foundation.org/developer_bugzilla/enter_bug.cgi In the mean time I'll continue to test with my modified version of crm_master that doesn't spin. my concerns are that a) it works for us as-is, and b) it now works for you but for unknown reasons. I've managed to get the master side of the process configuration up, but am having trouble getting the slave side up. I did notice that when I select Stop from the GUI to completely stop the master, it demotes it first then stop it? The demote is what I use to reconfigure a master going down to become the slave after the new master comes up. Is there a way in demote to determine its been triggered by a Stop request vs a Demote? if you're a master, you'll always get a demote before a stop. Doug On Tue, 2007-04-03 at 12:28 +0200, Andrew Beekhof wrote: Hi Doug, I just tried to reproduce this (loaded your cib and ran crm_master the same way you did) but it worked fine. Can you open a bug in bugzilla and include the result of cibadmin -Q please? On 4/2/07, Doug Knight [EMAIL PROTECTED] wrote: Andrew, Alan, Lars, et al, Any updates on the spinning crm_master? I and an associate of mine here are looking into compiler flags and settings to see if we can find anything there. If there is more info you need, just let me know. Doug On Fri, 2007-03-30 at 11:15 -0400, Doug Knight wrote: I've done some more looking at the cib_attrs.c module find_attr_details function, and it seems that the call to find_xml_children filtering by node eventually finds a match (match_found = 1 in xml.c/find_xml_children) terminating the recursive search. Later in the module, the call to find_xml_children filtering by name gets a match but continues to search. I never see the search by set: in the debug log, nor the printf I've placed in my version right after the call to find_xml_children. 
In fact, it seems that somehow find_xml_children loops within itself, though I must really be missing something there. Doug On Fri, 2007-03-30 at 08:20 -0400, Doug Knight wrote: Andrew, FYI, the one I ran has some debug statements in it that I put in there. Let me know if you want cleaner output... Doug On Fri, 2007-03-30 at 07:44 -0400, Doug Knight wrote: Hi Andrew, I'll get to this today. I've been digging through some of the source, putting in some additional logging, etc, just to see where the problem occurs. Thanks, Doug On Fri, 2007-03-30 at 10:32 +0200, Andrew Beekhof wrote: On 3/29/07, Doug Knight [EMAIL PROTECTED] wrote: Results: Using the dev version or crm_master I still spin at the top of the CPU stack. What's next? probably time to log a bug... can you include the logs of the crm_master command when you add - (an insance amount of logging) to your normal command line and your current CIB please Doug On Thu, 2007-03-29 at 10:06 -0400, Doug Knight wrote: OK, ConfigureMe make is complete. As when I built the baseline 2.0.8, I had two compile errors, and had to make some minor source code mods (see attached error message and file diffs). Not sure if this is related or not. I've done a make install in the crm/admin directory to replace the crm_master used by the system. I'll test that out in a bit. Doug On Thu, 2007-03-29 at 15:38 +0200, Andrew Beekhof wrote: On 3/29/07, Doug Knight [EMAIL PROTECTED] wrote: I went ahead and installed via RPM automake, autoconf, and libtool, even though they were not needed for the 2.0.8 baseline build (I believe libtool at least may have been installed previously). ConfigureMe bootstrap ran fine. From the output I gather it includes the equivalent of the ConfigureMe configure run, and now I just need to do the make? right Doug On Thu, 2007-03-29 at 09:23 -0400, Doug Knight wrote: OK, Tried that, no luck. It still complains about libtool, autoconf, and automake. 
When I copy over the same basic files from the 2.0.8 directory, bootstrap still does not work, but ConfigureMe configure does (at least to the point where it starts looking for the Makefile.in files). Doug On Thu, 2007-03-29 at 09:50 +0200, Andrew Beekhof wrote: pretty sure you need: ./ConfigureMe bootstrap On 3/28/07, Doug Knight [EMAIL PROTECTED] wrote: Alan and Lars, After much effort, I have had no success in building from the mercurial version. Here's what I tried, and since this is the first time I've tried to build a dev version, maybe you can see where I'm going wrong: Downloaded tar from hg.linux-ha.org/dev Unpacked it in a subdirectory to my root account, HA Attempted a quick ConfigureMe configure, got errors that it couldn't find libtool, automake, autoconf Did a side-by-side comparison to the HA 2.0.8 I
Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions
Lars Marowsky-Bree wrote:

On 2007-03-29T21:31:32, Alan Robertson [EMAIL PROTECTED] wrote:

The docs are more specific: They state that it is mandatory to call it in the start action. That doesn't seem to be true.

That's a correct observation.

You could call it in a monitor action quite nicely, or so it seems to me...

Yes. I'd avoid needless calls to it though as every call triggers a transition - so don't just call it in _every_ monitor operation ;-)

That's a good point. Unfortunately that means the RA has to keep state somewhere, which is a bit of a pain.

-- Alan Robertson [EMAIL PROTECTED]
Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions. - William Wilberforce
Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions
Andrew, Alan, Lars, et al, Any updates on the spinning crm_master? I and an associate of mine here are looking into compiler flags and settings to see if we can find anything there. If there is more info you need, just let me know. Doug On Fri, 2007-03-30 at 11:15 -0400, Doug Knight wrote: I've done some more looking at the cib_attrs.c module find_attr_details function, and it seems that the call to find_xml_children filtering by node eventually finds a match (match_found = 1 in xml.c/find_xml_children) terminating the recursive search. Later in the module, the call to find_xml_children filtering by name gets a match but continues to search. I never see the search by set: in the debug log, nor the printf I've placed in my version right after the call to find_xml_children. In fact, it seems that somehow find_xml_children loops within itself, though I must really be missing something there. Doug On Fri, 2007-03-30 at 08:20 -0400, Doug Knight wrote: Andrew, FYI, the one I ran has some debug statements in it that I put in there. Let me know if you want cleaner output... Doug On Fri, 2007-03-30 at 07:44 -0400, Doug Knight wrote: Hi Andrew, I'll get to this today. I've been digging through some of the source, putting in some additional logging, etc, just to see where the problem occurs. Thanks, Doug On Fri, 2007-03-30 at 10:32 +0200, Andrew Beekhof wrote: On 3/29/07, Doug Knight [EMAIL PROTECTED] wrote: Results: Using the dev version or crm_master I still spin at the top of the CPU stack. What's next? probably time to log a bug... can you include the logs of the crm_master command when you add - (an insance amount of logging) to your normal command line and your current CIB please Doug On Thu, 2007-03-29 at 10:06 -0400, Doug Knight wrote: OK, ConfigureMe make is complete. As when I built the baseline 2.0.8, I had two compile errors, and had to make some minor source code mods (see attached error message and file diffs). Not sure if this is related or not. 
I've done a make install in the crm/admin directory to replace the crm_master used by the system. I'll test that out in a bit. Doug On Thu, 2007-03-29 at 15:38 +0200, Andrew Beekhof wrote: On 3/29/07, Doug Knight [EMAIL PROTECTED] wrote: I went ahead and installed via RPM automake, autoconf, and libtool, even though they were not needed for the 2.0.8 baseline build (I believe libtool at least may have been installed previously). ConfigureMe bootstrap ran fine. From the output I gather it includes the equivalent of the ConfigureMe configure run, and now I just need to do the make? right Doug On Thu, 2007-03-29 at 09:23 -0400, Doug Knight wrote: OK, Tried that, no luck. It still complains about libtool, autoconf, and automake. When I copy over the same basic files from the 2.0.8 directory, bootstrap still does not work, but ConfigureMe configure does (at least to the point where it starts looking for the Makefile.in files). Doug On Thu, 2007-03-29 at 09:50 +0200, Andrew Beekhof wrote: pretty sure you need: ./ConfigureMe bootstrap On 3/28/07, Doug Knight [EMAIL PROTECTED] wrote: Alan and Lars, After much effort, I have had no success in building from the mercurial version. Here's what I tried, and since this is the first time I've tried to build a dev version, maybe you can see where I'm going wrong: Downloaded tar from hg.linux-ha.org/dev Unpacked it in a subdirectory to my root account, HA Attempted a quick ConfigureMe configure, got errors that it couldn't find libtool, automake, autoconf Did a side-by-side comparison to the HA 2.0.8 I built successfully and am running with, found no libltdl.tar or libltdl directory under the dev version. Copied these from my original stable release tar into the directory structure for the dev version. Ran ConfigureMe configure, which then complained about all the Makefile.in files missing. Copied those over as well from the side-by-side. 
Also pulled include/ha_config.h.in and linux-ha/config.h.in since it complained about those missing too. ConfigureMe configure runs to completion ConfigureMe make exits with the following: In file included from base64.c:18: ../../include/heartbeat.h:38:23: error: hb_config.h: No such file or directory ../../include/heartbeat.h:98:1: error: HB_RC_DIR redefined In file included from
Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions
Hi Andrew, I'll get to this today. I've been digging through some of the source, putting in some additional logging, etc, just to see where the problem occurs. Thanks, Doug On Fri, 2007-03-30 at 10:32 +0200, Andrew Beekhof wrote: On 3/29/07, Doug Knight [EMAIL PROTECTED] wrote: Results: Using the dev version or crm_master I still spin at the top of the CPU stack. What's next? probably time to log a bug... can you include the logs of the crm_master command when you add - (an insance amount of logging) to your normal command line and your current CIB please Doug On Thu, 2007-03-29 at 10:06 -0400, Doug Knight wrote: OK, ConfigureMe make is complete. As when I built the baseline 2.0.8, I had two compile errors, and had to make some minor source code mods (see attached error message and file diffs). Not sure if this is related or not. I've done a make install in the crm/admin directory to replace the crm_master used by the system. I'll test that out in a bit. Doug On Thu, 2007-03-29 at 15:38 +0200, Andrew Beekhof wrote: On 3/29/07, Doug Knight [EMAIL PROTECTED] wrote: I went ahead and installed via RPM automake, autoconf, and libtool, even though they were not needed for the 2.0.8 baseline build (I believe libtool at least may have been installed previously). ConfigureMe bootstrap ran fine. From the output I gather it includes the equivalent of the ConfigureMe configure run, and now I just need to do the make? right Doug On Thu, 2007-03-29 at 09:23 -0400, Doug Knight wrote: OK, Tried that, no luck. It still complains about libtool, autoconf, and automake. When I copy over the same basic files from the 2.0.8 directory, bootstrap still does not work, but ConfigureMe configure does (at least to the point where it starts looking for the Makefile.in files). 
Doug On Thu, 2007-03-29 at 09:50 +0200, Andrew Beekhof wrote: pretty sure you need: ./ConfigureMe bootstrap On 3/28/07, Doug Knight [EMAIL PROTECTED] wrote: Alan and Lars, After much effort, I have had no success in building from the mercurial version. Here's what I tried, and since this is the first time I've tried to build a dev version, maybe you can see where I'm going wrong: Downloaded tar from hg.linux-ha.org/dev Unpacked it in a subdirectory to my root account, HA Attempted a quick ConfigureMe configure, got errors that it couldn't find libtool, automake, autoconf Did a side-by-side comparison to the HA 2.0.8 I built successfully and am running with, found no libltdl.tar or libltdl directory under the dev version. Copied these from my original stable release tar into the directory structure for the dev version. Ran ConfigureMe configure, which then complained about all the Makefile.in files missing. Copied those over as well from the side-by-side. Also pulled include/ha_config.h.in and linux-ha/config.h.in since it complained about those missing too. ConfigureMe configure runs to completion ConfigureMe make exits with the following: In file included from base64.c:18: ../../include/heartbeat.h:38:23: error: hb_config.h: No such file or directory ../../include/heartbeat.h:98:1: error: HB_RC_DIR redefined In file included from ../../include/lha_internal.h:37, from base64.c:17: ../../linux-ha/config.h:504:1: error: this is the location of the previous definition In file included from base64.c:18: ../../include/heartbeat.h:102:1: error: HALIB redefined command line:1:1: error: this is the location of the previous definition gmake[2]: *** [base64.lo] Error 1 gmake[2]: Leaving directory `/root/HA/Heartbeat-Dev-829e377e00bd/lib/clplumbing' gmake[1]: *** [all-recursive] Error 1 gmake[1]: Leaving directory `/root/HA/Heartbeat-Dev-829e377e00bd/lib' gmake: *** [all-recursive] Error 1 Any ideas? 
Doug On Sun, 2007-03-25 at 16:06 -0600, Alan Robertson wrote: Doug Knight wrote: Got it. The attached file contains the strace from the second attempt by heartbeat to start the resource up as master, right up until it was killed. The resource already showed failed on the gui. I zipped it up using gzip. Lars asked a good question as well... Could you kindly reproduce this with the current Mercurial tip version? Thanks! ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions
OK, let me try this again. My last email was too big by a little and got held up for review. I've bzip2'ed this one. Doug On Fri, 2007-03-30 at 08:20 -0400, Doug Knight wrote: Andrew, FYI, the one I ran has some debug statements in it that I put in there. Let me know if you want cleaner output... Doug On Fri, 2007-03-30 at 07:44 -0400, Doug Knight wrote: Hi Andrew, I'll get to this today. I've been digging through some of the source, putting in some additional logging, etc, just to see where the problem occurs. Thanks, Doug On Fri, 2007-03-30 at 10:32 +0200, Andrew Beekhof wrote: On 3/29/07, Doug Knight [EMAIL PROTECTED] wrote: Results: Using the dev version or crm_master I still spin at the top of the CPU stack. What's next? probably time to log a bug... can you include the logs of the crm_master command when you add - (an insane amount of logging) to your normal command line and your current CIB please Doug On Thu, 2007-03-29 at 10:06 -0400, Doug Knight wrote: OK, ConfigureMe make is complete. As when I built the baseline 2.0.8, I had two compile errors, and had to make some minor source code mods (see attached error message and file diffs). Not sure if this is related or not. I've done a make install in the crm/admin directory to replace the crm_master used by the system. I'll test that out in a bit. Doug On Thu, 2007-03-29 at 15:38 +0200, Andrew Beekhof wrote: On 3/29/07, Doug Knight [EMAIL PROTECTED] wrote: I went ahead and installed via RPM automake, autoconf, and libtool, even though they were not needed for the 2.0.8 baseline build (I believe libtool at least may have been installed previously). ConfigureMe bootstrap ran fine. From the output I gather it includes the equivalent of the ConfigureMe configure run, and now I just need to do the make? right Doug On Thu, 2007-03-29 at 09:23 -0400, Doug Knight wrote: OK, Tried that, no luck. It still complains about libtool, autoconf, and automake.
When I copy over the same basic files from the 2.0.8 directory, bootstrap still does not work, but ConfigureMe configure does (at least to the point where it starts looking for the Makefile.in files). Doug On Thu, 2007-03-29 at 09:50 +0200, Andrew Beekhof wrote: pretty sure you need: ./ConfigureMe bootstrap On 3/28/07, Doug Knight [EMAIL PROTECTED] wrote: Alan and Lars, After much effort, I have had no success in building from the mercurial version. Here's what I tried, and since this is the first time I've tried to build a dev version, maybe you can see where I'm going wrong: Downloaded tar from hg.linux-ha.org/dev Unpacked it in a subdirectory to my root account, HA Attempted a quick ConfigureMe configure, got errors that it couldn't find libtool, automake, autoconf Did a side-by-side comparison to the HA 2.0.8 I built successfully and am running with, found no libltdl.tar or libltdl directory under the dev version. Copied these from my original stable release tar into the directory structure for the dev version. Ran ConfigureMe configure, which then complained about all the Makefile.in files missing. Copied those over as well from the side-by-side. Also pulled include/ha_config.h.in and linux-ha/config.h.in since it complained about those missing too. 
ConfigureMe configure runs to completion ConfigureMe make exits with the following: In file included from base64.c:18: ../../include/heartbeat.h:38:23: error: hb_config.h: No such file or directory ../../include/heartbeat.h:98:1: error: HB_RC_DIR redefined In file included from ../../include/lha_internal.h:37, from base64.c:17: ../../linux-ha/config.h:504:1: error: this is the location of the previous definition In file included from base64.c:18: ../../include/heartbeat.h:102:1: error: HALIB redefined command line:1:1: error: this is the location of the previous definition gmake[2]: *** [base64.lo] Error 1 gmake[2]: Leaving directory `/root/HA/Heartbeat-Dev-829e377e00bd/lib/clplumbing' gmake[1]: *** [all-recursive] Error 1 gmake[1]: Leaving directory `/root/HA/Heartbeat-Dev-829e377e00bd/lib' gmake: *** [all-recursive] Error 1 Any ideas? Doug On Sun, 2007-03-25 at 16:06 -0600, Alan Robertson wrote: Doug Knight wrote: Got it. The attached file contains the strace from the second attempt by heartbeat to start the resource up as master, right
Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions
On 3/28/07, Doug Knight [EMAIL PROTECTED] wrote: One additional question that has come up as I've developed my notify function: When the promote of the slave completes, and the post-promote notify is sent out, is this post-promote notify sent to both the master and slave nodes, or just to the slave? both Doug On Wed, 2007-03-28 at 12:34 -0400, Doug Knight wrote: Hi Lars, I've gone through your comments below, and I think I understand this a bit better. Let me state what I think I need to do, and see if I've got it now (starting with Node A master and Node B slave): Node A: PostgreSQL master gets demoted - completes (basically just stops PostgreSQL) Node B: PostgreSQL slave gets promoted - starts it up as new master server Node A: PostgreSQL demoted master gets a post notify call with promote notify operation At this point I know heartbeat has brought the new master server up (node B), and I can safely rsync and restart the new slave server (node A). Nice and clean... Also, I need to remember that notify will get called for all combinations, and if I'm only handling the post-promote combination, all others I need to simply return OCF_SUCCESS, right? The remaining question I have is, does adding notify to the actions meta-data and configuring the notify operation via the Add Operation on the GUI activate heartbeat's usage of notify? Or are there any other parameters/flags I need to set to enable using notify? I seem to remember there was mention of some configuration items for using notify. Thanks, Doug On Tue, 2007-03-27 at 23:14 +0200, Lars Marowsky-Bree wrote: On 2007-03-22T13:24:02, Doug Knight [EMAIL PROTECTED] wrote: Hi, I've been out a bit myself but now want to answer this. Hi Alan, I took a look at the drbd OCF script's notify function, and the online documentation. I believe there is one circumstance where I need to make use of the pre/post notify. 
The reason why drbd calls update_prefs (ie, crm_master) in the post(-start) notification, and not within start itself, is that by that time, start will have been completed on all (one or both) nodes. That means that by that time, it's safe to figure out which side is preferable for becoming master. The last step in my development/testing has to do with several steps I take to prepare the server that was primary and is now becoming standby. First, the primary gets demoted, right? Yes. Then the secondary gets promoted. The problem I have is that part of the process of preparing the new standby requires that the new active server process is up and accessible. If the demote has to complete before the promote can begin, I cannot do the rsync in the demote, because the promote hasn't started and placed the new primary in an accessible state. That seems to be true for your scenario, yes. So, if I understand the notify function, then I need a post process section that looks for the master going active and accessible, so I can do the rsync and start up the new standby, right? That you could do. The instances will get a post-promote notification, which could do what you want. Can you expand a little on the notify processing? The web page just lists the variables involved, and the drbd OCF script only makes use of a few of them, and I need a more detailed explanation of how and when they are used. Well, you get a pre-notification before start/stop/promote/demote happen anywhere and a post-notification after they have completed everywhere. That's basically the gist of it. Does that make it clearer, or do you have a specific question? 
Sincerely, Lars ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
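For anyone following along, the post-promote handling discussed above could be sketched roughly as below. This is only an illustration, not code from the drbd RA or any shipped agent. It assumes the notification type and operation arrive in `OCF_RESKEY_notify_type` and `OCF_RESKEY_notify_operation` (check the notify documentation for your heartbeat version); `pgsql_notify`, `local_is_master`, `LOCAL_ROLE` and `resync_from_master` are names made up for the example.

```shell
#!/bin/sh
# Sketch of a notify handler for a master/slave RA.
# ASSUMPTION: the type (pre/post) and operation (start/stop/promote/demote)
# are passed in OCF_RESKEY_notify_type and OCF_RESKEY_notify_operation.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1

# HYPOTHETICAL placeholder for "rsync from the new master, restart as standby".
resync_from_master() {
    echo "resync: would rsync from new master and restart standby"
}

# HYPOTHETICAL role check. The post-promote notify goes to both nodes,
# so the handler has to work out for itself whether it is the new master.
local_is_master() {
    [ "${LOCAL_ROLE:-slave}" = "master" ]
}

pgsql_notify() {
    n_type="${OCF_RESKEY_notify_type:-}"
    n_op="${OCF_RESKEY_notify_operation:-}"
    case "${n_type}-${n_op}" in
        post-promote)
            # Promote has completed everywhere, so the new master is reachable.
            if ! local_is_master; then
                resync_from_master || return $OCF_ERR_GENERIC
            fi
            ;;
        *)
            # All other pre/post combinations are ignored.
            ;;
    esac
    return $OCF_SUCCESS
}
```

The catch-all branch is what makes the "return OCF_SUCCESS for everything I don't handle" behaviour explicit; only the post-promote case does any work, and only on the node that was not promoted.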
Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions
I went ahead and installed via RPM automake, autoconf, and libtool, even though they were not needed for the 2.0.8 baseline build (I believe libtool at least may have been installed previously). ConfigureMe bootstrap ran fine. From the output I gather it includes the equivalent of the ConfigureMe configure run, and now I just need to do the make? Doug On Thu, 2007-03-29 at 09:23 -0400, Doug Knight wrote: OK, Tried that, no luck. It still complains about libtool, autoconf, and automake. When I copy over the same basic files from the 2.0.8 directory, bootstrap still does not work, but ConfigureMe configure does (at least to the point where it starts looking for the Makefile.in files). Doug On Thu, 2007-03-29 at 09:50 +0200, Andrew Beekhof wrote: pretty sure you need: ./ConfigureMe bootstrap On 3/28/07, Doug Knight [EMAIL PROTECTED] wrote: Alan and Lars, After much effort, I have had no success in building from the mercurial version. Here's what I tried, and since this is the first time I've tried to build a dev version, maybe you can see where I'm going wrong: Downloaded tar from hg.linux-ha.org/dev Unpacked it in a subdirectory to my root account, HA Attempted a quick ConfigureMe configure, got errors that it couldn't find libtool, automake, autoconf Did a side-by-side comparison to the HA 2.0.8 I built successfully and am running with, found no libltdl.tar or libltdl directory under the dev version. Copied these from my original stable release tar into the directory structure for the dev version. Ran ConfigureMe configure, which then complained about all the Makefile.in files missing. Copied those over as well from the side-by-side. Also pulled include/ha_config.h.in and linux-ha/config.h.in since it complained about those missing too. 
ConfigureMe configure runs to completion ConfigureMe make exits with the following: In file included from base64.c:18: ../../include/heartbeat.h:38:23: error: hb_config.h: No such file or directory ../../include/heartbeat.h:98:1: error: HB_RC_DIR redefined In file included from ../../include/lha_internal.h:37, from base64.c:17: ../../linux-ha/config.h:504:1: error: this is the location of the previous definition In file included from base64.c:18: ../../include/heartbeat.h:102:1: error: HALIB redefined command line:1:1: error: this is the location of the previous definition gmake[2]: *** [base64.lo] Error 1 gmake[2]: Leaving directory `/root/HA/Heartbeat-Dev-829e377e00bd/lib/clplumbing' gmake[1]: *** [all-recursive] Error 1 gmake[1]: Leaving directory `/root/HA/Heartbeat-Dev-829e377e00bd/lib' gmake: *** [all-recursive] Error 1 Any ideas? Doug On Sun, 2007-03-25 at 16:06 -0600, Alan Robertson wrote: Doug Knight wrote: Got it. The attached file contains the strace from the second attempt by heartbeat to start the resource up as master, right up until it was killed. The resource already showed failed on the gui. I zipped it up using gzip. Lars asked a good question as well... Could you kindly reproduce this with the current Mercurial tip version? Thanks! ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
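To summarise the build steps from this message: the Mercurial snapshot wants autoconf, automake and libtool installed before `./ConfigureMe bootstrap` will run, and the make step follows. A rough sketch of that sequence with a prerequisite check in front is below. `missing_tools` and `build_dev_tree` are names invented for this example, and the directory argument is whatever path the snapshot was unpacked to.

```shell
#!/bin/sh
# Print the name of every listed tool that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# HYPOTHETICAL helper: bootstrap and build an unpacked dev snapshot.
#   $1 = directory that contains the ConfigureMe script
build_dev_tree() {
    missing=$(missing_tools autoconf automake libtool)
    if [ -n "$missing" ]; then
        echo "install these first:" $missing >&2
        return 1
    fi
    # Run in a subshell so the caller's working directory is untouched.
    # Per the thread, bootstrap appears to include the configure step.
    ( cd "$1" && ./ConfigureMe bootstrap && ./ConfigureMe make )
}
```

Calling `missing_tools autoconf automake libtool` on its own is a quick way to see what still needs installing before attempting the bootstrap.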
Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions
OK, ConfigureMe make is complete. As when I built the baseline 2.0.8, I had two compile errors, and had to make some minor source code mods (see attached error message and file diffs). Not sure if this is related or not. I've done a make install in the crm/admin directory to replace the crm_master used by the system. I'll test that out in a bit. Doug On Thu, 2007-03-29 at 15:38 +0200, Andrew Beekhof wrote: On 3/29/07, Doug Knight [EMAIL PROTECTED] wrote: I went ahead and installed via RPM automake, autoconf, and libtool, even though they were not needed for the 2.0.8 baseline build (I believe libtool at least may have been installed previously). ConfigureMe bootstrap ran fine. From the output I gather it includes the equivalent of the ConfigureMe configure run, and now I just need to do the make? right Doug On Thu, 2007-03-29 at 09:23 -0400, Doug Knight wrote: OK, Tried that, no luck. It still complains about libtool, autoconf, and automake. When I copy over the same basic files from the 2.0.8 directory, bootstrap still does not work, but ConfigureMe configure does (at least to the point where it starts looking for the Makefile.in files). Doug On Thu, 2007-03-29 at 09:50 +0200, Andrew Beekhof wrote: pretty sure you need: ./ConfigureMe bootstrap On 3/28/07, Doug Knight [EMAIL PROTECTED] wrote: Alan and Lars, After much effort, I have had no success in building from the mercurial version. Here's what I tried, and since this is the first time I've tried to build a dev version, maybe you can see where I'm going wrong: Downloaded tar from hg.linux-ha.org/dev Unpacked it in a subdirectory to my root account, HA Attempted a quick ConfigureMe configure, got errors that it couldn't find libtool, automake, autoconf Did a side-by-side comparison to the HA 2.0.8 I built successfully and am running with, found no libltdl.tar or libltdl directory under the dev version. Copied these from my original stable release tar into the directory structure for the dev version. 
Ran ConfigureMe configure, which then complained about all the Makefile.in files missing. Copied those over as well from the side-by-side. Also pulled include/ha_config.h.in and linux-ha/config.h.in since it complained about those missing too. ConfigureMe configure runs to completion ConfigureMe make exits with the following: In file included from base64.c:18: ../../include/heartbeat.h:38:23: error: hb_config.h: No such file or directory ../../include/heartbeat.h:98:1: error: HB_RC_DIR redefined In file included from ../../include/lha_internal.h:37, from base64.c:17: ../../linux-ha/config.h:504:1: error: this is the location of the previous definition In file included from base64.c:18: ../../include/heartbeat.h:102:1: error: HALIB redefined command line:1:1: error: this is the location of the previous definition gmake[2]: *** [base64.lo] Error 1 gmake[2]: Leaving directory `/root/HA/Heartbeat-Dev-829e377e00bd/lib/clplumbing' gmake[1]: *** [all-recursive] Error 1 gmake[1]: Leaving directory `/root/HA/Heartbeat-Dev-829e377e00bd/lib' gmake: *** [all-recursive] Error 1 Any ideas? Doug On Sun, 2007-03-25 at 16:06 -0600, Alan Robertson wrote: Doug Knight wrote: Got it. The attached file contains the strace from the second attempt by heartbeat to start the resource up as master, right up until it was killed. The resource already showed failed on the gui. I zipped it up using gzip. Lars asked a good question as well... Could you kindly reproduce this with the current Mercurial tip version? 
Thanks! ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ send_arp.c: Making all in libnet_util gmake[2]: Entering directory `/root/HA/Heartbeat-Dev-829e377e00bd/heartbeat/libnet_util' if gcc -DHAVE_CONFIG_H -I. -I. -I../../include -I../../include -I../../include -I../../include
Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions
Lars Marowsky-Bree wrote: On 2007-03-27T22:38:50, Alan Robertson [EMAIL PROTECTED] wrote: mandatory in the sense that nothing will get promoted until someone, somewhere runs it. but the exact timing is completely up to the user/admin/RA... it is even possible to run it manually if you have to I originally assumed what you said, but the docs contradict that by calling it mandatory (and not qualifying the term). Well, it's mandatory in the sense that without calling it, you don't get m/s, just a clone ;-) The docs are more specific: They state that it is mandatory to call it in the start action. That doesn't seem to be true. You could call it in a monitor action quite nicely, or so it seems to me... -- Alan Robertson [EMAIL PROTECTED] Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions. - William Wilberforce ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions
I'm working through the various emails I've received on what to try for the issue I'm having with crm_master. Here's my first response. I pulled down the bz2 version of the Mercurial version per the URL given in a previous email (hg.linux-ha.org/dev), and attempted to build on my server. Using the ./ConfigureMe configure command I get errors that I don't have autoconf, automake, and libtool, and some recursive symlinks appear for these in the same directory that I'm running ConfigureMe in. This is the same system I've built the 2.0.8 version on successfully. Do the dev versions require additional packages to be installed? Doug On Sun, 2007-03-25 at 16:06 -0600, Alan Robertson wrote: Doug Knight wrote: Got it. The attached file contains the strace from the second attempt by heartbeat to start the resource up as master, right up until it was killed. The resource already showed failed on the gui. I zipped it up using gzip. Lars asked a good question as well... Could you kindly reproduce this with the current Mercurial tip version? Thanks! ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions
On Wed, 2007-03-28 at 09:14 +0200, Andrew Beekhof wrote: On 3/28/07, Alan Robertson [EMAIL PROTECTED] wrote: Andrew Beekhof wrote: On Mar 22, 2007, at 2:13 AM, Alan Robertson wrote: Doug Knight wrote: Hi Andrew, I had just started reviewing both of these scripts, and reviewed the Multistate and clone resource pages on the web site. It looks like multistate is how I need to handle it, but a couple of questions first. 1. I noticed that the write-up says the resource must come up on each of the servers in shadow mode first, then one gets promoted. Does this imply a start on both servers, and the OCF start function determining which server is active vs shadow (I'm picturing a check in the OCF script to determine postgresql standby mode = shadow/crm_master value low, and postgresql active mode = active/crm_master value high), then a promote to the active server? 2. I noticed that the drbd OCF script contains a notify function, where the Stateful OCF script does not. The notify function looks to be where the important actions are taken (calling drbd_start_phase_2, pre/post, etc). Is the notify function necessary, or is it sufficient in my case to handle it through the start|stop|promote|demote functions? Thanks for your help, Doug Andrew's out for a while. The start function starts you up in slave/secondary mode. All resources initially start up in slave mode. A set of servers is chosen to run the resources on (it might be one, two, the whole set, etc. depending on clone_max and clone_node_max and the usual constraints). They are started on the selected nodes using start. During the start operation, you are given the chance to declare yourself ready to become master or not by using the crm_master command line tool. I believe that your resource can run that command any time they like - for example at a monitor operation... But, it is mandatory that they run it when they first start up. mandatory in the sense that nothing will get promoted until someone, somewhere runs it.
but the exact timing is completely up to the user/admin/RA... it is even possible to run it manually if you have to I originally assumed what you said, but the docs contradict that by calling it mandatory (and not qualifying the term). And the code seems to indicate that you can ONLY run it from an RA. if you know which OCF environment variables to set, then you can potentially run it from anywhere... but most people won't need to run it outside of the RA What I did was to set up the defaults within the OCF script to point to the locations and values I needed for this specific instance I'm testing. That way I can manually execute the script pretty well. That's when I ran into the crm_master spinning. I did need to manually set the OCF_RESOURCE_INSTANCE so that during the start function I could attempt to find if the resource already was running in the cluster somewhere. Definitely made it possible to test all the associated functions like stop, usage, methods, meta-data, monitor, etc. ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
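As an illustration of the point made in this exchange (the master preference can be set from start, refreshed from monitor, or set by hand), a resource agent might wrap the call along these lines. This is a sketch under assumptions, not the drbd or Stateful RA code: `set_master_pref`, `monitor_update_pref`, `data_is_current`, the score values and the `CRM_MASTER` override are all invented for the example, and the `-l reboot -v` / `-D` options should be checked against the crm_master shipped with your version.

```shell
#!/bin/sh
# Point CRM_MASTER at "echo crm_master" for a dry run outside the cluster.
: "${CRM_MASTER:=crm_master}"

# HYPOTHETICAL helper: advertise or withdraw this node's promotion preference.
#   set_master_pref 100   -> prefer this node as master (score is an example)
#   set_master_pref none  -> remove the preference
set_master_pref() {
    case "$1" in
        none) $CRM_MASTER -l reboot -D ;;
        *)    $CRM_MASTER -l reboot -v "$1" ;;
    esac
}

# Example monitor fragment: refresh the preference on every monitor call.
# ASSUMPTION: data_is_current is a site-specific check defined elsewhere.
monitor_update_pref() {
    if data_is_current; then
        set_master_pref 100
    else
        set_master_pref 5
    fi
}
```

When running something like this by hand rather than from the cluster, remember the point above about the OCF environment: at least OCF_RESOURCE_INSTANCE had to be set manually in the testing described here.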
Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions
Hi Lars, I've gone through your comments below, and I think I understand this a bit better. Let me state what I think I need to do, and see if I've got it now (starting with Node A master and Node B slave): Node A: PostgreSQL master gets demoted - completes (basically just stops PostgreSQL) Node B: PostgreSQL slave gets promoted - starts it up as new master server Node A: PostgreSQL demoted master gets a post notify call with promote notify operation At this point I know heartbeat has brought the new master server up (node B), and I can safely rsync and restart the new slave server (node A). Nice and clean... Also, I need to remember that notify will get called for all combinations, and if I'm only handling the post-promote combination, all others I need to simply return OCF_SUCCESS, right? The remaining question I have is, does adding notify to the actions meta-data and configuring the notify operation via the Add Operation on the GUI activate heartbeat's usage of notify? Or are there any other parameters/flags I need to set to enable using notify? I seem to remember there was mention of some configuration items for using notify. Thanks, Doug On Tue, 2007-03-27 at 23:14 +0200, Lars Marowsky-Bree wrote: On 2007-03-22T13:24:02, Doug Knight [EMAIL PROTECTED] wrote: Hi, I've been out a bit myself but now want to answer this. Hi Alan, I took a look at the drbd OCF script's notify function, and the online documentation. I believe there is one circumstance where I need to make use of the pre/post notify. The reason why drbd calls update_prefs (ie, crm_master) in the post(-start) notification, and not within start itself, is that by that time, start will have been completed on all (one or both) nodes. That means that by that time, it's safe to figure out which side is preferable for becoming master. The last step in my development/testing has to do with several steps I take to prepare the server that was primary and is now becoming standby. 
First, the primary gets demoted, right? Yes. Then the secondary gets promoted. The problem I have is that part of the process of preparing the new standby requires that the new active server process is up and accessible. If the demote has to complete before the promote can begin, I cannot do the rsync in the demote, because the promote hasn't started and placed the new primary in an accessible state. That seems to be true for your scenario, yes. So, if I understand the notify function, then I need a post process section that looks for the master going active and accessible, so I can do the rsync and start up the new standby, right? That you could do. The instances will get a post-promote notification, which could do what you want. Can you expand a little on the notify processing? The web page just lists the variables involved, and the drbd OCF script only makes use of a few of them, and I need a more detailed explanation of how and when they are used. Well, you get a pre-notification before start/stop/promote/demote happen anywhere and a post-notification after they have completed everywhere. That's basically the gist of it. Does that make it clearer, or do you have a specific question? Sincerely, Lars ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions
One additional question that has come up as I've developed my notify function: When the promote of the slave completes, and the post-promote notify is sent out, is this post-promote notify sent to both the master and slave nodes, or just to the slave? Doug On Wed, 2007-03-28 at 12:34 -0400, Doug Knight wrote: Hi Lars, I've gone through your comments below, and I think I understand this a bit better. Let me state what I think I need to do, and see if I've got it now (starting with Node A master and Node B slave): Node A: PostgreSQL master gets demoted - completes (basically just stops PostgreSQL) Node B: PostgreSQL slave gets promoted - starts it up as new master server Node A: PostgreSQL demoted master gets a post notify call with promote notify operation At this point I know heartbeat has brought the new master server up (node B), and I can safely rsync and restart the new slave server (node A). Nice and clean... Also, I need to remember that notify will get called for all combinations, and if I'm only handling the post-promote combination, all others I need to simply return OCF_SUCCESS, right? The remaining question I have is, does adding notify to the actions meta-data and configuring the notify operation via the Add Operation on the GUI activate heartbeat's usage of notify? Or are there any other parameters/flags I need to set to enable using notify? I seem to remember there was mention of some configuration items for using notify. Thanks, Doug On Tue, 2007-03-27 at 23:14 +0200, Lars Marowsky-Bree wrote: On 2007-03-22T13:24:02, Doug Knight [EMAIL PROTECTED] wrote: Hi, I've been out a bit myself but now want to answer this. Hi Alan, I took a look at the drbd OCF script's notify function, and the online documentation. I believe there is one circumstance where I need to make use of the pre/post notify. 
The reason why drbd calls update_prefs (ie, crm_master) in the post(-start) notification, and not within start itself, is that by that time, start will have been completed on all (one or both) nodes. That means that by that time, it's safe to figure out which side is preferable for becoming master. The last step in my development/testing has to do with several steps I take to prepare the server that was primary and is now becoming standby. First, the primary gets demoted, right? Yes. Then the secondary gets promoted. The problem I have is that part of the process of preparing the new standby requires that the new active server process is up and accessible. If the demote has to complete before the promote can begin, I cannot do the rsync in the demote, because the promote hasn't started and placed the new primary in an accessible state. That seems to be true for your scenario, yes. So, if I understand the notify function, then I need a post process section that looks for the master going active and accessible, so I can do the rsync and start up the new standby, right? That you could do. The instances will get a post-promote notification, which could do what you want. Can you expand a little on the notify processing? The web page just lists the variables involved, and the drbd OCF script only makes use of a few of them, and I need a more detailed explanation of how and when they are used. Well, you get a pre-notification before start/stop/promote/demote happen anywhere and a post-notification after they have completed everywhere. That's basically the gist of it. Does that make it clearer, or do you have a specific question? Sincerely, Lars ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions
Andrew Beekhof wrote: On Mar 22, 2007, at 2:13 AM, Alan Robertson wrote: Doug Knight wrote: Hi Andrew, I had just started reviewing both of these scripts, and reviewed the Multistate and clone resource pages on the web site. It looks like multistate is how I need to handle it, but a couple of questions first. 1. I noticed that the write-up says the resource must come up on each of the servers in shadow mode first, then one gets promoted. Does this imply a start on both servers, and the OCF start function determining which server is active vs shadow (I'm picturing a check in the OCF script to determine postgresql standby mode = shadow/crm_master value low, and postgresql active mode = active/crm_master value high), then a promote to the active server? 2. I noticed that the drbd OCF script contains a notify function, where the Stateful OCF script does not. The notify function looks to be where the important actions are taken (calling drbd_start_phase_2, pre/post, etc). Is the notify function necessary, or is it sufficient in my case to handle it through the start|stop|promote|demote functions? Thanks for your help, Doug Andrew's out for a while. The start function starts you up in slave/secondary mode. All resources initially start up in slave mode. A set of servers is chosen to run the resources on (it might be one, two, the whole set, etc. depending on clone_max and clone_node_max and the usual constraints). They are started on the selected nodes using start. During the start operation, you are given the chance to declare yourself ready to become master or not by using the crm_master command line tool. I believe that your resource can run that command any time they like - for example at a monitor operation... But, it is mandatory that they run it when they first start up. mandatory in the sense that nothing will get promoted until someone, somewhere runs it. but the exact timing is completely up to the user/admin/RA...
it is even possible to run it manually if you have to

I originally assumed what you said, but the docs contradict that by calling it mandatory (and not qualifying the term). And the code seems to indicate that you can ONLY run it from an RA.

-- Alan Robertson [EMAIL PROTECTED]
Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions. - William Wilberforce
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions
Doug Knight wrote: Got it. The attached file contains the strace from the second attempt by heartbeat to start the resource up as master, right up until it was killed. The resource already showed failed on the gui. I zipped it up using gzip.

Lars asked a good question as well... Could you kindly reproduce this with the current Mercurial tip version? Thanks!

-- Alan Robertson [EMAIL PROTECTED]
Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions
Doug Knight wrote: Got it. The attached file contains the strace from the second attempt by heartbeat to start the resource up as master, right up until it was killed. The resource already showed failed on the gui. I zipped it up using gzip.

By the way, from the system call perspective, what it's doing is mallocing again and again and again...

I presume it's in this function (from the top level)

    rc = update_attr(the_cib, cib_opts, type, dest_node, set_name, attr_id, attr_name, attr_value);

And I further presume (with somewhat more risk) that it's in this function from the next level down:

    rc = the_cib->cmds->modify(the_cib, section, xml_top, NULL, call_options|cib_quorum_override);
    cib_client_modify(CIB_OP_MODIFY...)
    cib_native_perform_op()

Which sends the request over to the CIB, where it should do this...

    cib_process_modify()
    update_xml_child(obj_root, input)

However, from cib_process_modify on, all the work takes place in the CIB, not in the crm_master command. So, I presume that it doesn't get that far. [Other theories are also possible, of course ;-)]

Here is my initial conclusion:
1) No one else has reported this problem
2) The code in question is common and is used for many things
3) Therefore it's more likely that something is amiss with your CIB and causing the CIB code to loop looking for the subtree to modify.

If this theory is correct, there are two problems: one with your CIB, and one in the code.

So, could you please send the current output from cibadmin -Q to the list as an attachment? Could you also please run crm_verify on your CIB and see if it complains about anything. If it does, please fix its complaints, and try again.

And, could you also please tell us how you installed the system. If you didn't install a package, then did you make the required user ID and group ID?

Thanks!

-- Alan Robertson [EMAIL PROTECTED]
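The two checks requested above could be gathered in one pass with something like the following sketch. It uses `cibadmin -Q` as named in the message, and it assumes that `crm_verify` accepts `-L` (check the live CIB) and `-V` (verbose); please confirm those flags against `crm_verify --help` on your build. The function name `collect_cib_report`, the `CIBADMIN` and `CRM_VERIFY` overrides, and the output layout are illustrative only.

```shell
#!/bin/sh
# Sketch: gather the CIB dump and crm_verify output for a bug report.
# CIBADMIN and CRM_VERIFY can be overridden, e.g. to try this without a cluster.
CIBADMIN=${CIBADMIN:-cibadmin}
CRM_VERIFY=${CRM_VERIFY:-crm_verify}

collect_cib_report() {
    outdir=$1
    mkdir -p "$outdir" || return 1

    # Dump the current CIB, as requested on the list.
    $CIBADMIN -Q > "$outdir/cib.xml" 2> "$outdir/cibadmin.err" || return 1

    # Check the live CIB; -L and -V are assumed flags (see the note above).
    $CRM_VERIFY -L -V > "$outdir/crm_verify.out" 2>&1
    verify_rc=$?

    # Record the exit status so a reader can tell whether it complained.
    echo "crm_verify exit status: $verify_rc" >> "$outdir/crm_verify.out"
    return $verify_rc
}
```

Calling `collect_cib_report /tmp/cib-report` would leave `cib.xml` and `crm_verify.out` in that directory, ready to attach to a mail.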
Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions
On 2007-03-23T12:56:12, Doug Knight [EMAIL PROTECTED] wrote: I figured this one out, please ignore, it's because I didn't give it a value. If I run crm_master -v 100 at the command line, it spins right up to 100% cpu with no error.

Can you retest that with the hg.linux-ha.org/dev version please? We've made significant improvements, and as Andrew is currently quite occupied, I'd hate to look for a fixed bug ;-)

If you check out the drbd RA, you'll find that this one actually uses the crm_master command and does not have this problem; so it must be either something particular to your setup or something already fixed ...

Sincerely, Lars

-- Teamlead Kernel, SuSE Labs, Research and Development SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
Experience is the name everyone gives to their mistakes. -- Oscar Wilde
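The "didn't give it a value" case quoted above can be caught before crm_master is ever invoked. The following is only a sketch of such a guard: `set_master_score` and the `CRM_MASTER` override are hypothetical names, and the only crm_master usage assumed is the `-v <value>` form that appears in this thread.

```shell
#!/bin/sh
# CRM_MASTER can be overridden to point at a stub for dry runs.
CRM_MASTER=${CRM_MASTER:-crm_master}

set_master_score() {
    score=$1

    # The cluster sets this when it runs the RA; for a manual test it has
    # to be exported by hand.
    if [ -z "${OCF_RESOURCE_INSTANCE:-}" ]; then
        echo "OCF_RESOURCE_INSTANCE is not set" >&2
        return 1
    fi

    # Accept only a plain non-negative integer as the preference value.
    case $score in
        ''|*[!0-9]*)
            echo "usage: set_master_score <integer>" >&2
            return 1
            ;;
    esac

    $CRM_MASTER -v "$score"
}
```

This does nothing about the CPU spin itself; it only rules out the missing-value and missing-environment cases when testing by hand.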
Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions
Hi Alan, I've started testing my OCF script, and I'm seeing something unusual during initial startup. I've placed a crm_master call in my stateful_start function, after the function has determined that it is running on what should be the master, and postgresql has successfully started: crm_master -v 100 When this command gets executed, it starts using nearly 100% CPU, memory usage continuously increases up to about 68%, then it dies (killed via timeout?), followed by a second attempt to go master (with the same charactistics, after the function timeout is exceeded), then a demote is sent (again, after timeout) and it switches to try to become the slave (crm_master -v 10 is what I use, though I'm not sure this is correct usage to say I want to change to a slave). Eventually, I wind up with the resource in failed mode. First question, any idea why the straight line running of a crm_master -v 100 (not within any loops in my script) would spin up to 100%? Second question, is using the crm_master -v with different values the way to say on which node I prefer the master to run (higher number = preferred node)? Doug On Thu, 2007-03-22 at 07:46 -0400, Doug Knight wrote: Thank you Alan, that explanation really helped. Would it be useful for me to post my OCF script once its done and tested? Doug On Wed, 2007-03-21 at 19:13 -0600, Alan Robertson wrote: Doug Knight wrote: Hi Andrew, I had just started reviewing both of thes scripts, and reviewed the Multistate and clone resource pages on the web site. It looks like multistate is how I need to handle it, but a couple of questions first. 1. I noticed that the write-up says the resource must come up on each of the servers in shadow mode first, then one gets promoted. 
Does this imply a start on both servers, and the OCF start function determining which server is active vs shadow (I'm picturing a check in the OCF script to determine postgresql standby mode = shadow/crm_master value low, and postgresql active mode = active/crm_master value high), then a promote to the active server? 2. I noticed that the drbd OCF script contains a notify function, where the Stateful OCF script does not. The notify function looks to be where the important actions are taken (calling drbd_start_phase_2, pre/post, etc). Is the notify function necessary, or is it sufficient in my case to handle it through the start|stop|promote|demote functions? Thanks for your help, Doug Andrew's out for a while. The start function starts you up in slave/secondary mode. All resources initially start up in slave mode. A set of servers is chosen to run the resources on (it might be one, two, the whole set, etc. depending on clone_max and clone_node_max and the usual constraints). They are started on the selected nodes using start During the start operation, you are given the chance to declare yourself ready to become master or not by using the crm_master command line tool. I believe that your resource can run that command any time they like - for example at a monitor operation... But, it is mandatory that they run it when they first start up. After this, heartbeat will try and promote as many of these resources as is consistent with its configured properties, and the crm_master commands that were run. The notify command tells you when your peers come and go. Do you need to take actions if you know this? If so, then you need to implement the notify actions... ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions
Doug Knight wrote: Hi Alan, I've started testing my OCF script, and I'm seeing something unusual during initial startup. I've placed a crm_master call in my stateful_start function, after the function has determined that it is running on what should be the master, and postgresql has successfully started: crm_master -v 100 When this command gets executed, it starts using nearly 100% CPU, memory usage continuously increases up to about 68%, then it dies (killed via timeout?), followed by a second attempt to go master (with the same characteristics, after the function timeout is exceeded), then a demote is sent (again, after timeout) and it switches to try to become the slave (crm_master -v 10 is what I use, though I'm not sure this is correct usage to say I want to change to a slave). Eventually, I wind up with the resource in failed mode. First question, any idea why the straight line running of a crm_master -v 100 (not within any loops in my script) would spin up to 100%?

Bugs maybe? What version of heartbeat are you running? Which processes are running up to 100%? For how long?

Second question, is using the crm_master -v with different values the way to say on which node I prefer the master to run (higher number = preferred node)?

Yes. I believe that these are added into the values that come from other constraints in your configuration file to come up with a best configuration.

-- Alan Robertson [EMAIL PROTECTED]
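As a rough illustration of "higher number = preferred node", the scheme described in the question (100 on the would-be master, 10 on the standby, with the standby recognised by a recovery.conf file) might look like the sketch below. The function names and the data-directory argument are hypothetical, and only the `crm_master -v <value>` form from the thread is assumed.

```shell
#!/bin/sh
# CRM_MASTER can be overridden to point at a stub for dry runs.
CRM_MASTER=${CRM_MASTER:-crm_master}

# Print the promotion preference for this node: a data directory that
# contains recovery.conf is treated as the standby and gets the low score.
pick_master_score() {
    pgdata=$1
    if [ -f "$pgdata/recovery.conf" ]; then
        echo 10
    else
        echo 100
    fi
}

# Intended to be called from the RA's start action once postgresql is up.
advertise_master_score() {
    $CRM_MASTER -v "$(pick_master_score "$1")"
}
```

Since the cluster combines these values with other constraints, the absolute numbers matter less than their relative order across nodes.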
Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions
Doug Knight wrote: Current 2.0.8 tarball from 1/18/07. Process in top looks like: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 24591 root 18 0 1663m 1.5g 1028 R 83 77.8 1:19.42 /usr/sbin/crm_master -v 100 It dies and restarts about every 120 seconds, which happens to be the timeout I have specified for the stop and start methods. Doug On Fri, 2007-03-23 at 08:20 -0600, Alan Robertson wrote: Doug Knight wrote: Hi Alan, I've started testing my OCF script, and I'm seeing something unusual during initial startup. I've placed a crm_master call in my stateful_start function, after the function has determined that it is running on what should be the master, and postgresql has successfully started: crm_master -v 100 When this command gets executed, it starts using nearly 100% CPU, memory usage continuously increases up to about 68%, then it dies (killed via timeout?), followed by a second attempt to go master (with the same charactistics, after the function timeout is exceeded), then a demote is sent (again, after timeout) and it switches to try to become the slave (crm_master -v 10 is what I use, though I'm not sure this is correct usage to say I want to change to a slave). Eventually, I wind up with the resource in failed mode. First question, any idea why the straight line running of a crm_master -v 100 (not within any loops in my script) would spin up to 100%? Bugs maybe? What version of heartbeat are you running? Which processes are running up to 100%? For how long? Second question, is using the crm_master -v with different values the way to say on which node I prefer the master to run (higher number = preferred node)? Yes. I believe that these are added into the values that come from other constraints in your configuration file to come up with a best configuration. Good info. Could you provide a few hundred lines of strace output to show us what it's doing? Thanks! 
-- Alan Robertson [EMAIL PROTECTED] Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions. - William Wilberforce ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions
On Fri, 2007-03-23 at 09:25 -0600, Alan Robertson wrote: Doug Knight wrote: Current 2.0.8 tarball from 1/18/07. Process in top looks like: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 24591 root 18 0 1663m 1.5g 1028 R 83 77.8 1:19.42 /usr/sbin/crm_master -v 100 It dies and restarts about every 120 seconds, which happens to be the timeout I have specified for the stop and start methods. Doug On Fri, 2007-03-23 at 08:20 -0600, Alan Robertson wrote: Doug Knight wrote: Hi Alan, I've started testing my OCF script, and I'm seeing something unusual during initial startup. I've placed a crm_master call in my stateful_start function, after the function has determined that it is running on what should be the master, and postgresql has successfully started: crm_master -v 100 When this command gets executed, it starts using nearly 100% CPU, memory usage continuously increases up to about 68%, then it dies (killed via timeout?), followed by a second attempt to go master (with the same charactistics, after the function timeout is exceeded), then a demote is sent (again, after timeout) and it switches to try to become the slave (crm_master -v 10 is what I use, though I'm not sure this is correct usage to say I want to change to a slave). Eventually, I wind up with the resource in failed mode. First question, any idea why the straight line running of a crm_master -v 100 (not within any loops in my script) would spin up to 100%? Bugs maybe? What version of heartbeat are you running? Which processes are running up to 100%? For how long? Second question, is using the crm_master -v with different values the way to say on which node I prefer the master to run (higher number = preferred node)? Yes. I believe that these are added into the values that come from other constraints in your configuration file to come up with a best configuration. Good info. Could you provide a few hundred lines of strace output to show us what it's doing? 
Do you mean the last few hundred lines from ha.log? Just the primary where I'm trying to start? Thanks! ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions
I figured this one out, please ignore, its because I didn't give it a value. If I run crm_master -v 100 at the command line, it spins right up to 100% cpu with no error. Doug On Fri, 2007-03-23 at 12:52 -0400, Doug Knight wrote: This might help. With the resource in a failed mode, but target_role = started, I manually ran crm_master, exporting the proper resource ID, with the following results: [EMAIL PROTECTED] wsi]# vi stateful_pgsql [EMAIL PROTECTED] wsi]# OCF_RESOURCE_INSTANCE=pgsql_wal_5556:0 [EMAIL PROTECTED] wsi]# export OCF_RESOURCE_INSTANCE [EMAIL PROTECTED] wsi]# crm_master -V crm_master[7588]: 2007/03/23_12:50:45 ERROR: crm_abort: main: Triggered non-fatal assert at crm_attribute.c:353 : attr_value != NULL Doug On Fri, 2007-03-23 at 12:21 -0400, Doug Knight wrote: Got it. The attached file contains the strace from the second attempt by heartbeat to start the resource up as master, right up until it was killed. The resource already showed failed on the gui. I zipped it up using gzip. Doug On Fri, 2007-03-23 at 10:11 -0600, Alan Robertson wrote: Doug Knight wrote: On Fri, 2007-03-23 at 09:25 -0600, Alan Robertson wrote: Doug Knight wrote: Current 2.0.8 tarball from 1/18/07. Process in top looks like: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 24591 root 18 0 1663m 1.5g 1028 R 83 77.8 1:19.42 /usr/sbin/crm_master -v 100 It dies and restarts about every 120 seconds, which happens to be the timeout I have specified for the stop and start methods. Doug On Fri, 2007-03-23 at 08:20 -0600, Alan Robertson wrote: Doug Knight wrote: Hi Alan, I've started testing my OCF script, and I'm seeing something unusual during initial startup. 
I've placed a crm_master call in my stateful_start function, after the function has determined that it is running on what should be the master, and postgresql has successfully started: crm_master -v 100 When this command gets executed, it starts using nearly 100% CPU, memory usage continuously increases up to about 68%, then it dies (killed via timeout?), followed by a second attempt to go master (with the same charactistics, after the function timeout is exceeded), then a demote is sent (again, after timeout) and it switches to try to become the slave (crm_master -v 10 is what I use, though I'm not sure this is correct usage to say I want to change to a slave). Eventually, I wind up with the resource in failed mode. First question, any idea why the straight line running of a crm_master -v 100 (not within any loops in my script) would spin up to 100%? Bugs maybe? What version of heartbeat are you running? Which processes are running up to 100%? For how long? Second question, is using the crm_master -v with different values the way to say on which node I prefer the master to run (higher number = preferred node)? Yes. I believe that these are added into the values that come from other constraints in your configuration file to come up with a best configuration. Good info. Could you provide a few hundred lines of strace output to show us what it's doing? Do you mean the last few hundred lines from ha.log? Just the primary where I'm trying to start? No, I mean output from the strace command. From your reply, I'd guess you've never used it: strace -tt -p process-id-of-hung-process /some/file Do that for a few seconds, and attach the file to an email to the list. Does that help? 
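The strace command quoted above appears to have lost its output redirection in the archive, so the exact original form is not recoverable. One way to get the same result is sketched here, using strace's own `-o` option to write the trace to a file and stopping the capture after a few seconds. `capture_strace` and the `STRACE` override are illustrative names; `-tt`, `-p` and `-o` are standard strace options. Attaching to another process normally needs root.

```shell
#!/bin/sh
# STRACE can be overridden, e.g. with a full path to the binary.
STRACE=${STRACE:-strace}

capture_strace() {
    pid=$1
    outfile=$2
    seconds=${3:-5}

    # -tt: timestamps with microseconds; -p: attach to a running process;
    # -o: write the trace to a file instead of stderr.
    $STRACE -tt -p "$pid" -o "$outfile" &
    tracer=$!

    # Let it record for a few seconds, then detach by stopping strace.
    sleep "$seconds"
    kill "$tracer" 2>/dev/null
    wait "$tracer" 2>/dev/null

    # Treat the capture as successful if anything was written.
    [ -s "$outfile" ]
}
```

For the case in this thread that would be something like `capture_strace 24591 /tmp/crm_master.strace 5`, with the resulting file gzipped and attached to the mail.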
Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions
Hi Alan, I took a look at the drbd OCF script's notify function, and the online documentation. I believe there is one circumstance where I need to make use of the pre/post notify. The last step in my development/testing has to do with several steps I take to prepare the server that was primary and is now becoming standby. First, the primary gets demoted, right? Then the secondary gets promoted. The problem I have is that part of the process of preparing the new standby requires that the new active server process is up and accessible. If the demote has to complete before the promote can begin, I cannot do the rsync in the demote, because the promote hasn't started and placed the new primary in an accessible state. So, if I understand the notify function, then I need a post process section that looks for the master going active and accessible, so I can do the rsync and start up the new standby, right? Can you expand a little on the notify processing? The web page just lists the variables involved, and the drbd OCF script only makes use of a few of them, and I need a more detailed explanation of how and when they are used. Thanks, Doug On Wed, 2007-03-21 at 19:13 -0600, Alan Robertson wrote: Doug Knight wrote: Hi Andrew, I had just started reviewing both of thes scripts, and reviewed the Multistate and clone resource pages on the web site. It looks like multistate is how I need to handle it, but a couple of questions first. 1. I noticed that the write-up says the resource must come up on each of the servers in shadow mode first, then one gets promoted. Does this imply a start on both servers, and the OCF start function determining which server is active vs shadow (I'm picturing a check in the OCF script to determine postgresql standby mode = shadow/crm_master value low, and postgresql active mode = active/crm_master value high), then a promote to the active server? 2. 
I noticed that the drbd OCF script contains a notify function, where the Stateful OCF script does not. The notify function looks to be where the important actions are taken (calling drbd_start_phase_2, pre/post, etc). Is the notify function necessary, or is it sufficient in my case to handle it through the start|stop|promote|demote functions? Thanks for your help, Doug Andrew's out for a while. The start function starts you up in slave/secondary mode. All resources initially start up in slave mode. A set of servers is chosen to run the resources on (it might be one, two, the whole set, etc. depending on clone_max and clone_node_max and the usual constraints). They are started on the selected nodes using start During the start operation, you are given the chance to declare yourself ready to become master or not by using the crm_master command line tool. I believe that your resource can run that command any time they like - for example at a monitor operation... But, it is mandatory that they run it when they first start up. After this, heartbeat will try and promote as many of these resources as is consistent with its configured properties, and the crm_master commands that were run. The notify command tells you when your peers come and go. Do you need to take actions if you know this? If so, then you need to implement the notify actions... ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
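The post-promote idea raised in this message could be sketched as a notify handler along the following lines. This is an assumption-heavy outline, not the drbd RA's logic: the environment variable names `OCF_RESKEY_CRM_meta_notify_type`, `OCF_RESKEY_CRM_meta_notify_operation` and `OCF_RESKEY_CRM_meta_notify_promote_uname` are the ones used by later CRM releases and may differ in Heartbeat 2.0.8, so check the drbd RA shipped with your version. `pgsql_notify`, `resync_from_master` and `LOCAL_NODE` are hypothetical names.

```shell
#!/bin/sh
OCF_SUCCESS=0
OCF_ERR_GENERIC=1

# Placeholder for the site-specific steps (e.g. rsync from the new master,
# then start the standby). Hypothetical; replace with the real procedure.
resync_from_master() {
    echo "resync from new master"
}

pgsql_notify() {
    n_type=${OCF_RESKEY_CRM_meta_notify_type:-}
    n_op=${OCF_RESKEY_CRM_meta_notify_operation:-}
    promoted=${OCF_RESKEY_CRM_meta_notify_promote_uname:-}
    local_node=${LOCAL_NODE:-$(uname -n)}

    case "$n_type-$n_op" in
        post-promote)
            case " $promoted " in
                *" $local_node "*)
                    # This node is the one that was promoted; nothing to resync.
                    ;;
                *)
                    # A peer finished promoting, so the new master should be up.
                    resync_from_master || return $OCF_ERR_GENERIC
                    ;;
            esac
            ;;
    esac
    return $OCF_SUCCESS
}
```

The point of keying on the post-promote notification is that the demote action itself never has to wait for the new master, which is the ordering problem described above.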
Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions
On 3/20/07, Doug Knight [EMAIL PROTECTED] wrote: Hi Alan, I've had some time to try to implement the OCF script for the PostgreSQL WAL file forwarding configuration, and I still am having some issues. The main issue is how do I set up an OCF script that allows the process to run on both servers, one process in a starting mode ingesting transferred WAL files, the other in an accessible mode allowing database access and forwarding WAL files to the standby (right now the different states are determined within the OCF script by looking for the existence of a recovery.conf file, which indicates the starting, standby side)? What I'm trying to figure out is how to tell HA that it needs to start the standby process and monitor it. I've started looking at the multi-state configurations, but the documentation isn't completely clear. Can you expand on it? I'd like something similar to the answers you gave for the monitor, stop, and start functions (when do they/can they get invoked, on which node, etc). One specific area of the multi-state config I have a question about is how to tell HA which server has the master role when first starting up the resource. Any additional info would be great. have a look at the drbd and Stateful agents: http://hg.beekhof.net/lha/crm-dev/file/be220f2e9b40/resources/OCF/Stateful.in but first read: http://linux-ha.org/v2/Concepts/MultiState in particular, section 6.1 Thanks, Doug On Wed, 2007-03-14 at 19:59 -0600, Alan Robertson wrote: Doug Knight wrote: Yes, Thanks Alan. Let me digest it, and walk through my OCF script. I'll see if I have any other questions after that. Thanks for getting back to me. Doug On Wed, 2007-03-14 at 11:41 -0600, Alan Robertson wrote: Doug Knight wrote: Hi All, I currently am running a two node cluster (host1 and host2) with version 2.0.8. I have a resource defined with a place constraint of #uname eq host1, so that it will start on host1 (using an OCF RA script, including all of the required methods). 
The resource itself has target_role set to stopped.

Question 1: Is the monitor method called regularly on both nodes to make sure the resource is not running?

No. It is called on every node when we first start up (we call it a probe operation). If you ask us to, we will run it periodically to ensure that a running copy continues to run. You can also manually request to run this initial probe again to catch errors made by system administrators (but I don't know of anyone who does that).

Next, I change the target_role to started (i.e. I use the GUI and click the start button). [Better yet, use the outline start button]

Question 2: What is the order of OCF methods called to bring up the resource? Is Monitor called before Start on host1? Does Stop and/or Monitor ever get called on host2?

Monitor gets called when we first start up on every node. It also gets called repeatedly on any node that we think is running it -- if you ask us to monitor the resource.

Resource is up and running on host1, and I decide to move the resource to host2. I click the constraint and change it to #uname eq host2. The resource stops on host1 and starts on host2.

Question 3: Same idea, what are the sequence of method calls to migrate the resource from host1 to host2?

In the past...
    monitor every resource on every node once to see what's already running
    start on some node
    monitor periodically on some node (if requested)
Request to move resource arrives...
    stop monitoring on some node (if it had been requested)
    stop resource on some node
    start resource on some other node
    monitor periodically on some other node (if requested)

I'm trying to thoroughly understand the sequence of events that occurs for each phase in support of the Postgres WAL file forwarding configuration I posted last week ([Linux-HA] Two node cluster with Postgres in WAL file fwding mode, started March 1).

Let me offer this caveat:
    Monitor might be called at any time.
    Stop should work at any time, and succeed harmlessly if it's already stopped.
    Start should work at any time, and succeed harmlessly if it's already started.

http://www.linux-ha.org/OCFResourceAgent gives more details.
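The caveat above (monitor at any time, stop and start succeeding harmlessly when there is nothing to do) can be illustrated with a toy agent. This is a minimal sketch in which a marker file stands in for the real service; `STATE_FILE` and the `ra_*` function names are hypothetical, while the return codes 0, 1 and 7 are the standard OCF values for success, generic error and not running.

```shell
#!/bin/sh
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

# Hypothetical marker file standing in for "the service is running".
STATE_FILE=${STATE_FILE:-/tmp/example-ra.state}

ra_monitor() {
    if [ -f "$STATE_FILE" ]; then
        return $OCF_SUCCESS
    fi
    return $OCF_NOT_RUNNING
}

ra_start() {
    # Already running: succeed without doing anything.
    ra_monitor && return $OCF_SUCCESS
    touch "$STATE_FILE" || return $OCF_ERR_GENERIC
    return $OCF_SUCCESS
}

ra_stop() {
    # Already stopped: that is also a success.
    ra_monitor || return $OCF_SUCCESS
    rm -f "$STATE_FILE" || return $OCF_ERR_GENERIC
    return $OCF_SUCCESS
}
```

With this shape, the initial probe (a monitor on every node) reports "not running" cleanly, and a stop issued on a node that never started the resource does not count as a failure.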
Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions
Hi Andrew, I had just started reviewing both of thes scripts, and reviewed the Multistate and clone resource pages on the web site. It looks like multistate is how I need to handle it, but a couple of questions first. 1. I noticed that the write-up says the resource must come up on each of the servers in shadow mode first, then one gets promoted. Does this imply a start on both servers, and the OCF start function determining which server is active vs shadow (I'm picturing a check in the OCF script to determine postgresql standby mode = shadow/crm_master value low, and postgresql active mode = active/crm_master value high), then a promote to the active server? 2. I noticed that the drbd OCF script contains a notify function, where the Stateful OCF script does not. The notify function looks to be where the important actions are taken (calling drbd_start_phase_2, pre/post, etc). Is the notify function necessary, or is it sufficient in my case to handle it through the start|stop|promote|demote functions? Thanks for your help, Doug On Wed, 2007-03-21 at 10:09 +0100, Andrew Beekhof wrote: On 3/20/07, Doug Knight [EMAIL PROTECTED] wrote: Hi Alan, I've had some time to try to implement the OCF script for the PostgreSQL WAL file forwarding configuration, and I still am having some issues. The main issue is how do I set up an OCF script that allows the process to run on both servers, one process in a starting mode ingesting transferred WAL files, the other in an accessible mode allowing database access and forwarding WAL files to the standby (right now the different states are determined within the OCF script by looking for the existence of a recovery.conf file, which indicates the starting, standby side)? What I'm trying to figure out is how to tell HA that it needs to start the standby process and monitor it. I've started looking at the multi-state configurations, but the documentation isn't completely clear. Can you expand on it? 
I'd like something similar to the answers you gave for the monitor, stop, and start functions (when do they/can they get invoked, on which node, etc). One specific area of the multi-state config I have a question about is how to tell HA which server has the master role when first starting up the resource. Any additional info would be great. have a look at the drbd and Stateful agents: http://hg.beekhof.net/lha/crm-dev/file/be220f2e9b40/resources/OCF/Stateful.in but first read: http://linux-ha.org/v2/Concepts/MultiState in particular, section 6.1 Thanks, Doug On Wed, 2007-03-14 at 19:59 -0600, Alan Robertson wrote: Doug Knight wrote: Yes, Thanks Alan. Let me digest it, and walk through my OCF script. I'll see if I have any other questions after that. Thanks for getting back to me. Doug On Wed, 2007-03-14 at 11:41 -0600, Alan Robertson wrote: Doug Knight wrote: Hi All, I currently am running a two node cluster (host1 and host2) with version 2.0.8. I have a resource defined with a place constraint of #uname eq host1, so that it will start on host1 (using an OCF RA script, including all of the required methods). The resource itself has target_role set to stopped. Question 1: Is the monitor method called regularly on both nodes to make sure the resource is not running? No. It is called on every node when we first start up (we call it a probe operation). If you ask us to, we will run it periodically to ensure that a running copy continues to run. You can also manually request to run this initial probe again to catch errors made by system administrators (but I don't know of anyone who does that). Next, I change the target_role to started (i.e. I use the GUI and click the start button). [Better yet, use the outline start button] Question 2: What is the order of OCF methods called to bring up the resource? Is Monitor called before Start on host1? Does Stop and/or Monitor ever get called on host2? Monitor gets called when we first start up on every node. 
It also get called repeatedly on any node that we think is running it -- if you ask us to monitor the resource. Resource is up and running on host1, and I decide to move the resource to host2. I click the constraint and change it to #uname eq host2. The resource stops on host1 and starts on host2. Question 3: Same idea, what are the sequence of method calls to migrate the resource from host1 to host2? In the past... monitor every resource on every node once to see what's already running start on some node monitor periodically on some node (if requested) Request to move resource arrives... stop monitoring on some node (if it had been requested) stop resource on some node start resource on some other node monitor periodically on some other node (if requested)
Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions
Doug Knight wrote: Hi Andrew, I had just started reviewing both of these scripts, and reviewed the Multistate and clone resource pages on the web site. It looks like multistate is how I need to handle it, but a couple of questions first. 1. I noticed that the write-up says the resource must come up on each of the servers in shadow mode first, then one gets promoted. Does this imply a start on both servers, and the OCF start function determining which server is active vs shadow (I'm picturing a check in the OCF script to determine postgresql standby mode = shadow/crm_master value low, and postgresql active mode = active/crm_master value high), then a promote to the active server? 2. I noticed that the drbd OCF script contains a notify function, where the Stateful OCF script does not. The notify function looks to be where the important actions are taken (calling drbd_start_phase_2, pre/post, etc). Is the notify function necessary, or is it sufficient in my case to handle it through the start|stop|promote|demote functions? Thanks for your help, Doug
Andrew's out for a while. The start function starts you up in slave/secondary mode. All resources initially start up in slave mode. A set of servers is chosen to run the resources on (it might be one, two, the whole set, etc., depending on clone_max and clone_node_max and the usual constraints). They are started on the selected nodes using start. During the start operation, you are given the chance to declare yourself ready to become master or not by using the crm_master command line tool. I believe that your resource can run that command any time it likes (for example, at a monitor operation), but it is mandatory that it runs it when it first starts up. After this, heartbeat will try to promote as many of these resources as is consistent with its configured properties and the crm_master commands that were run. The notify command tells you when your peers come and go. Do you need to take actions if you know this? If so, then you need to implement the notify actions... -- Alan Robertson [EMAIL PROTECTED] Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions. - William Wilberforce
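To make the start/crm_master flow above concrete, here is a minimal sketch for the PostgreSQL case, assuming Doug's recovery.conf convention. The function names, the PGDATA default, the score values (5 and 100) and the CRM_MASTER override variable are all illustrative assumptions, not taken from any shipped agent. The `-v` (set value) and `-D` (delete) options are the ones I believe crm_master accepts, so check them against your installed version.

```shell
#!/bin/sh
# Hypothetical sketch: names, paths and scores are illustrative assumptions.

: "${PGDATA:=/var/lib/pgsql/data}"   # assumed data directory
: "${CRM_MASTER:=crm_master}"        # overridable so the sketch can be dry-run

# The standby side is marked by recovery.conf, as in Doug's description.
pgsql_is_standby() {
    [ -f "$PGDATA/recovery.conf" ]
}

# Tell the cluster how suitable this node is for promotion:
# low score for the WAL-ingesting standby, high for the active side.
pgsql_set_master_score() {
    if pgsql_is_standby; then
        $CRM_MASTER -v 5
    else
        $CRM_MASTER -v 100
    fi
}

# start: come up as slave from the cluster's point of view, then
# declare the promotion preference (mandatory at first start).
pgsql_start() {
    # ... start the PostgreSQL process here (omitted in this sketch) ...
    pgsql_set_master_score
}

# stop: withdraw the preference so this node is no longer a candidate.
pgsql_stop() {
    # ... stop the PostgreSQL process here (omitted in this sketch) ...
    $CRM_MASTER -D
}
```

Since crm_master may also be run at monitor time, pgsql_set_master_score could be called from the monitor action as well if the preferred master can change while the resource is running.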
Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions
Hi Alan, I've had some time to try to implement the OCF script for the PostgreSQL WAL file forwarding configuration, and I still am having some issues. The main issue is how do I set up an OCF script that allows the process to run on both servers, one process in a starting mode ingesting transferred WAL files, the other in an accessible mode allowing database access and forwarding WAL files to the standby (right now the different states are determined within the OCF script by looking for the existence of a recovery.conf file, which indicates the starting, standby side)? What I'm trying to figure out is how to tell HA that it needs to start the standby process and monitor it. I've started looking at the multi-state configurations, but the documentation isn't completely clear. Can you expand on it? I'd like something similar to the answers you gave for the monitor, stop, and start functions (when do they/can they get invoked, on which node, etc). One specific area of the multi-state config I have a question about is how to tell HA which server has the master role when first starting up the resource. Any additional info would be great. Thanks, Doug On Wed, 2007-03-14 at 19:59 -0600, Alan Robertson wrote: Doug Knight wrote: Yes, Thanks Alan. Let me digest it, and walk through my OCF script. I'll see if I have any other questions after that. Thanks for getting back to me. Doug On Wed, 2007-03-14 at 11:41 -0600, Alan Robertson wrote: Doug Knight wrote: Hi All, I currently am running a two node cluster (host1 and host2) with version 2.0.8. I have a resource defined with a place constraint of #uname eq host1, so that it will start on host1 (using an OCF RA script, including all of the required methods). The resource itself has target_role set to stopped. Question 1: Is the monitor method called regularly on both nodes to make sure the resource is not running? No. It is called on every node when we first start up (we call it a probe operation). 
If you ask us to, we will run it periodically to ensure that a running copy continues to run. You can also manually request to run this initial probe again to catch errors made by system administrators (but I don't know of anyone who does that). Next, I change the target_role to started (i.e. I use the GUI and click the start button). [Better yet, use the outline start button] Question 2: What is the order of OCF methods called to bring up the resource? Is Monitor called before Start on host1? Does Stop and/or Monitor ever get called on host2? Monitor gets called when we first start up on every node. It also gets called repeatedly on any node that we think is running it -- if you ask us to monitor the resource. Resource is up and running on host1, and I decide to move the resource to host2. I click the constraint and change it to #uname eq host2. The resource stops on host1 and starts on host2. Question 3: Same idea, what is the sequence of method calls to migrate the resource from host1 to host2? In the past: monitor every resource on every node once to see what's already running; start on some node; monitor periodically on some node (if requested). Request to move resource arrives: stop monitoring on some node (if it had been requested); stop resource on some node; start resource on some other node; monitor periodically on some other node (if requested). I'm trying to thoroughly understand the sequence of events that occurs for each phase in support of the Postgres WAL file forwarding configuration I posted last week ([Linux-HA] Two node cluster with Postgres in WAL file fwding mode, started March 1). Let me offer this caveat: Monitor might be called at any time. Stop should work at any time, and succeed harmlessly if it's already stopped. Start should work at any time, and succeed harmlessly if it's already started. http://www.linux-ha.org/OCFResourceAgent gives more details.
Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions
On 3/20/07, Doug Knight [EMAIL PROTECTED] wrote: Hi Alan, I've had some time to try to implement the OCF script for the PostgreSQL WAL file forwarding configuration, and I still am having some issues. The main issue is how do I set up an OCF script that allows the process to run on both servers, one process in a starting mode ingesting transferred WAL files, the other in an accessible mode allowing database access and forwarding WAL files to the standby (right now the different states are determined within the OCF script by looking for the existence of a recovery.conf file, which indicates the starting, standby side)? What I'm trying to figure out is how to tell HA that it needs to start the standby process and monitor it. You can have an additional OCF parameter that controls the state of the resource. But you'll need to have something external to set that parameter into the right state depending on the state of the cluster. I've started looking at the multi-state configurations, but the documentation isn't completely clear. Can you expand on it? I'd like something similar to the answers you gave for the monitor, stop, and start functions (when do they/can they get invoked, on which node, etc). One specific area of the multi-state config I have a question about is how to tell HA which server has the master role when first starting up the resource. Any additional info would be great. On this, check this thread: http://lists.linux-ha.org/pipermail/linux-ha/2007-March/023646.html Thanks, Doug
On Wed, 2007-03-14 at 19:59 -0600, Alan Robertson wrote: Doug Knight wrote: Yes, Thanks Alan. Let me digest it, and walk through my OCF script. I'll see if I have any other questions after that. Thanks for getting back to me. Doug On Wed, 2007-03-14 at 11:41 -0600, Alan Robertson wrote: Doug Knight wrote: Hi All, I currently am running a two node cluster (host1 and host2) with version 2.0.8. I have a resource defined with a place constraint of #uname eq host1, so that it will start on host1 (using an OCF RA script, including all of the required methods). The resource itself has target_role set to stopped. Question 1: Is the monitor method called regularly on both nodes to make sure the resource is not running? No. It is called on every node when we first start up (we call it a probe operation). If you ask us to, we will run it periodically to ensure that a running copy continues to run. You can also manually request to run this initial probe again to catch errors made by system administrators (but I don't know of anyone who does that). Next, I change the target_role to started (i.e. I use the GUI and click the start button). [Better yet, use the outline start button] Question 2: What is the order of OCF methods called to bring up the resource? Is Monitor called before Start on host1? Does Stop and/or Monitor ever get called on host2? Monitor gets called when we first start up on every node. It also gets called repeatedly on any node that we think is running it -- if you ask us to monitor the resource. Resource is up and running on host1, and I decide to move the resource to host2. I click the constraint and change it to #uname eq host2. The resource stops on host1 and starts on host2. Question 3: Same idea, what is the sequence of method calls to migrate the resource from host1 to host2?
In the past: monitor every resource on every node once to see what's already running; start on some node; monitor periodically on some node (if requested). Request to move resource arrives: stop monitoring on some node (if it had been requested); stop resource on some node; start resource on some other node; monitor periodically on some other node (if requested). I'm trying to thoroughly understand the sequence of events that occurs for each phase in support of the Postgres WAL file forwarding configuration I posted last week ([Linux-HA] Two node cluster with Postgres in WAL file fwding mode, started March 1). Let me offer this caveat: Monitor might be called at any time. Stop should work at any time, and succeed harmlessly if it's already stopped. Start should work at any time, and succeed harmlessly if it's already started. http://www.linux-ha.org/OCFResourceAgent gives more details.
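The suggestion in this reply, an extra OCF parameter that selects the resource's state, could look roughly like the sketch below. Resource parameters reach an agent as OCF_RESKEY_<name> environment variables. The parameter name pg_mode, its two values and the default of standby are made up for illustration. As the reply notes, something outside the agent still has to set the parameter to match the state of the cluster.

```shell
#!/bin/sh
# Hypothetical sketch: "pg_mode" is an invented parameter name.
# The cluster would pass it to the agent as OCF_RESKEY_pg_mode.

OCF_ERR_CONFIGURED=6   # standard OCF return code for a bad configuration

# Print the mode the agent was asked to run in, defaulting to standby.
pgsql_requested_mode() {
    case "${OCF_RESKEY_pg_mode:-standby}" in
        active)  echo active ;;
        standby) echo standby ;;
        *)
            echo "invalid pg_mode: ${OCF_RESKEY_pg_mode}" >&2
            return $OCF_ERR_CONFIGURED
            ;;
    esac
}
```

The start action could then branch on the output of pgsql_requested_mode instead of testing for recovery.conf.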
[Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions
Hi All, I currently am running a two node cluster (host1 and host2) with version 2.0.8. I have a resource defined with a place constraint of #uname eq host1, so that it will start on host1 (using an OCF RA script, including all of the required methods). The resource itself has target_role set to stopped. Question 1: Is the monitor method called regularly on both nodes to make sure the resource is not running? Next, I change the target_role to started (i.e. I use the GUI and click the start button). Question 2: What is the order of OCF methods called to bring up the resource? Is Monitor called before Start on host1? Does Stop and/or Monitor ever get called on host2? Resource is up and running on host1, and I decide to move the resource to host2. I click the constraint and change it to #uname eq host2. The resource stops on host1 and starts on host2. Question 3: Same idea, what is the sequence of method calls to migrate the resource from host1 to host2? I'm trying to thoroughly understand the sequence of events that occurs for each phase in support of the Postgres WAL file forwarding configuration I posted last week ([Linux-HA] Two node cluster with Postgres in WAL file fwding mode, started March 1). Thanks, Doug Knight WSI Inc.
Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions
Doug Knight wrote: Hi All, I currently am running a two node cluster (host1 and host2) with version 2.0.8. I have a resource defined with a place constraint of #uname eq host1, so that it will start on host1 (using an OCF RA script, including all of the required methods). The resource itself has target_role set to stopped. Question 1: Is the monitor method called regularly on both nodes to make sure the resource is not running? No. It is called on every node when we first start up (we call it a probe operation). If you ask us to, we will run it periodically to ensure that a running copy continues to run. You can also manually request to run this initial probe again to catch errors made by system administrators (but I don't know of anyone who does that). Next, I change the target_role to started (i.e. I use the GUI and click the start button). [Better yet, use the outline start button] Question 2: What is the order of OCF methods called to bring up the resource? Is Monitor called before Start on host1? Does Stop and/or Monitor ever get called on host2? Monitor gets called when we first start up on every node. It also gets called repeatedly on any node that we think is running it -- if you ask us to monitor the resource. Resource is up and running on host1, and I decide to move the resource to host2. I click the constraint and change it to #uname eq host2. The resource stops on host1 and starts on host2. Question 3: Same idea, what is the sequence of method calls to migrate the resource from host1 to host2?
In the past: monitor every resource on every node once to see what's already running; start on some node; monitor periodically on some node (if requested). Request to move resource arrives: stop monitoring on some node (if it had been requested); stop resource on some node; start resource on some other node; monitor periodically on some other node (if requested). I'm trying to thoroughly understand the sequence of events that occurs for each phase in support of the Postgres WAL file forwarding configuration I posted last week ([Linux-HA] Two node cluster with Postgres in WAL file fwding mode, started March 1). Does that help? -- Alan Robertson [EMAIL PROTECTED]
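For reference, the ordering given in the answer to Question 3 can be written out as a small trace function. This is purely illustrative: it prints the steps in the order described above and is not what the CRM actually executes. The function name and its third argument (whether a recurring monitor was requested) are assumptions made for the sketch.

```shell
#!/bin/sh
# Illustrative only: prints the order of operations described above.
# $1 = node the resource runs on first
# $2 = node the resource is moved to
# $3 = "yes" if a recurring monitor was requested (assumed flag, default yes)
trace_move() {
    from=$1 to=$2 monitored=${3:-yes}

    # Initial probe: monitor runs once on every node.
    for node in "$from" "$to"; do
        echo "$node: monitor (probe)"
    done

    echo "$from: start"
    [ "$monitored" = yes ] && echo "$from: monitor (recurring)"

    # Request to move the resource arrives.
    [ "$monitored" = yes ] && echo "$from: cancel recurring monitor"
    echo "$from: stop"
    echo "$to: start"
    [ "$monitored" = yes ] && echo "$to: monitor (recurring)"
    return 0
}
```

Calling `trace_move host1 host2` lists the steps for the two-node case in the question; passing `no` as the third argument drops the monitor lines other than the initial probes.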
Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions
Yes, Thanks Alan. Let me digest it, and walk through my OCF script. I'll see if I have any other questions after that. Thanks for getting back to me. Doug
On Wed, 2007-03-14 at 11:41 -0600, Alan Robertson wrote: Doug Knight wrote: Hi All, I currently am running a two node cluster (host1 and host2) with version 2.0.8. I have a resource defined with a place constraint of #uname eq host1, so that it will start on host1 (using an OCF RA script, including all of the required methods). The resource itself has target_role set to stopped. Question 1: Is the monitor method called regularly on both nodes to make sure the resource is not running? No. It is called on every node when we first start up (we call it a probe operation). If you ask us to, we will run it periodically to ensure that a running copy continues to run. You can also manually request to run this initial probe again to catch errors made by system administrators (but I don't know of anyone who does that). Next, I change the target_role to started (i.e. I use the GUI and click the start button). [Better yet, use the outline start button] Question 2: What is the order of OCF methods called to bring up the resource? Is Monitor called before Start on host1? Does Stop and/or Monitor ever get called on host2? Monitor gets called when we first start up on every node. It also gets called repeatedly on any node that we think is running it -- if you ask us to monitor the resource. Resource is up and running on host1, and I decide to move the resource to host2. I click the constraint and change it to #uname eq host2. The resource stops on host1 and starts on host2. Question 3: Same idea, what is the sequence of method calls to migrate the resource from host1 to host2? In the past: monitor every resource on every node once to see what's already running; start on some node; monitor periodically on some node (if requested). Request to move resource arrives: stop monitoring on some node (if it had been requested); stop resource on some node; start resource on some other node; monitor periodically on some other node (if requested). I'm trying to thoroughly understand the sequence of events that occurs for each phase in support of the Postgres WAL file forwarding configuration I posted last week ([Linux-HA] Two node cluster with Postgres in WAL file fwding mode, started March 1). Does that help?
Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions
Doug Knight wrote: Yes, Thanks Alan. Let me digest it, and walk through my OCF script. I'll see if I have any other questions after that. Thanks for getting back to me. Doug On Wed, 2007-03-14 at 11:41 -0600, Alan Robertson wrote: Doug Knight wrote: Hi All, I currently am running a two node cluster (host1 and host2) with version 2.0.8. I have a resource defined with a place constraint of #uname eq host1, so that it will start on host1 (using an OCF RA script, including all of the required methods). The resource itself has target_role set to stopped. Question 1: Is the monitor method called regularly on both nodes to make sure the resource is not running? No. It is called on every node when we first start up (we call it a probe operation). If you ask us to, we will run it periodically to ensure that a running copy continues to run. You can also manually request to run this initial probe again to catch errors made by system administrators (but I don't know of anyone who does that). Next, I change the target_role to started (i.e. I use the GUI and click the start button). [Better yet, use the outline start button] Question 2: What is the order of OCF methods called to bring up the resource? Is Monitor called before Start on host1? Does Stop and/or Monitor ever get called on host2? Monitor gets called when we first start up on every node. It also gets called repeatedly on any node that we think is running it -- if you ask us to monitor the resource. Resource is up and running on host1, and I decide to move the resource to host2. I click the constraint and change it to #uname eq host2. The resource stops on host1 and starts on host2. Question 3: Same idea, what is the sequence of method calls to migrate the resource from host1 to host2? In the past: monitor every resource on every node once to see what's already running; start on some node; monitor periodically on some node (if requested). Request to move resource arrives: stop monitoring on some node (if it had been requested); stop resource on some node; start resource on some other node; monitor periodically on some other node (if requested). I'm trying to thoroughly understand the sequence of events that occurs for each phase in support of the Postgres WAL file forwarding configuration I posted last week ([Linux-HA] Two node cluster with Postgres in WAL file fwding mode, started March 1).
Let me offer this caveat: Monitor might be called at any time. Stop should work at any time, and succeed harmlessly if it's already stopped. Start should work at any time, and succeed harmlessly if it's already started. http://www.linux-ha.org/OCFResourceAgent gives more details. -- Alan Robertson [EMAIL PROTECTED]
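The caveat above amounts to making start and stop idempotent and letting monitor report the truth whenever it is asked. A minimal sketch of that shape follows, assuming a pidfile-tracked process. The `sleep` command stands in for the real daemon, and the function names and PIDFILE location are invented for illustration. The return codes 0 and 7 are the standard OCF_SUCCESS and OCF_NOT_RUNNING values.

```shell
#!/bin/sh
# Hypothetical sketch: a pidfile-tracked placeholder process stands in
# for the real service; names and paths are illustrative assumptions.

: "${PIDFILE:=/tmp/ocf_sketch.pid}"   # assumed pidfile location

OCF_SUCCESS=0
OCF_NOT_RUNNING=7

# monitor: safe to call at any time; running only if the pidfile
# names a live process.
svc_monitor() {
    [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null \
        && return $OCF_SUCCESS
    return $OCF_NOT_RUNNING
}

# start: succeed harmlessly if already started.
svc_start() {
    svc_monitor && return $OCF_SUCCESS
    sleep 300 >/dev/null 2>&1 &       # placeholder for the real daemon
    echo $! > "$PIDFILE"
    return $OCF_SUCCESS
}

# stop: succeed harmlessly if already stopped.
svc_stop() {
    svc_monitor || { rm -f "$PIDFILE"; return $OCF_SUCCESS; }
    pid=$(cat "$PIDFILE")
    kill "$pid" 2>/dev/null
    wait "$pid" 2>/dev/null || true   # reap the placeholder if it is our child
    rm -f "$PIDFILE"
    return $OCF_SUCCESS
}
```

Calling svc_start twice leaves a single process running, and calling svc_stop twice returns success both times, which is the behaviour the caveat asks for.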