[Linux-ha-dev] Medium: Pure-FTPd: Create pid directory if needed
https://github.com/ClusterLabs/resource-agents/pull/339

Medium: Pure-FTPd: Create PID directory if needed

This patch is required for Ubuntu, where /var/run has been replaced with tmpfs, so Pure-FTPd's default PID directory /var/run/pure-ftpd/ does not exist after a reboot. Normally the init script would take care of that, but we're not using the init script with this resource agent.

Dev: Pure-FTPd: fix spacing (spacing fix only - no functional changes)

___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
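A minimal sketch of the fix described above (the path and helper name here are illustrative, not the actual Pure-FTPd RA code): the agent's start action can simply create the PID directory when it is missing, since a tmpfs-backed /var/run comes up empty after every reboot.

```shell
# Illustrative sketch only: the real RA would use /var/run/pure-ftpd
# and its own naming. A temp path is used here so this runs anywhere.
piddir="${TMPDIR:-/tmp}/pure-ftpd-demo"

ensure_piddir() {
    # mkdir -p is idempotent: it succeeds whether or not the directory
    # already exists, and creates missing parent directories as needed.
    mkdir -p "$piddir"
}

ensure_piddir
```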
[Linux-HA] Heartbeat and load balancing
Hello guys, I'm trying to set up an Active/Active Heartbeat configuration. I would like both nodes to respond on the same IP, balancing the requests between them. I'm only able to do that when the second node has the resource configured with another virtual IP. Is balancing on a single IP only possible with LVS?

Another question: I set up two resources, one for httpd and the other for cups, with a different IP for each. However, I can access either application through both IPs. Why does that happen? Thank you!

___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Antw: Re: Xen RA and rebooting
Hi Dejan. Sorry to be slow to respond to this. I have done some testing and everything looks good. I spent some time tweaking the RA, and I added a parameter called wait_for_reboot (default 5s) so that the reboot sleep time can be overridden (in case it's more than 5 seconds on really loaded hypervisors). I also cleaned up a few log entries to make them consistent across the RA, and edited your entries for xen status to be a little clearer about why we think we should be waiting.

I have attached a patch here because I have NO idea how to create a branch and pull request. If there are links to a good place to start, I may be able to contribute occasionally to some other RAs that I use. Please let me know what you think. Thanks for your help.
Tom

On 10/17/2013 06:10 AM, Dejan Muhamedagic wrote:
On Thu, Oct 17, 2013 at 11:45:17AM +0200, Dejan Muhamedagic wrote:
Hi Tom,
On Wed, Oct 16, 2013 at 05:28:28PM -0400, Tom Parker wrote:
Some more reading of the source code makes me think the || [ $__OCF_ACTION != stop ]; is not needed.
Yes, you're right. I'll drop that part of the if statement. Many thanks for testing.
Fixed now. The if statement, which was obviously hard to follow, got relegated to the monitor function. Which makes Xen_Status_with_Retry really stand for what's happening in there ;-) Tom, hope you can test again.
Cheers, Dejan

Xen_Status_with_Retry() is only called from stop and monitor, so we only need to check whether it's a probe. Everything else should be handled in the case statement in the loop.
Tom

On 10/16/2013 05:16 PM, Tom Parker wrote:
Hi. I think there is an issue with the updated Xen RA, specifically with the if statement below, but I am not sure. I may be confused about how bash || works, but I don't see my servers ever entering the loop when a VM disappears.
if ocf_is_probe || [ "$__OCF_ACTION" != stop ]; then
    return $rc
fi

Does this not mean that if we run a monitor operation that is not a probe we get:

    (ocf_is_probe)           -> false
    ($__OCF_ACTION != stop)  -> true  (monitor != stop)
    (false || true)          -> true

which causes the if statement to return $rc, so we never enter the loop?

Xen_Status_with_Retry() {
    local rc cnt=5

    Xen_Status $1
    rc=$?
    if ocf_is_probe || [ "$__OCF_ACTION" != stop ]; then
        return $rc
    fi
    while [ $rc -eq $OCF_NOT_RUNNING -a $cnt -gt 0 ]; do
        case $__OCF_ACTION in
        stop)
            ocf_log debug "domain $1 reported as not running, waiting $cnt seconds ..."
            ;;
        monitor)
            ocf_log warn "domain $1 reported as not running, but it is expected to be running! Retrying for $cnt seconds ..."
            ;;
        *)
            : not reachable
            ;;
        esac
        sleep 1
        Xen_Status $1
        rc=$?
        cnt=$((cnt-1))
    done
    return $rc
}

On 10/16/2013 12:12 PM, Dejan Muhamedagic wrote:
Hi Tom,
On Tue, Oct 15, 2013 at 07:55:11PM -0400, Tom Parker wrote:
Hi Dejan. Just a quick question: I cannot see your new log messages being logged to syslog:

    ocf_log warn "domain $1 reported as not running, but it is expected to be running! Retrying for $cnt seconds ..."

Do you know where I can set my logging to see warn-level messages? I expected to see them in my testing by default, but that does not seem to be true.
You should see them by default. But note that these warnings may not happen, depending on the circumstances on your host. In my experiments they were logged only while the guest was rebooting, and then just once or maybe twice. If you have recent resource-agents and crmsh, you can enable operation tracing (with crm resource trace rsc monitor interval).
Thanks, Dejan

Thanks
Tom

On 10/08/2013 05:04 PM, Dejan Muhamedagic wrote:
Hi,
On Tue, Oct 08, 2013 at 01:52:56PM +0200, Ulrich Windl wrote:
Hi! I thought I'd never be bitten by this bug, but I actually was! Now I'm wondering whether the Xen RA sees the guest if you use pygrub, and pygrub is still counting down for the actual boot...
But the reason why I'm writing is that I think I've discovered another bug in the RA: CRM decided to recover the guest VM v02:

[...] lrmd: [14903]: info: operation monitor[28] on prm_xen_v02 for client 14906: pid 19516 exited with return code 7
[...] pengine: [14905]: notice: LogActions: Recover prm_xen_v02 (Started h05)
[...] crmd: [14906]: info: te_rsc_command: Initiating action 5: stop prm_xen_v02_stop_0 on h05 (local)
[...] Xen(prm_xen_v02)[19552]: INFO: Xen domain v02 already stopped.
[...] lrmd: [14903]: info: operation stop[31] on prm_xen_v02 for client 14906: pid 19552 exited with return code 0
[...] crmd: [14906]: info: te_rsc_command: Initiating action 78: start prm_xen_v02_start_0 on h05 (local)
lrmd: [14903]: info: rsc:prm_xen_v02 start[32] (pid 19686)
[...] lrmd: [14903]: info: RA output: (prm_xen_v02:start:stderr) Error: Domain 'v02'
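The short-circuit behaviour Tom walks through above can be checked in a standalone shell snippet (is_probe here is a stand-in for ocf_is_probe, which is only defined inside an RA environment):

```shell
# Simulate a regular monitor operation that is not a probe.
__OCF_ACTION=monitor
is_probe() { return 1; }   # stand-in for ocf_is_probe: "not a probe" -> false

if is_probe || [ "$__OCF_ACTION" != stop ]; then
    # false || true evaluates to true, so this branch is taken and a
    # retry loop placed after the if would never be reached.
    result="returned early"
else
    result="entered retry loop"
fi
```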
Re: [Linux-HA] Antw: Re: Xen RA and rebooting
I may have actually created the pull request properly... Please let me know, and again thanks for your help.
Tom

On 10/18/2013 01:30 PM, Tom Parker wrote: [...]
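Tom mentions adding a wait_for_reboot parameter with a 5-second default. A hedged sketch of how OCF agents conventionally expose such a parameter (the variable name follows the OCF_RESKEY_ convention; the actual patch may differ):

```shell
# OCF passes instance parameters to agents as OCF_RESKEY_* environment
# variables; a default is conventionally applied with ':=' expansion.
: "${OCF_RESKEY_wait_for_reboot:=5}"

# A retry loop would then sleep for this many seconds instead of a
# hard-coded value.
wait_secs=$OCF_RESKEY_wait_for_reboot
```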