Re: [Linux-HA] Q: Resource migration (Xen live migration)

2015-03-29 Thread Andrew Beekhof

 On 13 Feb 2015, at 8:38 pm, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de 
 wrote:
 
 Hello!
 
 I have some questions on Pacemaker's resource migration. We have a Xen host 
 that has some problems (still to be investigated) that cause some VM disks 
 not to be ready for use.
 
 When trying to migrate a VM from the bad host to a good host through 
 Pacemaker, migration seemed to hang. At some point the source VM was no 
 longer present on the bad host (Unable to find domain 'v09'), but Pacemaker 
 still tried a migration:
 crmd[6779]:   notice: te_rsc_command: Initiating action 100: migrate_from 
 prm_xen_v09_migrate_from_0 on h05
 Only after the timeout did the CRM realize that there was a problem:
 crmd[6779]:  warning: status_from_rc: Action 100 (prm_xen_v09_migrate_from_0) 
 on h05 failed (target: 0 vs. rc: 1): Error
 After that the CRM still tried a stop on the source host (h10) (and on the 
 destination host):
 crmd[6779]:   notice: te_rsc_command: Initiating action 98: stop 
 prm_xen_v09_stop_0 on h10
 crmd[6779]:   notice: te_rsc_command: Initiating action 26: stop 
 prm_xen_v09_stop_0 on h05
 
 Q1: Is this the way it should work?

Mostly, but the agent should have detected the condition earlier and returned 
an error (instead of timing out). 
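
As a rough illustration of "detect it earlier", here is a minimal Python sketch of a migrate_from-style wait loop that gives up as soon as the domain is visible on neither host. It is not the Xen agent's actual code: `on_target` and `on_source` are hypothetical probes, and the idea that a domain missing from both hosts can no longer arrive is an assumption. Only the numeric OCF return codes are standard.

```python
import time

OCF_SUCCESS = 0      # standard OCF return code: action succeeded
OCF_ERR_GENERIC = 1  # standard OCF return code: generic failure


def migrate_from(domain, on_target, on_source, timeout=60.0, interval=1.0,
                 clock=time.monotonic, sleep=time.sleep):
    """Wait for `domain` to arrive on the target, failing fast if it is gone.

    `on_target` and `on_source` are hypothetical probes: callables that take
    a domain name and return True if that host currently lists the domain.
    """
    deadline = clock() + timeout
    while clock() < deadline:
        if on_target(domain):
            return OCF_SUCCESS
        if not on_source(domain):
            # Assumption: a domain visible on neither host cannot arrive any
            # more, so report the error now instead of waiting for the timeout.
            return OCF_ERR_GENERIC
        sleep(interval)
    # The domain never showed up on the target within the timeout.
    return OCF_ERR_GENERIC
```

A real agent would probably want several consecutive misses before declaring failure, to avoid racing with the hand-over between the two hosts.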

 
 Before that we had the same situation (the bad host had been set to 
 standby) when someone, tired of waiting so long, destroyed the affected Xen 
 VMs on the source host while the cluster was migrating. Eventually the VMs 
 came up (restarted instead of being live migrated) on the good hosts.
 
 Then we shut down OpenAIS on the bad host, installed updates and rebooted the 
 bad host (during reboot OpenAIS was started (still standby)).
 To my surprise Pacemaker thought the VMs were still running on the bad host 
 and initiated a migration.

That would be coming from the resource agent.

 As there were no source VMs on the bad host, but all the affected VMs were 
 running on some good host, the CRM shut down the VMs on the good hosts, just 
 to restart them.
 
 Q2: Is this expected behavior? I can hardly believe it!

Nope, fix the agent :)
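
To make that concrete, a minimal Python sketch of what a monitor/probe should report on a freshly rebooted host. `list_domains` is a hypothetical callable standing in for whatever the agent uses to query the hypervisor, and returning a generic error when the query itself fails is an assumption. The return codes are the standard OCF ones.

```python
OCF_SUCCESS = 0      # standard OCF return code: resource is running
OCF_ERR_GENERIC = 1  # standard OCF return code: generic failure
OCF_NOT_RUNNING = 7  # standard OCF return code: resource is cleanly stopped


def monitor(domain, list_domains):
    """Report whether `domain` is running on this host.

    `list_domains` is a hypothetical callable returning the names of the
    domains the local hypervisor currently knows about.
    """
    try:
        names = set(list_domains())
    except OSError:
        # Assumption: if the hypervisor cannot be queried at all, report an
        # error rather than guessing the domain's state.
        return OCF_ERR_GENERIC
    # A domain that is not listed must be reported as stopped; otherwise the
    # cluster believes it is active here and schedules a migration or stop.
    return OCF_SUCCESS if domain in names else OCF_NOT_RUNNING
```

If the probe on the rebooted host had returned 7 for v09, the cluster would have had no reason to plan a migration away from it.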

 
 Software is SLES11 SP3 with pacemaker-1.1.11-0.7.53 (and related) on all 
 hosts.
 
 Regards,
 Ulrich
 
 
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems



[Linux-HA] Q: Resource migration (Xen live migration)

2015-02-13 Thread Ulrich Windl
Hello!

I have some questions on Pacemaker's resource migration. We have a Xen host 
that has some problems (still to be investigated) that cause some VM disks not 
to be ready for use.

When trying to migrate a VM from the bad host to a good host through Pacemaker, 
migration seemed to hang. At some point the source VM was no longer present 
on the bad host (Unable to find domain 'v09'), but Pacemaker still tried a 
migration:
crmd[6779]:   notice: te_rsc_command: Initiating action 100: migrate_from 
prm_xen_v09_migrate_from_0 on h05
Only after the timeout did the CRM realize that there was a problem:
crmd[6779]:  warning: status_from_rc: Action 100 (prm_xen_v09_migrate_from_0) 
on h05 failed (target: 0 vs. rc: 1): Error
After that the CRM still tried a stop on the source host (h10) (and on the 
destination host):
crmd[6779]:   notice: te_rsc_command: Initiating action 98: stop 
prm_xen_v09_stop_0 on h10
crmd[6779]:   notice: te_rsc_command: Initiating action 26: stop 
prm_xen_v09_stop_0 on h05

Q1: Is this the way it should work?

Before that we had the same situation (the bad host had been set to standby) 
when someone, tired of waiting so long, destroyed the affected Xen VMs on the 
source host while the cluster was migrating. Eventually the VMs came up 
(restarted instead of being live migrated) on the good hosts.

Then we shut down OpenAIS on the bad host, installed updates and rebooted the 
bad host (during reboot OpenAIS was started (still standby)).
To my surprise Pacemaker thought the VMs were still running on the bad host and 
initiated a migration. As there were no source VMs on the bad host, but all 
the affected VMs were running on some good host, the CRM shut down the VMs on 
the good hosts, just to restart them.

Q2: Is this expected behavior? I can hardly believe it!

Software is SLES11 SP3 with pacemaker-1.1.11-0.7.53 (and related) on all hosts.

Regards,
Ulrich

