On Mon, 2017-07-24 at 20:52 +0200, Lentes, Bernd wrote:
> Hi,
>
> Just to be sure: I have a VirtualDomain resource (called
> prim_vm_servers_alive) running on one node (ha-idg-2). For reasons I
> don't remember, I have a location constraint:
>
>   location cli-prefer-prim_vm_servers_alive prim_vm_servers_alive \
>     role=Started inf: ha-idg-2
>
> Now I am trying to put this node into standby, because I need to
> reboot it. As I understand it, the resource can't migrate to node
> ha-idg-1 because of this constraint. Right?

Right, the "inf:" makes it mandatory.

BTW, the "cli-" at the beginning indicates that the constraint was
created by a command-line tool such as pcs, crm shell or crm_resource.
Such tools implement "ban"/"move"-type commands by adding constraints
like this one, and then offer a separate manual command to remove them
again (e.g. "pcs resource clear").
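A minimal sketch of that cleanup with each of the tools mentioned
above (exact spellings vary a bit between tool versions, so treat
these as illustrative rather than definitive):

    # pcs
    pcs resource clear prim_vm_servers_alive

    # crm shell
    crm resource unmigrate prim_vm_servers_alive

    # plain Pacemaker CLI
    crm_resource --clear --resource prim_vm_servers_alive

Any one of these should remove the "cli-prefer-..." constraint, after
which the cluster is free to place the resource on either node again.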
Right, the "inf:" makes it mandatory. BTW, the "cli-" at the beginning indicates that this was created by a command-line tool such as pcs, crm shell or crm_resource. Such tools implement "ban"/"move" type commands by adding such constraints, and then offer a separate manual command to remove such constraints (e.g. "pcs resource clear"). > > That's what the log says: > Jul 21 18:03:50 ha-idg-2 VirtualDomain(prim_vm_servers_alive)[28565]: ERROR: > Server_Monitoring: live migration to qemu+ssh://ha-idg-1/system failed: 1 > Jul 21 18:03:50 ha-idg-2 lrmd[8573]: notice: operation_finished: > prim_vm_servers_alive_migrate_to_0:28565:stderr [ error: Requested operation > is not valid: domain 'Server_Monitoring' is already active ] > Jul 21 18:03:50 ha-idg-2 crmd[8576]: notice: process_lrm_event: Operation > prim_vm_servers_alive_migrate_to_0: unknown error (node=ha-idg-2, call=114, > rc=1, cib-update=572, confirmed=true) > Jul 21 18:03:50 ha-idg-2 crmd[8576]: notice: process_lrm_event: > ha-idg-2-prim_vm_servers_alive_migrate_to_0:114 [ error: Requested operation > is not valid: domain 'Server_Monitoring' is already active\n ] > Jul 21 18:03:50 ha-idg-2 crmd[8576]: warning: status_from_rc: Action 64 > (prim_vm_servers_alive_migrate_to_0) on ha-idg-2 failed (target: 0 vs. rc: > 1): Error > Jul 21 18:03:50 ha-idg-2 crmd[8576]: notice: abort_transition_graph: > Transition aborted by prim_vm_servers_alive_migrate_to_0 'modify' on > ha-idg-2: Event failed > (magic=0:1;64:417:0:656ecd4a-f8e8-46c9-b4e6-194616237988, cib=0.879.5, sou > rce=match_graph_event:350, 0) > Jul 21 18:03:50 ha-idg-2 crmd[8576]: warning: status_from_rc: Action 64 > (prim_vm_servers_alive_migrate_to_0) on ha-idg-2 failed (target: 0 vs. rc: > 1): Error > Jul 21 18:03:53 ha-idg-2 VirtualDomain(prim_vm_mausdb)[28564]: ERROR: > mausdb_vm: live migration to qemu+ssh://ha-idg-1/system failed: 1 > > That is the way i understand "Requested operation is not valid". It's not > possible because of the constraint. > I just wanted to be sure. And because the resource can't be migrated but the > host is going to standby the resource is stopped. Right ? > > Strange is that a second resource also running on node ha-idg-2 called > prim_vm_mausdb also didn't migrate to the other node. And that's something i > don't understand completely. > The resource didn't have any location constraint. > Both VirtualDomains have a vnc server configured (that i can monitor the boot > procedure if i have starting problems). The vnc port for prim_vm_mausdb is > 5900 in the configuration file. > The port is set to auto for prim_vm_servers_alive because i forgot to > configure it fix. So it must be s.th like 5900+ because both resources were > running concurrently on the same node. 
> But there is no VM running there, and I don't have a standalone VNC
> server configured. Why is the port occupied?

Can't help there.

> Btw: are the network sockets live-migrated too during a live
> migration of a VirtualDomain resource? It should be like that.
>
> Thanks.
>
> Bernd

My memory is hazy, but I think TCP connections are migrated as long as
the migration completes within the TCP timeout. I could be
mis-remembering.
-- 
Ken Gaillot <kgail...@redhat.com>
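The TCP question is easy to probe by hand if you want certainty: hold
a long-lived TCP session open into the guest while live-migrating it,
and watch whether the session survives. A rough sketch, assuming the
guest is reachable over SSH under the hypothetical name "guest-host":

    # leave a long-lived TCP session running into the guest
    ssh guest-host 'while true; do date; sleep 1; done' &

    # live-migrate the domain between the nodes by hand
    virsh migrate --live Server_Monitoring qemu+ssh://ha-idg-1/system

    # if the loop keeps printing timestamps afterwards, the
    # established TCP connection survived the migration

That matches the recollection above: the guest's memory, including its
network state, moves with it, so established connections survive as
long as the brief switchover pause stays under the peers' TCP timeouts.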