Re: [ClusterLabs] resources do not migrate although node is going to standby

2017-07-24 Thread Ken Gaillot
On Mon, 2017-07-24 at 20:52 +0200, Lentes, Bernd wrote:
> Hi,
> 
> just to be sure:
> I have a VirtualDomain resource (called prim_vm_servers_alive) running on one
> node (ha-idg-2). For reasons I don't remember, I have a location constraint:
> location cli-prefer-prim_vm_servers_alive prim_vm_servers_alive role=Started 
> inf: ha-idg-2
> 
> Now I am trying to put this node into standby because I need to reboot it.
> As I understand it now, the resource can't migrate to node ha-idg-1 because of
> this constraint. Right?

Right, the "inf:" makes it mandatory. BTW, the "cli-" at the beginning
indicates that this was created by a command-line tool such as pcs, crm
shell or crm_resource. Such tools implement "ban"/"move" type commands
by adding such constraints, and then offer a separate manual command to
remove such constraints (e.g. "pcs resource clear").
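
To make the two markers concrete, here is a minimal Python sketch (not part of
any cluster tool) that splits a crm shell location line like the one quoted
above into its fields and flags the "cli-" id prefix and the infinite score.
The function name and the returned keys are made up for illustration, and the
parser only handles the simple one-line form shown in this thread.

```python
import re

# Matches the simple crm shell form seen in this thread:
#   location <id> <resource> [role=<role>] <score>: <node>
LOCATION_RE = re.compile(
    r"^location\s+(?P<id>\S+)\s+(?P<rsc>\S+)"
    r"(?:\s+role=(?P<role>\S+))?"
    r"\s+(?P<score>[^\s:]+):\s+(?P<node>\S+)$"
)

def describe_location(line):
    """Hypothetical helper: parse one location constraint line."""
    # Mail clients wrap long lines, so collapse all whitespace first.
    match = LOCATION_RE.match(" ".join(line.split()))
    if match is None:
        raise ValueError("not a simple location constraint: %r" % line)
    info = match.groupdict()
    # Constraints added by move/ban style commands carry a "cli-" id prefix.
    info["created_by_cli"] = info["id"].startswith("cli-")
    # A positive infinite score is what the reply above calls mandatory.
    info["infinite_score"] = info["score"].lstrip("+").lower() in ("inf", "infinity")
    return info

constraint = describe_location(
    "location cli-prefer-prim_vm_servers_alive prim_vm_servers_alive "
    "role=Started inf: ha-idg-2"
)
print(constraint["created_by_cli"], constraint["infinite_score"], constraint["node"])
```

If both flags are set for a constraint nobody remembers creating, it is
probably a leftover from an earlier move command and a candidate for clearing.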

> 
> That's what the log says:
> Jul 21 18:03:50 ha-idg-2 VirtualDomain(prim_vm_servers_alive)[28565]: ERROR: 
> Server_Monitoring: live migration to qemu+ssh://ha-idg-1/system  failed: 1
> Jul 21 18:03:50 ha-idg-2 lrmd[8573]:   notice: operation_finished: 
> prim_vm_servers_alive_migrate_to_0:28565:stderr [ error: Requested operation 
> is not valid: domain 'Server_Monitoring' is already active ]
> Jul 21 18:03:50 ha-idg-2 crmd[8576]:   notice: process_lrm_event: Operation 
> prim_vm_servers_alive_migrate_to_0: unknown error (node=ha-idg-2, call=114, 
> rc=1, cib-update=572, confirmed=true)
> Jul 21 18:03:50 ha-idg-2 crmd[8576]:   notice: process_lrm_event: 
> ha-idg-2-prim_vm_servers_alive_migrate_to_0:114 [ error: Requested operation 
> is not valid: domain 'Server_Monitoring' is already active\n ]
> Jul 21 18:03:50 ha-idg-2 crmd[8576]:  warning: status_from_rc: Action 64 
> (prim_vm_servers_alive_migrate_to_0) on ha-idg-2 failed (target: 0 vs. rc: 
> 1): Error
> Jul 21 18:03:50 ha-idg-2 crmd[8576]:   notice: abort_transition_graph: 
> Transition aborted by prim_vm_servers_alive_migrate_to_0 'modify' on 
> ha-idg-2: Event failed 
> (magic=0:1;64:417:0:656ecd4a-f8e8-46c9-b4e6-194616237988, cib=0.879.5,
> source=match_graph_event:350, 0)
> Jul 21 18:03:50 ha-idg-2 crmd[8576]:  warning: status_from_rc: Action 64 
> (prim_vm_servers_alive_migrate_to_0) on ha-idg-2 failed (target: 0 vs. rc: 
> 1): Error
> Jul 21 18:03:53 ha-idg-2 VirtualDomain(prim_vm_mausdb)[28564]: ERROR: 
> mausdb_vm: live migration to qemu+ssh://ha-idg-1/system  failed: 1
> 
> That is how I understand "Requested operation is not valid": it's not
> possible because of the constraint.
> I just wanted to be sure. And because the resource can't be migrated but the
> host is going to standby, the resource is stopped. Right?
>
> What is strange is that a second resource, prim_vm_mausdb, also running on
> node ha-idg-2, didn't migrate to the other node either. That's something I
> don't completely understand.
> That resource didn't have any location constraint.
> Both VirtualDomains have a VNC server configured (so that I can monitor the
> boot procedure if I have startup problems). The VNC port for prim_vm_mausdb is
> 5900 in the configuration file.
> The port is set to auto for prim_vm_servers_alive because I forgot to
> configure a fixed one. So it must be something like 5900+ because both
> resources were running concurrently on the same node.
> But prim_vm_mausdb can't migrate because the port is occupied on the other
> node ha-idg-1:
> 
> Jul 21 18:03:53 ha-idg-2 VirtualDomain(prim_vm_mausdb)[28564]: ERROR: 
> mausdb_vm: live migration to qemu+ssh://ha-idg-1/system  failed: 1
> Jul 21 18:03:53 ha-idg-2 lrmd[8573]:   notice: operation_finished: 
> prim_vm_mausdb_migrate_to_0:28564:stderr [ error: internal error: early end 
> of file from monitor: possible problem: ]
> Jul 21 18:03:53 ha-idg-2 lrmd[8573]:   notice: operation_finished: 
> prim_vm_mausdb_migrate_to_0:28564:stderr [ Failed to start VNC server on 
> `127.0.0.1:0,share=allow-exclusive': Failed to bind socket: Address already 
> in use ]
> Jul 21 18:03:53 ha-idg-2 lrmd[8573]:   notice: operation_finished: 
> prim_vm_mausdb_migrate_to_0:28564:stderr [  ]
> Jul 21 18:03:53 ha-idg-2 crmd[8576]:   notice: process_lrm_event: Operation 
> prim_vm_mausdb_migrate_to_0: unknown error (node=ha-idg-2, call=110, rc=1, 
> cib-update=573, confirmed=true)
> Jul 21 18:03:53 ha-idg-2 crmd[8576]:   notice: process_lrm_event: 
> ha-idg-2-prim_vm_mausdb_migrate_to_0:110 [ error: internal error: early end 
> of file from monitor: possible problem:\nFailed to start VNC server on 
> `127.0.0.1:0,share=allow-exclusive': Failed to bind socket: Address already in
> use\n\n ]
> Jul 21 18:03:53 ha-idg-2 crmd[8576]:  warning: status_from_rc: Action 51 
> (prim_vm_mausdb_migrate_to_0) on ha-idg-2 failed (target: 0 vs. rc: 1): Error
> Jul 21 18:03:53 ha-idg-2 crmd[8576]:  warning: status_from_rc: Action 51 
> (prim_vm_mausdb_migrate_to_0) on ha-idg-2 failed (target: 0 vs. rc: 1): Error
> 
> Do i understand it correctly that the port is occupied on the node it 

Re: [ClusterLabs] resources do not migrate although node is going to standby

2017-07-24 Thread Kristián Feldsam
Hmm, I think that it is just a preferred location; if it is not available, the
server should start on the other node. You can of course migrate manually with
"crm resource move resource_name node_name", which in effect changes that
location preference.

Best regards, Kristián Feldsam
Tel.: +420 773 303 353, +421 944 137 535
E-mail.: supp...@feldhost.cz



[ClusterLabs] resources do not migrate although node is going to standby

2017-07-24 Thread Lentes, Bernd
Hi,

just to be sure:
I have a VirtualDomain resource (called prim_vm_servers_alive) running on one
node (ha-idg-2). For reasons I don't remember, I have a location constraint:
location cli-prefer-prim_vm_servers_alive prim_vm_servers_alive role=Started 
inf: ha-idg-2

Now I am trying to put this node into standby because I need to reboot it.
As I understand it now, the resource can't migrate to node ha-idg-1 because of
this constraint. Right?

That's what the log says:
Jul 21 18:03:50 ha-idg-2 VirtualDomain(prim_vm_servers_alive)[28565]: ERROR: 
Server_Monitoring: live migration to qemu+ssh://ha-idg-1/system  failed: 1
Jul 21 18:03:50 ha-idg-2 lrmd[8573]:   notice: operation_finished: 
prim_vm_servers_alive_migrate_to_0:28565:stderr [ error: Requested operation is 
not valid: domain 'Server_Monitoring' is already active ]
Jul 21 18:03:50 ha-idg-2 crmd[8576]:   notice: process_lrm_event: Operation 
prim_vm_servers_alive_migrate_to_0: unknown error (node=ha-idg-2, call=114, 
rc=1, cib-update=572, confirmed=true)
Jul 21 18:03:50 ha-idg-2 crmd[8576]:   notice: process_lrm_event: 
ha-idg-2-prim_vm_servers_alive_migrate_to_0:114 [ error: Requested operation is 
not valid: domain 'Server_Monitoring' is already active\n ]
Jul 21 18:03:50 ha-idg-2 crmd[8576]:  warning: status_from_rc: Action 64 
(prim_vm_servers_alive_migrate_to_0) on ha-idg-2 failed (target: 0 vs. rc: 1): 
Error
Jul 21 18:03:50 ha-idg-2 crmd[8576]:   notice: abort_transition_graph: 
Transition aborted by prim_vm_servers_alive_migrate_to_0 'modify' on ha-idg-2: 
Event failed (magic=0:1;64:417:0:656ecd4a-f8e8-46c9-b4e6-194616237988,
cib=0.879.5, source=match_graph_event:350, 0)
Jul 21 18:03:50 ha-idg-2 crmd[8576]:  warning: status_from_rc: Action 64 
(prim_vm_servers_alive_migrate_to_0) on ha-idg-2 failed (target: 0 vs. rc: 1): 
Error
Jul 21 18:03:53 ha-idg-2 VirtualDomain(prim_vm_mausdb)[28564]: ERROR: 
mausdb_vm: live migration to qemu+ssh://ha-idg-1/system  failed: 1
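
The agent's own error text is easy to lose among the crmd lines. Below is a
minimal Python sketch that pulls the stderr text out of lrmd
"operation_finished" lines and groups it by operation. It assumes the line
layout seen in this log and expects each log entry on a single line (the mail
client's wrapping undone); the helper name is made up for illustration.

```python
import re
from collections import defaultdict

# Assumed layout of one unwrapped lrmd line, as seen in the log above:
#   ... lrmd[PID]:   notice: operation_finished: <op>:<pid>:stderr [ <text> ]
STDERR_RE = re.compile(
    r"lrmd\[\d+\]:\s+notice: operation_finished: "
    r"(?P<op>\S+):(?P<pid>\d+):stderr \[ (?P<text>.*?) \]$"
)

def stderr_by_operation(lines):
    """Hypothetical helper: collect non-empty stderr text per operation."""
    found = defaultdict(list)
    for line in lines:
        match = STDERR_RE.search(line.rstrip())
        # Skip lines that do not match and empty "stderr [  ]" entries.
        if match and match.group("text").strip():
            found[match.group("op")].append(match.group("text").strip())
    return dict(found)

sample = [
    "Jul 21 18:03:50 ha-idg-2 lrmd[8573]:   notice: operation_finished: "
    "prim_vm_servers_alive_migrate_to_0:28565:stderr [ error: Requested "
    "operation is not valid: domain 'Server_Monitoring' is already active ]",
]
for op, messages in stderr_by_operation(sample).items():
    print(op, "->", messages)
```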

That is how I understand "Requested operation is not valid": it's not
possible because of the constraint.
I just wanted to be sure. And because the resource can't be migrated but the
host is going to standby, the resource is stopped. Right?

What is strange is that a second resource, prim_vm_mausdb, also running on
node ha-idg-2, didn't migrate to the other node either. That's something I
don't completely understand.
That resource didn't have any location constraint.
Both VirtualDomains have a VNC server configured (so that I can monitor the
boot procedure if I have startup problems). The VNC port for prim_vm_mausdb is
5900 in the configuration file.
The port is set to auto for prim_vm_servers_alive because I forgot to
configure a fixed one. So it must be something like 5900+ because both
resources were running concurrently on the same node.
But prim_vm_mausdb can't migrate because the port is occupied on the other
node ha-idg-1:

Jul 21 18:03:53 ha-idg-2 VirtualDomain(prim_vm_mausdb)[28564]: ERROR: 
mausdb_vm: live migration to qemu+ssh://ha-idg-1/system  failed: 1
Jul 21 18:03:53 ha-idg-2 lrmd[8573]:   notice: operation_finished: 
prim_vm_mausdb_migrate_to_0:28564:stderr [ error: internal error: early end of 
file from monitor: possible problem: ]
Jul 21 18:03:53 ha-idg-2 lrmd[8573]:   notice: operation_finished: 
prim_vm_mausdb_migrate_to_0:28564:stderr [ Failed to start VNC server on 
`127.0.0.1:0,share=allow-exclusive': Failed to bind socket: Address already in 
use ]
Jul 21 18:03:53 ha-idg-2 lrmd[8573]:   notice: operation_finished: 
prim_vm_mausdb_migrate_to_0:28564:stderr [  ]
Jul 21 18:03:53 ha-idg-2 crmd[8576]:   notice: process_lrm_event: Operation 
prim_vm_mausdb_migrate_to_0: unknown error (node=ha-idg-2, call=110, rc=1, 
cib-update=573, confirmed=true)
Jul 21 18:03:53 ha-idg-2 crmd[8576]:   notice: process_lrm_event: 
ha-idg-2-prim_vm_mausdb_migrate_to_0:110 [ error: internal error: early end of 
file from monitor: possible problem:\nFailed to start VNC server on 
`127.0.0.1:0,share=allow-exclusive': Failed to bind socket: Address already in
use\n\n ]
Jul 21 18:03:53 ha-idg-2 crmd[8576]:  warning: status_from_rc: Action 51 
(prim_vm_mausdb_migrate_to_0) on ha-idg-2 failed (target: 0 vs. rc: 1): Error
Jul 21 18:03:53 ha-idg-2 crmd[8576]:  warning: status_from_rc: Action 51 
(prim_vm_mausdb_migrate_to_0) on ha-idg-2 failed (target: 0 vs. rc: 1): Error

Do I understand correctly that the port is occupied on the node it should
migrate to (ha-idg-1)?
But there is no VM running there and I don't have a standalone VNC server
configured. Why is the port occupied?
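
A minimal Python sketch of how the port could be checked directly on ha-idg-1.
It assumes the usual QEMU convention that VNC display N listens on TCP port
5900 + N, so the `127.0.0.1:0` in the error would correspond to port 5900. The
helper names are made up for illustration. Trying to bind the address runs
into the same "Address already in use" condition without starting a VM.

```python
import errno
import socket

VNC_BASE_PORT = 5900  # assumption: QEMU maps VNC display N to TCP port 5900 + N

def vnc_port(display):
    """Return the TCP port for a VNC display number (display 0 -> 5900)."""
    return VNC_BASE_PORT + display

def port_is_free(host, port):
    """Hypothetical helper: True if host:port can be bound right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return False
            raise  # any other failure is a different problem, so surface it
        return True

if __name__ == "__main__":
    port = vnc_port(0)  # the ":0" from `127.0.0.1:0` in the log
    state = "free" if port_is_free("127.0.0.1", port) else "already in use"
    print("127.0.0.1:%d is %s" % (port, state))
```

This only says whether the port is taken, not by which process; that would
need a separate look at the listening sockets on the node.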

By the way: are the network sockets live migrated too during a live migration
of a VirtualDomain resource?
It should be like that.

Thanks.


Bernd



-- 
Bernd Lentes 

Systemadministration 
institute of developmental genetics 
Gebäude 35.34 - Raum 208 
HelmholtzZentrum München 
bernd.len...@helmholtz-muenchen.de 
phone: +49 (0)89 3187 1241 
fax: +49 (0)89 3187 2294 

no backup - no mercy
 
