I ran a test that moves a file once per second in a loop

and measures the time between the source and the target.

The results are between 10ms and 300ms, with spikes up to 1s,

so this can explain the race.

(I can't explain the variation in speed or the spikes.)
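
For reference, here is a minimal sketch of this kind of measurement, written in Python rather than taken from my actual test. The names `wait_until` and `measure_move` are made up for illustration, and the sketch assumes both paths are visible from the machine running it. In a real cluster test the polling side would run on the other node.

```python
import os
import time


def wait_until(predicate, timeout=5.0, interval=0.001):
    """Poll predicate() until it is true.

    Returns the elapsed time in seconds, or None if the timeout expires.
    """
    start = time.monotonic()
    while True:
        if predicate():
            return time.monotonic() - start
        if time.monotonic() - start >= timeout:
            return None
        time.sleep(interval)


def measure_move(src, dst, timeout=5.0):
    """Rename src to dst, then time how long it takes until
    dst is visible and src is gone (both checked via stat())."""
    os.rename(src, dst)
    return wait_until(
        lambda: os.path.exists(dst) and not os.path.exists(src), timeout
    )
```

Calling `measure_move` once per second and logging the returned value would give a series of delays comparable to the numbers above. On a local filesystem the delay should be close to zero, so any large value points at the shared filesystem layer.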


Another problem:

I have also hit the bug again, and right afterwards I can no longer migrate
the VM.

The HA migrate task starts, but after that the actual migration task never runs.


The pve-ha-crm log is flooded with these messages in a loop:

Oct 14 19:01:16 kvmtest1 pve-ha-crm[3819]: service 'vm:125': state changed from 
'migrate' to 'started'  (node = kvmtest2)
Oct 14 19:01:16 kvmtest1 pve-ha-crm[3819]: migrate service 'vm:125' to node 
'kvmtest1' (running)
Oct 14 19:01:16 kvmtest1 pve-ha-crm[3819]: service 'vm:125': state changed from 
'started' to 'migrate'  (node = kvmtest2, target = kvmtest1)
Oct 14 19:01:26 kvmtest1 pve-ha-crm[3819]: service 'vm:125' - migration failed 
(exit code 255)
Oct 14 19:01:26 kvmtest1 pve-ha-crm[3819]: service 'vm:125': state changed from 
'migrate' to 'started'  (node = kvmtest2)
Oct 14 19:01:26 kvmtest1 pve-ha-crm[3819]: migrate service 'vm:125' to node 
'kvmtest1' (running)
Oct 14 19:01:26 kvmtest1 pve-ha-crm[3819]: service 'vm:125': state changed from 
'started' to 'migrate'  (node = kvmtest2, target = kvmtest1)
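
To see how often the loop repeats, the failed migrations can be counted per service. This is only a sketch in Python. The function name `count_failed_migrations` is made up, and the regular expression is based solely on the message format visible in the excerpt above. Whitespace is flattened first because the mail client wrapped the log lines.

```python
import re
from collections import Counter

# Pattern taken from the "migration failed" lines in the excerpt above.
FAILED = re.compile(r"service '([^']+)' - migration failed \(exit code (\d+)\)")


def count_failed_migrations(log_text):
    """Return a Counter mapping service id -> number of failed migrations."""
    # Join wrapped lines so a message split across two lines still matches.
    flat = " ".join(log_text.split())
    return Counter(service for service, _code in FAILED.findall(flat))
```

Feeding it the output of the pve-ha-crm journal for a given time window gives a quick measure of how fast the manager is retrying.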


Oct 14 19:04:33 kvmtest2 pve-ha-lrm[28430]: service 'vm:125' not on this node 
at /usr/share/perl5/PVE/HA/Env/PVE2.pm line 389.
Oct 14 19:04:43 kvmtest2 pve-ha-lrm[28451]: service 'vm:125' not on this node 
at /usr/share/perl5/PVE/HA/Env/PVE2.pm line 389.
Oct 14 19:04:53 kvmtest2 pve-ha-lrm[28472]: service 'vm:125' not on this node 
at /usr/share/perl5/PVE/HA/Env/PVE2.pm line 389.
Oct 14 19:05:03 kvmtest2 pve-ha-lrm[28493]: service 'vm:125' not on this node 
at /usr/share/perl5/PVE/HA/Env/PVE2.pm line 389.
Oct 14 19:05:13 kvmtest2 pve-ha-lrm[28520]: service 'vm:125' not on this node 
at /usr/share/perl5/PVE/HA/Env/PVE2.pm line 389.
Oct 14 19:05:23 kvmtest2 pve-ha-lrm[28541]: service 'vm:125' not on this node 
at /usr/share/perl5/PVE/HA/Env/PVE2.pm line 389.
Oct 14 19:05:33 kvmtest2 pve-ha-lrm[28562]: service 'vm:125' not on this node 
at /usr/share/perl5/PVE/HA/Env/PVE2.pm line 389.
Oct 14 19:05:43 kvmtest2 pve-ha-lrm[28583]: service 'vm:125' not on this node 
at /usr/share/perl5/PVE/HA/Env/PVE2.pm line 389.
Oct 14 19:05:53 kvmtest2 pve-ha-lrm[28604]: service 'vm:125' not on this node 
at /usr/share/perl5/PVE/HA/Env/PVE2.pm line 389.
Oct 14 19:06:03 kvmtest2 pve-ha-lrm[28626]: service 'vm:125' not on this node 
at /usr/share/perl5/PVE/HA/Env/PVE2.pm line 389.


----- Original message -----
From: "aderumier" <aderum...@odiso.com>
To: "dietmar" <diet...@proxmox.com>
Cc: "pve-devel" <pve-devel@pve.proxmox.com>
Sent: Wednesday, 14 October 2015 16:17:24
Subject: Re: [pve-devel] [PATCH] migrate : add nocheck for resume

>>To be sure, I would also test with my direct_io patch for fuse... 
Yes, I'm currently using it.

I have written a simple Perl script which monitors the creation/deletion of the
VM conf file, and the times are indeed correct compared with notify.


node1 
----- 

exist 20151014 16:14:06.183 
notexist 20151014 16:14:38.989
exist 20151014 16:15:07.066

node2 
----- 
notexist 20151014 16:14:06.208
exist 20151014 16:14:39.003 
notexist 20151014 16:15:07.089 
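
A monitor of this kind can be sketched as follows. This is Python, not the Perl script I actually used, and the names `stamp`, `ExistenceWatcher` and `watch` are made up for illustration. It relies on `os.path.exists`, which uses stat() underneath, and prints a line in the same format as above whenever the state of the file changes.

```python
import os
import time
from datetime import datetime


def stamp(now=None):
    """Format a timestamp as YYYYMMDD HH:MM:SS.mmm."""
    now = now or datetime.now()
    return now.strftime("%Y%m%d %H:%M:%S.") + f"{now.microsecond // 1000:03d}"


class ExistenceWatcher:
    """Track whether a path exists and report state changes."""

    def __init__(self, path):
        self.path = path
        self.last = None  # state is unknown until the first check

    def check(self):
        """Return 'exist <time>' or 'notexist <time>' if the state
        changed since the previous check, otherwise None."""
        state = "exist" if os.path.exists(self.path) else "notexist"
        if state == self.last:
            return None
        self.last = state
        return f"{state} {stamp()}"


def watch(path, interval=0.001):
    """Poll path forever and print every state change."""
    watcher = ExistenceWatcher(path)
    while True:
        line = watcher.check()
        if line:
            print(line, flush=True)
        time.sleep(interval)
```

Running `watch` on the same VM conf file path on both nodes at once, with synchronized clocks, allows the timestamps to be compared side by side as in the listing above.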


I'll try to reproduce the problem and compare the times again.

----- Original message -----
From: "dietmar" <diet...@proxmox.com>
To: "aderumier" <aderum...@odiso.com>
Cc: "pve-devel" <pve-devel@pve.proxmox.com>
Sent: Wednesday, 14 October 2015 16:00:28
Subject: Re: [pve-devel] [PATCH] migrate : add nocheck for resume

> http://search.cpan.org/~andya/File-Monitor-1.00/lib/File/Monitor.pm 
> 
> which used stat() to detect changes 

_______________________________________________ 
pve-devel mailing list 
pve-devel@pve.proxmox.com 
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel 