I ran a test with a loop that moves a file every second, and monitored the delay between the change appearing on the source node and on the target node.
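
For anyone who wants to repeat this kind of measurement, here is a minimal sketch of a polling monitor. It is written in Python for illustration only and is not the script used for the numbers in this thread; the function names, the polling interval and the idea of pairing events by position are all assumptions.

```python
import os
import time


def watch_existence(path, duration_s, interval_s=0.001):
    """Poll `path` and record (state, unix_time) each time it flips
    between "exist" and "notexist". The first poll always records one event.
    Hypothetical helper, not part of any PVE tool."""
    events = []
    last = None
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        # os.path.exists() does a stat() on the path under the hood.
        state = "exist" if os.path.exists(path) else "notexist"
        if state != last:
            events.append((state, time.time()))
            last = state
        time.sleep(interval_s)
    return events


def delays_ms(source_events, target_events):
    """Pair the n-th event seen on the source node with the n-th event
    seen on the target node and return the delay of each pair in ms.
    Assumes both lists cover the same sequence of moves."""
    return [
        round((t_target - t_source) * 1000)
        for (_, t_source), (_, t_target) in zip(source_events, target_events)
    ]
```

The idea would be to run watch_existence() on both nodes against the VM config path while the file is being moved, then feed the two event lists to delays_ms(). The timestamps are only comparable if the clocks of both nodes are synchronized (e.g. via NTP), and the polling interval limits the resolution of the result.
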
The results are between 10 ms and 300 ms, with spikes up to 1 s, so this can explain the race.
(I can't explain the variation in speed or the spikes.)

Another problem: I hit the bug again, and right after that I could no longer migrate the VM.
The HA migrate task starts, but the migration itself never happens.

The pve-ha-crm log floods in a loop:

Oct 14 19:01:16 kvmtest1 pve-ha-crm[3819]: service 'vm:125': state changed from 'migrate' to 'started' (node = kvmtest2)
Oct 14 19:01:16 kvmtest1 pve-ha-crm[3819]: migrate service 'vm:125' to node 'kvmtest1' (running)
Oct 14 19:01:16 kvmtest1 pve-ha-crm[3819]: service 'vm:125': state changed from 'started' to 'migrate' (node = kvmtest2, target = kvmtest1)
Oct 14 19:01:26 kvmtest1 pve-ha-crm[3819]: service 'vm:125' - migration failed (exit code 255)
Oct 14 19:01:26 kvmtest1 pve-ha-crm[3819]: service 'vm:125': state changed from 'migrate' to 'started' (node = kvmtest2)
Oct 14 19:01:26 kvmtest1 pve-ha-crm[3819]: migrate service 'vm:125' to node 'kvmtest1' (running)
Oct 14 19:01:26 kvmtest1 pve-ha-crm[3819]: service 'vm:125': state changed from 'started' to 'migrate' (node = kvmtest2, target = kvmtest1)
Oct 14 19:04:33 kvmtest2 pve-ha-lrm[28430]: service 'vm:125' not on this node at /usr/share/perl5/PVE/HA/Env/PVE2.pm line 389.
Oct 14 19:04:43 kvmtest2 pve-ha-lrm[28451]: service 'vm:125' not on this node at /usr/share/perl5/PVE/HA/Env/PVE2.pm line 389.
Oct 14 19:04:53 kvmtest2 pve-ha-lrm[28472]: service 'vm:125' not on this node at /usr/share/perl5/PVE/HA/Env/PVE2.pm line 389.
Oct 14 19:05:03 kvmtest2 pve-ha-lrm[28493]: service 'vm:125' not on this node at /usr/share/perl5/PVE/HA/Env/PVE2.pm line 389.
Oct 14 19:05:13 kvmtest2 pve-ha-lrm[28520]: service 'vm:125' not on this node at /usr/share/perl5/PVE/HA/Env/PVE2.pm line 389.
Oct 14 19:05:23 kvmtest2 pve-ha-lrm[28541]: service 'vm:125' not on this node at /usr/share/perl5/PVE/HA/Env/PVE2.pm line 389.
Oct 14 19:05:33 kvmtest2 pve-ha-lrm[28562]: service 'vm:125' not on this node at /usr/share/perl5/PVE/HA/Env/PVE2.pm line 389.
Oct 14 19:05:43 kvmtest2 pve-ha-lrm[28583]: service 'vm:125' not on this node at /usr/share/perl5/PVE/HA/Env/PVE2.pm line 389.
Oct 14 19:05:53 kvmtest2 pve-ha-lrm[28604]: service 'vm:125' not on this node at /usr/share/perl5/PVE/HA/Env/PVE2.pm line 389.
Oct 14 19:06:03 kvmtest2 pve-ha-lrm[28626]: service 'vm:125' not on this node at /usr/share/perl5/PVE/HA/Env/PVE2.pm line 389.

----- Original Message -----
From: "aderumier" <aderum...@odiso.com>
To: "dietmar" <diet...@proxmox.com>
Cc: "pve-devel" <pve-devel@pve.proxmox.com>
Sent: Wednesday, 14 October 2015 16:17:24
Subject: Re: [pve-devel] [PATCH] migrate : add nocheck for resume

>>To be sure, I would also test with my direct_io patch for fuse...

Yes, I'm currently using it.

I have written a simple Perl script which monitors creation/deletion of the VM conf file, and the times are indeed correct vs notify:

node1
-----
exist    20151014 16:14:06.183
notexist 20151014 16:14:38.989
exist    20151014 16:15:07.066

node2
-----
notexist 20151014 16:14:06.208
exist    20151014 16:14:39.003
notexist 20151014 16:15:07.089

I'll try to reproduce the problem and compare the times again.

----- Original Message -----
From: "dietmar" <diet...@proxmox.com>
To: "aderumier" <aderum...@odiso.com>
Cc: "pve-devel" <pve-devel@pve.proxmox.com>
Sent: Wednesday, 14 October 2015 16:00:28
Subject: Re: [pve-devel] [PATCH] migrate : add nocheck for resume

> http://search.cpan.org/~andya/File-Monitor-1.00/lib/File/Monitor.pm
>
> which used stat() to detect changes

_______________________________________________
pve-devel mailing list
pve-devel@pve.proxmox.com
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel