Tested on an iscsi pool, though note the no-cache requirement: rbd
with the cache disabled may survive one migration, but the iscsi
backend always hangs. As before, just rolling back the problematic
commit fixes the problem, and adding cpu_synchronize_all_states to
migration.c makes no difference at a glance in the VM's behavior.
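For reference, the experiment just mentioned was roughly the
following. This is only a sketch of where such a call could go in the
completion phase of migration_thread() in migration.c around
v2.1.0-rc2; the surrounding code is paraphrased and may not match the
tree exactly:

    /* Sketch: approximate completion path of migration_thread(),
     * with the experimental sync added before device state
     * (including kvmclock) is serialized. */
    qemu_mutex_lock_iothread();
    ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
    if (ret >= 0) {
        cpu_synchronize_all_states();   /* the call under test */
        qemu_savevm_state_complete(s->file);
    }
    qemu_mutex_unlock_iothread();

The idea was presumably to pull fresh CPU state (including the TSC
that kvmclock reads) from KVM before serialization; as said above, it
made no observable difference here.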
The problem consists of at least two separate issues: the current
hang, and the behavior with the unreverted patch from agraf. The
latter causes live migration with writeback cache to fail, while
cache=none works well in any variant that survives the first
condition. Marcin, would you mind checking the current state of the
problem on your environments in your spare time? It is probably
easier to reproduce on iscsi, since it takes far less time to set up;
command line and libvirt config attached (v2.1.0-rc2 plus
iscsi-1.11.0).
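For anyone else trying to reproduce: the attachments are not copied
here, but an iscsi setup of this kind boils down to a single -drive
line. The portal address, IQN and LUN below are placeholders, not the
values from the attachment:

    # Illustration only; portal, IQN and LUN are made up.
    qemu-system-x86_64 -enable-kvm -m 2048 -smp 2 \
        -drive file=iscsi://192.0.2.1:3260/iqn.2014-07.org.example:test/1,format=raw,if=virtio,cache=none

Switching cache=none to cache=writeback is what separates the two
failure modes described above.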
Ok, but what exactly do you want me to test?
Just to avoid any confusion, originally there were two problems with
kvmclock:
1. Commit a096b3a6732f846ec57dc28b47ee9435aa0609bf fixes a problem
where clock drift (?) caused kvmclock in the guest to report a time
in the past, which caused the guest kernel to hang. This is hard to
reproduce reliably (probably because it takes a long time for the
drift to accumulate); the sketch after this list shows the arithmetic
involved.
2. Commit 9b1786829aefb83f37a8f3135e3ea91c56001b56 fixes a regression
caused by a096b3a6732f846ec57dc28b47ee9435aa0609bf which occurred
during non-migration operations (drive-mirror + pivot) and also
caused the guest kernel to hang. This is trivial to reproduce.
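To illustrate what "time in the past" means here: kvmclock exposes a
pvclock_vcpu_time_info structure to the guest, and the first commit
reads it back on the host to compute the current guest time. Below is
a rough, self-contained rendering of that arithmetic, with the field
layout taken from the shared pvclock ABI; it is a paraphrase, not a
copy of hw/i386/kvm/clock.c:

    #include <stdint.h>

    /* Field layout as in the pvclock ABI shared with the guest. */
    struct pvclock_vcpu_time_info {
        uint32_t version;
        uint32_t pad0;
        uint64_t tsc_timestamp;
        uint64_t system_time;
        uint32_t tsc_to_system_mul;
        int8_t   tsc_shift;
        uint8_t  flags;
        uint8_t  pad[2];
    } __attribute__((packed));

    /* Guest time = system_time + scaled (tsc - tsc_timestamp). */
    static uint64_t pvclock_nsec(const struct pvclock_vcpu_time_info *t,
                                 uint64_t tsc)
    {
        /* Wraps if tsc is stale, i.e. older than tsc_timestamp. */
        uint64_t delta = tsc - t->tsc_timestamp;

        if (t->tsc_shift < 0) {
            delta >>= -t->tsc_shift;
        } else {
            delta <<= t->tsc_shift;
        }
        return t->system_time + ((delta * t->tsc_to_system_mul) >> 32);
    }

The subtraction is the fragile part: if the TSC value fed in is stale
(which is what the second commit addresses by synchronizing CPU state
first), delta wraps around and the computed value is garbage, which
is enough to hang the guest's clock handling.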
I'm using both of them applied on top of 2.0 in production and have no
problems with them. I'm using NFS exclusively with cache=none.
So, I shall test VM migration and drive migration with 2.1.0-rc2,
with no extra patches applied or reverted, on a VM that is running
fio, am I correct?
--
mg