On Wed, Jul 16, 2014 at 3:52 PM, Marcelo Tosatti <mtosa...@redhat.com> wrote:
> On Wed, Jul 16, 2014 at 12:38:51PM +0400, Andrey Korolyov wrote:
>> On Wed, Jul 16, 2014 at 5:16 AM, Marcelo Tosatti <mtosa...@redhat.com> wrote:
>> > On Wed, Jul 16, 2014 at 03:40:47AM +0400, Andrey Korolyov wrote:
>> >> On Wed, Jul 16, 2014 at 2:01 AM, Paolo Bonzini <pbonz...@redhat.com>
>> >> wrote:
>> >> > Il 15/07/2014 23:25, Andrey Korolyov ha scritto:
>> >> >
>> >> >> On Wed, Jul 16, 2014 at 1:09 AM, Marcelo Tosatti <mtosa...@redhat.com>
>> >> >> wrote:
>> >> >>>
>> >> >>> On Tue, Jul 15, 2014 at 06:01:08PM +0400, Andrey Korolyov wrote:
>> >> >>>>
>> >> >>>> On Tue, Jul 15, 2014 at 10:52 AM, Andrey Korolyov <and...@xdel.ru>
>> >> >>>> wrote:
>> >> >>>>>
>> >> >>>>> On Tue, Jul 15, 2014 at 9:03 AM, Amit Shah <amit.s...@redhat.com>
>> >> >>>>> wrote:
>> >> >>>>>>
>> >> >>>>>> On (Sun) 13 Jul 2014 [16:28:56], Andrey Korolyov wrote:
>> >> >>>>>>>
>> >> >>>>>>> Hello,
>> >> >>>>>>>
>> >> >>>>>>> the issue is not specific to the iothread code because generic
>> >> >>>>>>> virtio-blk also hangs up:
>> >> >>>>>>
>> >> >>>>>>
>> >> >>>>>> Do you know which version works well? If you could bisect, that'll
>> >> >>>>>> help a lot.
>> >> >>>>>>
>> >> >>>>>> Thanks,
>> >> >>>>>> Amit
>> >> >>>>>
>> >> >>>>>
>> >> >>>>> Hi,
>> >> >>>>>
>> >> >>>>> 2.0 works definitely well. I`ll try to finish bisection today,
>> >> >>>>> though
>> >> >>>>> every step takes about 10 minutes to complete.
>> >> >>>>
>> >> >>>>
>> >> >>>> Yay! It is even outside of virtio-blk.
>> >> >>>>
>> >> >>>> commit 9b1786829aefb83f37a8f3135e3ea91c56001b56
>> >> >>>> Author: Marcelo Tosatti <mtosa...@redhat.com>
>> >> >>>> Date: Tue Jun 3 13:34:48 2014 -0300
>> >> >>>>
>> >> >>>> kvmclock: Ensure proper env->tsc value for kvmclock_current_nsec
>> >> >>>> calculation
>> >> >>>>
>> >> >>>> Ensure proper env->tsc value for kvmclock_current_nsec
>> >> >>>> calculation.
>> >> >>>>
>> >> >>>> Reported-by: Marcin Gibuła <m.gib...@beyond.pl>
>> >> >>>> Cc: qemu-sta...@nongnu.org
>> >> >>>> Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>
>> >> >>>> Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
>> >> >>>
>> >> >>>
>> >> >>> Andrey,
>> >> >>>
>> >> >>> Can you please provide instructions on how to create reproducible
>> >> >>> environment?
>> >> >>>
>> >> >>> The following patch is equivalent to the original patch, for
>> >> >>> the purposes of fixing the kvmclock problem.
>> >> >>>
>> >> >>> Perhaps it becomes easier to spot the reason for the hang you are
>> >> >>> experiencing.
>> >> >>>
>> >> >>>
>> >> >>> diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
>> >> >>> index 272a88a..feb5fc5 100644
>> >> >>> --- a/hw/i386/kvm/clock.c
>> >> >>> +++ b/hw/i386/kvm/clock.c
>> >> >>> @@ -17,7 +17,6 @@
>> >> >>> #include "qemu/host-utils.h"
>> >> >>> #include "sysemu/sysemu.h"
>> >> >>> #include "sysemu/kvm.h"
>> >> >>> -#include "sysemu/cpus.h"
>> >> >>> #include "hw/sysbus.h"
>> >> >>> #include "hw/kvm/clock.h"
>> >> >>>
>> >> >>> @@ -66,7 +65,6 @@ static uint64_t kvmclock_current_nsec(KVMClockState
>> >> >>> *s)
>> >> >>>
>> >> >>> cpu_physical_memory_read(kvmclock_struct_pa, &time,
>> >> >>> sizeof(time));
>> >> >>>
>> >> >>> - assert(time.tsc_timestamp <= migration_tsc);
>> >> >>> delta = migration_tsc - time.tsc_timestamp;
>> >> >>> if (time.tsc_shift < 0) {
>> >> >>> delta >>= -time.tsc_shift;
>> >> >>> @@ -125,8 +123,6 @@ static void kvmclock_vm_state_change(void *opaque,
>> >> >>> int running,
>> >> >>> if (s->clock_valid) {
>> >> >>> return;
>> >> >>> }
>> >> >>> -
>> >> >>> - cpu_synchronize_all_states();
>> >> >>> ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data);
>> >> >>> if (ret < 0) {
>> >> >>> fprintf(stderr, "KVM_GET_CLOCK failed: %s\n",
>> >> >>> strerror(ret));
>> >> >>> diff --git a/migration.c b/migration.c
>> >> >>> index 8d675b3..34f2325 100644
>> >> >>> --- a/migration.c
>> >> >>> +++ b/migration.c
>> >> >>> @@ -608,6 +608,7 @@ static void *migration_thread(void *opaque)
>> >> >>> qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
>> >> >>> old_vm_running = runstate_is_running();
>> >> >>>
>> >> >>> + cpu_synchronize_all_states();
>> >> >>> ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
>> >> >>> if (ret >= 0) {
>> >> >>> qemu_file_set_rate_limit(s->file, INT64_MAX);
>> >> >
>> >> >
>> >> > It could also be useful to apply the above patch _and_ revert
>> >> > a096b3a6732f846ec57dc28b47ee9435aa0609bf, then try to reproduce.
>> >> >
>> >> > Paolo
>> >>
>> >> Yes, it solved the issue for me! (it took much time to check because
>> >> most of country` debian mirrors went inconsistent by some reason)
>> >>
>> >> Also trivial addition:
>> >>
>> >> diff --git a/migration.c b/migration.c
>> >> index 34f2325..65d1c88 100644
>> >> --- a/migration.c
>> >> +++ b/migration.c
>> >> @@ -25,6 +25,7 @@
>> >> #include "qemu/thread.h"
>> >> #include "qmp-commands.h"
>> >> #include "trace.h"
>> >> +#include "sysemu/cpus.h"
>> >
>> > And what about not reverting a096b3a6732f846ec57dc28b47ee9435aa0609bf ?
>> >
>> > That is, test with a stock qemu.git tree and the patch sent today,
>> > on this thread, to move cpu_synchronize_all_states ?
>> >
>> >
>>
>> The main reason for things to work for me is a revert of
>> 9b1786829aefb83f37a8f3135e3ea91c56001b56 on top, not adding any other
>> patches. I had tested two cases, with Alexander`s patch completely
>> reverted plus suggestion from Marcelo and only with revert 9b178682
>> plug same suggestion. The difference is that the until Alexander`
>> patch is not reverted, live migration is always failing by the timeout
>> value, and when reverted migration always succeeds in 8-10 seconds.
>> Appropriate diffs are attached for the reference.
>
> Andrey,
>
> Can you please apply only the following attached patch to an upstream
> QEMU git tree (move_synchronize_all_states.patch), plus the necessary
> header file corrections, and attempt to reproduce?
>
> When you reproduce, please provide a backtrace and version of the QEMU
> git tree, and instructions on how to reproduce:
>
> 1) full QEMU command line
> 2) steps to reproduce
>
>

Marcelo, as far as I can see, this patch matches the second case from my previous message exactly (the diffs there are against the generic rc). I/O does not lock up in that case, but live migration fails and libvirt moves a frozen state. I can try to run the same on top of rc2, but the result will probably be the same.