On Wed, Jul 16, 2014 at 3:52 PM, Marcelo Tosatti <mtosa...@redhat.com> wrote:
> On Wed, Jul 16, 2014 at 12:38:51PM +0400, Andrey Korolyov wrote:
>> On Wed, Jul 16, 2014 at 5:16 AM, Marcelo Tosatti <mtosa...@redhat.com> wrote:
>> > On Wed, Jul 16, 2014 at 03:40:47AM +0400, Andrey Korolyov wrote:
>> >> On Wed, Jul 16, 2014 at 2:01 AM, Paolo Bonzini <pbonz...@redhat.com> 
>> >> wrote:
>> >> > Il 15/07/2014 23:25, Andrey Korolyov ha scritto:
>> >> >
>> >> >> On Wed, Jul 16, 2014 at 1:09 AM, Marcelo Tosatti <mtosa...@redhat.com>
>> >> >> wrote:
>> >> >>>
>> >> >>> On Tue, Jul 15, 2014 at 06:01:08PM +0400, Andrey Korolyov wrote:
>> >> >>>>
>> >> >>>> On Tue, Jul 15, 2014 at 10:52 AM, Andrey Korolyov <and...@xdel.ru>
>> >> >>>> wrote:
>> >> >>>>>
>> >> >>>>> On Tue, Jul 15, 2014 at 9:03 AM, Amit Shah <amit.s...@redhat.com>
>> >> >>>>> wrote:
>> >> >>>>>>
>> >> >>>>>> On (Sun) 13 Jul 2014 [16:28:56], Andrey Korolyov wrote:
>> >> >>>>>>>
>> >> >>>>>>> Hello,
>> >> >>>>>>>
>> >> >>>>>>> the issue is not specific to the iothread code because generic
>> >> >>>>>>> virtio-blk also hangs up:
>> >> >>>>>>
>> >> >>>>>>
>> >> >>>>>> Do you know which version works well?  If you could bisect, that'll
>> >> >>>>>> help a lot.
>> >> >>>>>>
>> >> >>>>>> Thanks,
>> >> >>>>>>                 Amit
>> >> >>>>>
>> >> >>>>>
>> >> >>>>> Hi,
>> >> >>>>>
>> >> >>>>> 2.0 definitely works well. I'll try to finish the bisection today,
>> >> >>>>> though every step takes about 10 minutes to complete.
>> >> >>>>
>> >> >>>>
>> >> >>>> Yay! It is even outside of virtio-blk.
>> >> >>>>
>> >> >>>> commit 9b1786829aefb83f37a8f3135e3ea91c56001b56
>> >> >>>> Author: Marcelo Tosatti <mtosa...@redhat.com>
>> >> >>>> Date:   Tue Jun 3 13:34:48 2014 -0300
>> >> >>>>
>> >> >>>>     kvmclock: Ensure proper env->tsc value for kvmclock_current_nsec calculation
>> >> >>>>
>> >> >>>>     Ensure proper env->tsc value for kvmclock_current_nsec calculation.
>> >> >>>>
>> >> >>>>     Reported-by: Marcin Gibuła <m.gib...@beyond.pl>
>> >> >>>>     Cc: qemu-sta...@nongnu.org
>> >> >>>>     Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>
>> >> >>>>     Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
>> >> >>>
>> >> >>>
>> >> >>> Andrey,
>> >> >>>
>> >> >>> Can you please provide instructions on how to create a reproducible
>> >> >>> environment?
>> >> >>>
>> >> >>> The following patch is equivalent to the original patch, for
>> >> >>> the purposes of fixing the kvmclock problem.
>> >> >>>
>> >> >>> Perhaps it becomes easier to spot the reason for the hang you are
>> >> >>> experiencing.
>> >> >>>
>> >> >>>
>> >> >>> diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
>> >> >>> index 272a88a..feb5fc5 100644
>> >> >>> --- a/hw/i386/kvm/clock.c
>> >> >>> +++ b/hw/i386/kvm/clock.c
>> >> >>> @@ -17,7 +17,6 @@
>> >> >>>  #include "qemu/host-utils.h"
>> >> >>>  #include "sysemu/sysemu.h"
>> >> >>>  #include "sysemu/kvm.h"
>> >> >>> -#include "sysemu/cpus.h"
>> >> >>>  #include "hw/sysbus.h"
>> >> >>>  #include "hw/kvm/clock.h"
>> >> >>>
>> >> >>> @@ -66,7 +65,6 @@ static uint64_t kvmclock_current_nsec(KVMClockState *s)
>> >> >>>
>> >> >>>      cpu_physical_memory_read(kvmclock_struct_pa, &time, sizeof(time));
>> >> >>>
>> >> >>> -    assert(time.tsc_timestamp <= migration_tsc);
>> >> >>>      delta = migration_tsc - time.tsc_timestamp;
>> >> >>>      if (time.tsc_shift < 0) {
>> >> >>>          delta >>= -time.tsc_shift;
>> >> >>> @@ -125,8 +123,6 @@ static void kvmclock_vm_state_change(void *opaque, int running,
>> >> >>>          if (s->clock_valid) {
>> >> >>>              return;
>> >> >>>          }
>> >> >>> -
>> >> >>> -        cpu_synchronize_all_states();
>> >> >>>          ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data);
>> >> >>>          if (ret < 0) {
>> >> >>>              fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(ret));
>> >> >>> diff --git a/migration.c b/migration.c
>> >> >>> index 8d675b3..34f2325 100644
>> >> >>> --- a/migration.c
>> >> >>> +++ b/migration.c
>> >> >>> @@ -608,6 +608,7 @@ static void *migration_thread(void *opaque)
>> >> >>>                  qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
>> >> >>>                  old_vm_running = runstate_is_running();
>> >> >>>
>> >> >>> +                cpu_synchronize_all_states();
>> >> >>>                  ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
>> >> >>>                  if (ret >= 0) {
>> >> >>>                      qemu_file_set_rate_limit(s->file, INT64_MAX);
>> >> >
>> >> >
>> >> > It could also be useful to apply the above patch _and_ revert
>> >> > a096b3a6732f846ec57dc28b47ee9435aa0609bf, then try to reproduce.
>> >> >
>> >> > Paolo
>> >>
>> >> Yes, it solved the issue for me! (It took a long time to check because
>> >> most of the country's Debian mirrors became inconsistent for some reason.)
>> >>
>> >> Also trivial addition:
>> >>
>> >> diff --git a/migration.c b/migration.c
>> >> index 34f2325..65d1c88 100644
>> >> --- a/migration.c
>> >> +++ b/migration.c
>> >> @@ -25,6 +25,7 @@
>> >>  #include "qemu/thread.h"
>> >>  #include "qmp-commands.h"
>> >>  #include "trace.h"
>> >> +#include "sysemu/cpus.h"
>> >
>> > And what about not reverting a096b3a6732f846ec57dc28b47ee9435aa0609bf ?
>> >
>> > That is, test with a stock qemu.git tree and the patch sent today,
>> > on this thread, to move cpu_synchronize_all_states ?
>> >
>> >
>>
>> What makes things work for me is reverting
>> 9b1786829aefb83f37a8f3135e3ea91c56001b56 on top, without adding any other
>> patches. I tested two cases: one with Alexander's patch completely
>> reverted plus the suggestion from Marcelo, and one with only 9b178682
>> reverted plus the same suggestion. The difference is that as long as
>> Alexander's patch is not reverted, live migration always fails on the
>> timeout, and once it is reverted, migration always succeeds in 8-10
>> seconds. The corresponding diffs are attached for reference.
>
> Andrey,
>
> Can you please apply only the following attached patch to an upstream
> QEMU git tree (move_synchronize_all_states.patch), plus the necessary
> header file corrections, and attempt to reproduce?
>
> When you reproduce, please provide a backtrace and version of the QEMU
> git tree, and instructions on how to reproduce:
>
> 1) full QEMU command line
> 2) steps to reproduce
>
>

Marcelo, as far as I can see, this patch matches the second case from my
previous message exactly (the diffs there are against the generic rc). I/O
does not lock up in that case, but live migration fails and libvirt moves a
frozen state. I can try to run the same on top of rc2, but it will
probably be the same.
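
For reference, here is a minimal, self-contained sketch of the arithmetic that the quoted kvmclock_current_nsec hunk performs, to show why a TSC value older than tsc_timestamp matters. It is not the QEMU code: the struct and the sketch_* names are hypothetical stand-ins, and the multiply-and-shift-by-32 step is my assumption about how the scaled delta is turned into nanoseconds, since the hunk above only shows the subtraction and the shift.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the guest time fields read by the quoted hunk;
 * this is not QEMU's actual struct definition. */
struct sketch_time_info {
    uint64_t tsc_timestamp;     /* TSC value when system_time was sampled */
    uint64_t system_time;       /* nanoseconds at tsc_timestamp */
    uint32_t tsc_to_system_mul; /* 32.32 fixed-point TSC-to-ns multiplier */
    int8_t   tsc_shift;         /* pre-scaling shift applied to the delta */
};

/* The condition checked by the assert that the patch removes. */
static bool sketch_tsc_is_sane(const struct sketch_time_info *t, uint64_t tsc)
{
    return t->tsc_timestamp <= tsc;
}

/* Convert a TSC reading into nanoseconds using the fields above. */
static uint64_t sketch_current_nsec(const struct sketch_time_info *t,
                                    uint64_t tsc)
{
    /* Unsigned subtraction: wraps to a huge value if tsc is stale. */
    uint64_t delta = tsc - t->tsc_timestamp;

    /* Shifting a 64-bit value by 64 or more is undefined in C. */
    assert(t->tsc_shift > -64 && t->tsc_shift < 64);
    if (t->tsc_shift < 0) {
        delta >>= -t->tsc_shift;
    } else {
        delta <<= t->tsc_shift;
    }

    /* Low 64 bits of (delta * mul) >> 32, computed without a 128-bit type. */
    uint64_t hi = delta >> 32;
    uint64_t lo = delta & 0xffffffffu;
    uint64_t scaled = hi * t->tsc_to_system_mul +
                      ((lo * t->tsc_to_system_mul) >> 32);

    return t->system_time + scaled;
}
```

With a stale TSC value the subtraction wraps around, so the computed time jumps far into the future instead of failing loudly. That is the situation the assert guarded against and that calling cpu_synchronize_all_states() before reading the clock is meant to avoid.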
