Re: [Qemu-block] [Qemu-devel] [v2 0/4] Fix long vm downtime during live migration

2015-10-21 Thread Li, Liang Z
> > Some cleanup operations take a long time during the pause-and-copy
> > stage, especially with the KVM patch 3ea3b7fa9af067. Doing these
> > operations after the completion of live migration can help to reduce
> > VM downtime.
> >
> > Only the first patch changes the behavior; the remaining 3 patches are
> > code cleanup.
> >
> > Changes:
> >   * Remove qemu_savevm_state_cancel() in migrate_fd_cleanup()
> >   * Add 2 more patches for code clean up
> 
> Reviewed-by: Paolo Bonzini 

Resending this mail.

Hi Juan & Amit,

Could you help review this series of patches and give some comments when
you have time?

The link to the thread is: 
https://lists.nongnu.org/archive/html/qemu-devel/2015-08/msg01516.html

Liang
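
The idea behind the series, shown as a minimal self-contained C sketch
(hypothetical stand-in code, not QEMU source; only the function name
qemu_savevm_state_cancel() mentioned in the comment comes from the series):
any expensive cleanup executed before end_time is taken is charged to the
guest's pause, so recording end_time first and running the cleanup
afterwards shortens both the measured and the observed downtime.

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    static int64_t now_ms(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
    }

    /* Stand-in for a cleanup step that, per the cover letter, became
     * expensive with KVM patch 3ea3b7fa9af067, e.g.
     * qemu_savevm_state_cancel(). */
    static void expensive_cleanup(void)
    {
        usleep(200 * 1000);
    }

    int main(void)
    {
        /* Old ordering: cleanup runs while downtime is still accruing. */
        int64_t t0 = now_ms();
        expensive_cleanup();
        int64_t old_downtime = now_ms() - t0;

        /* New ordering: take end_time first, defer the cleanup. */
        t0 = now_ms();
        int64_t new_downtime = now_ms() - t0;
        expensive_cleanup();   /* no longer counted as downtime */

        printf("old: %" PRId64 " ms, new: %" PRId64 " ms\n",
               old_downtime, new_downtime);
        return 0;
    }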



Re: [Qemu-block] [PATCH 1/2] migration: do cleanup operation after completion

2015-08-12 Thread Li, Liang Z
 
 On 12/08/2015 23:04, Liang Li wrote:
  @@ -1008,8 +1009,10 @@ static void *migration_thread(void *opaque)
       }
 
       qemu_mutex_lock_iothread();
  +    end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
  +    qemu_savevm_state_cancel();
  +
 
 You can remove the qemu_savevm_state_cancel() call from
 migrate_fd_cleanup, too.  Probably best to post a v2 with that change as well.
 
 Paolo

You are right. Done.

Liang
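
For context, the change Paolo asks for in v2 amounts to dropping the
now-redundant call from migrate_fd_cleanup(), roughly as sketched below
(illustrative only; the function signature and the elided context are
assumptions, only the removed call is named in this thread):

     static void migrate_fd_cleanup(void *opaque)
     {
         MigrationState *s = opaque;
         ...
    -    qemu_savevm_state_cancel();
         ...
     }

With the cancel/cleanup moved into migration_thread() after end_time is
taken, keeping a second call in migrate_fd_cleanup() would only duplicate
the work.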



Re: [Qemu-block] [PATCH] migration: flush the bdrv before stopping VM

2015-06-24 Thread Li, Liang Z
 Right now, we don't have an interface to detect those cases and
 go back to the iterative stage.

How about going back to the iterative stage when we detect that
pending_size is larger than max_size, like this:

+/* Flushing here is aimed at shortening the VM downtime;
+ * bdrv_flush_all() is a time-consuming operation
+ * when the guest has done some file writing. */
+bdrv_flush_all();
+pending_size = qemu_savevm_state_pending(s->file, max_size);
+if (pending_size && pending_size >= max_size) {
+    qemu_mutex_unlock_iothread();
+    continue;
+}
 ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
 if (ret >= 0) {
     qemu_file_set_rate_limit(s->file, INT64_MAX);
   
and this is quite simple.
   
Yes, but it is too simple. If you hold all the locks during
bdrv_flush_all(), your VM will effectively stop as soon as it
performs the next I/O access, so you don't win much. And you
still don't have a timeout for cases where the flush takes really long.
  
   This is probably better than what we have now (basically we are
   measuring, after bdrv_flush_all, how much the amount of dirty memory
   has changed, and return to the iterative stage if it changed too
   much).  A timeout would be better anyway.  And an interface to start
   the synchronization sooner, asynchronously, would also be good.

   Notice that my understanding is that any proper fix for this is 2.4
   material.
  
   Then, how should we deal with this issue in 2.3: leave it as is, or
   make an incomplete fix like the one above?
 
  I think it is better to leave it as is for 2.3. With a patch like this
  one, we improve one load and get worse on a different load (it depends
  a lot on the ratio of memory dirtying vs. disk writes).  I have no data
  on which load is more common, so I prefer to be conservative this late
  in the cycle.  What do you think?
 
 I agree, it's too late in the release cycle for such a change.
 
 Kevin
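
To make the timeout idea from this exchange concrete, a sketch of the kind
of bounded check being discussed might look like the following
(illustrative only, not a patch from this thread; max_flush_budget_ms is a
hypothetical tunable):

    /* Bound the cost of the pre-stop flush: if flushing took longer than
     * our budget, or too much memory is dirty again afterwards, go back
     * to the iterative stage instead of stopping the VM now. */
    int64_t flush_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
    bdrv_flush_all();
    int64_t flush_cost = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - flush_start;

    pending_size = qemu_savevm_state_pending(s->file, max_size);
    if (flush_cost > max_flush_budget_ms ||
        (pending_size && pending_size >= max_size)) {
        qemu_mutex_unlock_iothread();
        continue;
    }

Kevin's point still applies, though: even with a budget, the guest
effectively stalls on its next I/O while the flush holds the iothread
lock, so an asynchronous way to start the synchronization earlier, as Juan
suggests, would be the better long-term fix.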

Hi Juan & Kevin,

I have not found the related patches to fix the issue that leads to long VM
downtime. How is it going?

Liang