> > >> >> > Right now, we don't have an interface to detect such cases and
> > >> >> > go back to the iterative stage.
> > >> >>
> > >> >> How about going back to the iterative stage when we detect that the
> > >> >> pending_size is larger than max_size, like this:
> > >> >>
> > >> >> +            /* do flush here is aimed to shorten the VM downtime,
> > >> >> +             * bdrv_flush_all is a time consuming operation
> > >> >> +             * when the guest has done some file writing */
> > >> >> +            bdrv_flush_all();
> > >> >> +            pending_size = qemu_savevm_state_pending(s->file, max_size);
> > >> >> +            if (pending_size && pending_size >= max_size) {
> > >> >> +                qemu_mutex_unlock_iothread();
> > >> >> +                continue;
> > >> >> +            }
> > >> >>              ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
> > >> >>              if (ret >= 0) {
> > >> >>                  qemu_file_set_rate_limit(s->file, INT64_MAX);
> > >> >>
> > >> >> and this is quite simple.
> > >> >
> > >> > Yes, but it is too simple. If you hold all the locks during
> > >> > bdrv_flush_all(), your VM will effectively stop as soon as it
> > >> > performs the next I/O access, so you don't win much. And you
> > >> > still don't have a timeout for cases where the flush takes really long.
> > >>
> > >> This is probably better than what we have now (basically we are
> > >> "measuring" after bdrv_flush_all how much the amount of dirty memory
> > >> has changed, and returning to the iterative stage if it took too much).
> > >> A timeout would be better anyway. And an interface to start the
> > >> synchronization sooner, asynchronously, would also be good.
> > >>
> > >> Notice that my understanding is that any proper fix for this is 2.4
> > >> material.
> > >
> > > Then how do we deal with this issue in 2.3? Leave it as it is, or make
> > > an incomplete fix like the one I posted above?
> >
> > I think it is better to leave it as it is for 2.3. With a patch like this
> > one, we improve under one load and get worse under a different load
> > (it depends a lot on the ratio of dirtying memory vs. disk). I have no
> > data on which load is more common, so I prefer to be conservative so late
> > in the cycle. What do you think?
>
> I agree, it's too late in the release cycle for such a change.
>
> Kevin

Hi Juan & Kevin,

I have not found any patches that fix the issue which leads to the long
VM downtime. How is it going?

Liang