Anthony Liguori <anth...@codemonkey.ws> wrote:
> On 11/24/2010 09:16 AM, Paolo Bonzini wrote:
>> On 11/24/2010 12:14 PM, Michael S. Tsirkin wrote:
>>>> > buffered_file timer runs each 100ms.  And we "try" to measure
>>>> > channel bandwidth from there.  If we are not able to run the
>>>> > timer, all the calculations are wrong, and then stalls happens.
>>>
>>> So the problem is the timer in the buffered file abstraction?
>>> Why don't we just flush out data if the buffer is full?
>>
>> It takes a lot to fill the buffer if you have many zero pages, and
>> if that happens the guest is starved by the main loop filling the
>> buffer.
>
> Sounds like the sort of thing you'd only see if you created a guest a
> large guest that was mostly unallocated and then tried to migrate.
> That doesn't seem like a very real case to me though.

No, this is just the "easy to reproduce" case.  You can hit it in normal
use.  An idle guest with loads of memory is simply the worst possible
case, and it is trivial to reproduce.

> The best approach would be to drop qemu_mutex while processing this
> code instead of having an arbitrary back-off point.  The later is
> deferring the problem to another day when it becomes the source of a
> future problem.

As I said in the other mail, you are offering me half a solution.  If I
implement the qemu_mutex change (which I will do), we would still have
this problem in the main loop.  That change takes care of the CPU being
stuck for 10s, but nothing else.

Later, Juan.

> Regards,
>
> Anthony Liguori
>
>> Paolo
>>
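
To make the zero-page argument above concrete, here is a minimal sketch in C.  It is not QEMU code: fill_buffer(), the struct, and every constant (buffer limit, per-page scan cost, bytes sent per zero page) are hypothetical values picked for illustration.  It models a loop that keeps queuing pages until the buffer is full, with an optional time-based back-off, and counts simulated time instead of reading a real clock.

```c
#include <assert.h>
#include <stdint.h>

/* All constants are illustrative assumptions, not values taken from QEMU. */
#define PAGE_BYTES      4096u             /* bytes queued for a non-zero page  */
#define ZERO_PAGE_BYTES 9u                /* bytes queued for a zero page      */
#define BUFFER_LIMIT    (1024u * 1024u)   /* buffer counts as "full" at 1 MiB  */
#define SCAN_COST_US    1u                /* simulated cost of handling a page */

struct fill_result {
    uint64_t pages;       /* pages processed before the loop returned */
    uint64_t elapsed_us;  /* simulated time spent inside the loop     */
};

/*
 * Queue pages until the buffer is full or the guest runs out of pages.
 * If max_wait_us is non-zero, also stop once that much simulated time
 * has passed (the back-off point); 0 means "only stop when full".
 */
static struct fill_result fill_buffer(uint64_t total_pages, int all_zero,
                                      uint64_t max_wait_us)
{
    struct fill_result r = { 0, 0 };
    uint64_t buffered = 0;

    while (r.pages < total_pages && buffered < BUFFER_LIMIT) {
        if (max_wait_us != 0 && r.elapsed_us >= max_wait_us) {
            break;  /* give the main loop (timers, monitor, ...) a turn */
        }
        buffered += all_zero ? ZERO_PAGE_BYTES : PAGE_BYTES;
        r.elapsed_us += SCAN_COST_US;
        r.pages++;
    }

    assert(r.pages <= total_pages);  /* never process more pages than exist */
    return r;
}
```

With these made-up numbers, a guest of non-zero pages fills the buffer after 256 pages, while an all-zero guest needs more than 116000 pages, so the loop runs past a 100ms timer period before it returns.  Passing max_wait_us = 50000 bounds the time spent in the loop no matter what the pages contain.  This only models the main loop side of the problem; it says nothing about qemu_mutex.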