On Thu, Oct 26, 2023 at 01:03:57PM -0400, Peter Xu wrote:
> On Thu, Oct 26, 2023 at 05:06:37PM +0100, Joao Martins wrote:
> > On 26/10/2023 16:53, Peter Xu wrote:
> > > This small series (actually only the last patch; the first two are
> > > cleanups) wants to improve QEMU's downtime analysis capability,
> > > similarly to what Joao proposed here:
> > >
> > >   https://lore.kernel.org/r/20230926161841.98464-1-joao.m.mart...@oracle.com
> >
> > Thanks for following up on the idea; it's been hard to have enough
> > bandwidth for everything over the past few weeks :(
>
> Yeah, totally understood.  I think our QE team pushed me towards some
> series like this, while my plan was to wait for your new version. :)
>
> Then when I started I decided to go per-device.  I was thinking of also
> persisting that information, but then I remembered some ppc guests can
> have ~40,000 vmstates.. and the memory to maintain all that may or may
> not regress a ppc user.  So I figured I should first keep it simple with
> tracepoints.
>
> > > But with a few differences:
> > >
> > >   - Nothing exported to QAPI yet, all tracepoints so far
> > >
> > >   - Instead of major checkpoints (stop, iterable, non-iterable,
> > >     resume-rp), finer granularity by providing downtime measurements
> > >     for each vmstate (I made microseconds the unit, to be accurate).
> > >     So far it seems iterable / non-iterable is the core of the
> > >     problem, and I want to nail it down per-device.
> > >
> > >   - Trace dest QEMU too
> > >
> > > For the last bullet: consider the case where a device save() can be
> > > super fast, while load() can actually be super slow.  Both of them
> > > contribute to the ultimate downtime, but not as a simple sum: while
> > > src QEMU is save()ing device1, dst QEMU can be load()ing device2, so
> > > they can run in parallel.  However, the only way to figure out all
> > > components of the downtime is to record both sides.
> > >
> > > Please have a look, thanks.
> >
> > I like your series, as it allows a user to pinpoint one particular bad
> > device, while covering the load side too.  The checkpoints of migration,
> > on the other hand, were useful -- while also a bit ugly -- for the sort
> > of big picture of how downtime breaks down.  Perhaps we could add those
> > /also/ as tracepoints, without specifically committing to exposing them
> > in QAPI.
> >
> > More fundamentally, how can one capture the 'stop' part?  There's also
> > time spent there, e.g. quiescing/stopping vhost-net workers, or
> > suspending the VF device.  All likely as bad as the device-state/ram
> > related stuff those tracepoints cover (iterable and non-iterable
> > portions).
>
> Yeah, that's a good point.  I didn't cover "stop" yet because I think
> it's just more tricky and I haven't thought it all through yet.
>
> The first question is, when stopping some backends, the vCPUs are still
> running, so it's not 100% clear to me which parts should be counted as
> part of the real downtime.
I was wrong.. we always stop the vCPUs first.

If you don't mind, I can add some tracepoints for all those spots in this
series to cover your other series.  I'll also make sure I do that for both
sides.

Thanks,

> Meanwhile, that'll be another angle besides vmstates: we need to keep an
> eye on the state change handlers too, and those can belong to a device,
> or something else.
>
> Did you measure the stop process in some way before?  Do you have some
> rough numbers, or anything surprising you already observed?
>
> Thanks,
>
> --
> Peter Xu

--
Peter Xu