On Fri, Oct 27, 2023 at 09:58:03AM +0100, Joao Martins wrote:
> On 26/10/2023 21:07, Peter Xu wrote:
> > On Thu, Oct 26, 2023 at 08:33:13PM +0100, Joao Martins wrote:
> >> Sure. For the fourth patch, feel free to add Suggested-by and/or a Link,
> >> considering it started on the other patches (if you also agree it is
> >> right). The patches ofc are entirely different, but at least I like to
> >> believe the ideas initially presented and then subsequently improved are
> >> what led to the downtime observability improvements in this series.
> > 
> > Sure, I'll add that.
> > 
> > If you like, I would be definitely happy to have Co-developed-by: with you,
> > if you agree. 
> 
> Oh, that's great, thanks!

Great!  I apologize for not asking before the formal patch was posted.

> 
> > I just don't know whether that addresses all your needs, and
> > I need some patch like that for our builds.
> 
> I think it achieves the same as the other series. Or rather it re-implements
> it but with less compromise on QAPI and made the tracepoints more 'generic'
> to even other use cases and less specific to the 'checkpoint breakdown'.
> Which makes the implementation simpler (like we don't need that array
> storing the checkpoint timestamps) given that it's just tracing and not for
> QAPI.

Yes.  Please also feel free to take a closer look at the exact checkpoints
in that patch.  I just want to make sure it can serve the same purpose as
the patch you proposed, only with tracepoints, and that I didn't miss
anything important.

The dest checkpoints are all new; I hope I placed them all where we would
expect.

For the src checkpoints, IIRC your patch explicitly tracked return path
closing, while patch 4 only makes it part of the final interval: the 4th
checkpoint is taken after the non-iterable state is sent, and the 5th is the
last one, "downtime-end".  So that interval covers more than "return path
close":

    qemu_savevm_state_complete_precopy_non_iterable <--- 4th checkpoint
    qemu_fflush (after non-iterable sent)
    close_return_path_on_source
    cpu_throttle_stop
    qemu_mutex_lock_iothread
    migration_downtime_end                          <---- 5th checkpoint

If you think it is useful or necessary, I can, for example, also add one
more checkpoint right after closing the return path.  I just didn't know
whether it's your intention to explicitly check that point.  Just let me
know if so.
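
To illustrate what one more checkpoint would buy us, here is a minimal
sketch (not code from the series; ckpt_intervals() and the timestamps used
with it are made up for illustration).  It turns a list of checkpoint
timestamps into per-interval durations, so an extra checkpoint after the
return path close simply splits the last interval in two:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of QEMU: given n checkpoint timestamps
 * in the order they were hit, store the duration of each interval
 * between consecutive checkpoints in out[] (which must have room for
 * n - 1 entries).  Returns the number of intervals written.
 */
size_t ckpt_intervals(const int64_t *ts, size_t n, int64_t *out)
{
    if (n < 2) {
        return 0;   /* fewer than two checkpoints: nothing to measure */
    }
    for (size_t i = 0; i + 1 < n; i++) {
        out[i] = ts[i + 1] - ts[i];
    }
    return n - 1;
}
```

With only the 4th and 5th checkpoints, the whole tail shows up as a single
number; with a checkpoint in between, the same total is reported as two
parts, one ending at the return path close and one covering the rest.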

Also, let me know whether you would prefer to keep a timestamp in the
tracepoint itself.  I only use either "log" or "dtrace"+qemu-trace-stap for
tracing, and both of them already contain timestamps.  But I can also add
the timestamps (offset from downtime_start) if you prefer.
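
For reference, a rough sketch of what I mean by a timestamp relative to
downtime_start.  This is not the tracepoint from the patch:
format_checkpoint(), the checkpoint name and the microsecond unit are all
assumptions for illustration only.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical formatter, not part of QEMU: render a checkpoint name
 * together with its time offset from the start of downtime.  Both
 * timestamps are assumed to be in microseconds on the same clock.
 * Returns what snprintf() returns.
 */
int format_checkpoint(char *buf, size_t len, const char *name,
                      int64_t now_us, int64_t downtime_start_us)
{
    return snprintf(buf, len, "%s +%" PRId64 "us",
                    name, now_us - downtime_start_us);
}
```

A consumer of such a line would not need to subtract anything itself, at
the cost of carrying one more argument in every checkpoint event.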

I plan to repost before early next week (I want to catch the train for 8.2,
if it works out), so it would be great to collect all your feedback and
address it before the repost.

> 
> Though while it puts more work over developing new tracing tooling for
> users, I think it's a good start towards downtime breakdown "clarity"
> without trading off maintenance.

Thanks,

-- 
Peter Xu

