Denis Plotnikov <den-plotni...@yandex-team.ru> writes:

> Add "start" & "end" time values to QMP command responses.
>
> These time values are added to let the qemu management layer get the exact
> command execution time without any other time variance which might be brought
> by other parts of management layer or qemu internals.
> This helps to look for problems poactively from the management layer side.

proactively

> The management layer would be able to detect problem cases by calculating
> QMP command execution time:
> 1. execution_time_from_mgmt_perspective -
>        execution_time_of_qmp_command > some_threshold
>    This detects problems with management layer or internal qemu QMP command
>    dispatching
> 2. current_qmp_command_execution_time > avg_qmp_command_execution_time
>    This detects that a certain QMP command starts to execute longer than
>    usual
> In both these cases more thorough investigation of the root cases should be
> done by using some qemu tracepoints depending on particular QMP command under
> investigation or by other means. The timestamps help to avoid excessive log
> output when qemu tracepoints are used to address similar cases.
>
> Example of result:
>
>     ./qemu/scripts/qmp/qmp-shell /tmp/qmp.socket
>
>     (QEMU) query-status
>     {"end": {"seconds": 1650367305, "microseconds": 831032},
>      "start": {"seconds": 1650367305, "microseconds": 831012},
>      "return": {"status": "running", "singlestep": false, "running": true}}
>
> The response of the QMP command contains the start & end time of
> the QMP command processing.
>
> Also, "start" & "end" timestaps are added to qemu guest agent responses as
> qemu-ga shares the same code for request dispatching.
>
> Suggested-by: Andrey Ryabinin <a...@yandex-team.ru>
> Signed-off-by: Denis Plotnikov <den-plotni...@yandex-team.ru>
> Reviewed-by: Daniel P. Berrangé <berra...@redhat.com>

In review of v3, I asked you to address the following points in the
commit message:

1. Why tracepoints can't do the job

2. Why client-side logging can't do the job

3. Why of all the possible points of interest in a QMP command's flow
   through QEMU, you're picking these two.

If I remember correctly, 1. was discussed during review.  Okay, but you
still need to capture the argument here.

I don't think I've seen answers to 2. or 3. so far.

The management application already knows the time it sends a command and
the time it receives its respose.

The time delta consists of:

1. Network transmission from management application to QEMU

   Hopefully trivial for AF_UNIX, except when suspended.

2. Schedule callbacks from main loop, read from socket

3. Parse received text as JSON into QObject

   Continue to next step only when a complete JSON value is parsed.

4. Queue command (unless command is to be executed out-of-band)

   Takes global monitor lock.  Suspends the monitor when the queue is
   full.

5. Dequeue command (if in-band)

   Takes global monitor lock.  Resumes a suspended monitor.

6. Dispatch command

   Schedules a bottom half in the common case (in-band, not
   coroutine-capable).

7. Unmarshal command arguments from QObject into QAPI C types

8. Execute command handler

9. Marshal command return value from QAPI C type into QObject

10. Wait for bottom half (if used)

11. Marshal response QObject

12. Format response QObject as JSON

13. Queue response text

14. Write out queue to the socket

15. Network transmission from QEMU to management application

Client-side logging can give us time of the first and the last step.

The timestamps you propose are at the beginning of step 6 and the end of
step 11.  They exclude network transmission, some of the marshaling /
unmarshalling, and some of the internal thread synchronization and
scheduling.  Why pick these points?  Why not, for instance, beginning
and end of step 8?  I'm not proposing to do that instead, I'm asking for
rationale!


Reply via email to