On Mon, Feb 14, 2022 at 03:14:37PM +0100, Markus Armbruster wrote:
> Cc: the qemu-ga maintainer
> 
> John Snow <js...@redhat.com> writes:
> 
> > [Moving our discussion upstream, because it stopped being brief and simple.]

Hi John, Markus,

> 
> Motivation: qemu-ga doesn't do capability negotiation as specified in
> docs/interop/qmp-spec.txt.
> 
> Reminder: qmp-spec.txt specifies the server shall send a greeting
> containing the capabilities on offer.  The client shall send a
> qmp_capabilities command before any other command.
> 
> We can't just fix qemu-ga to comply, because it would break existing
> clients.
> 
> We could document its behavior in qmp-spec.txt.  Easy enough, but also
> kind of sad.

I'm not sure we could've ever done it QMP-style with the initial
greeting/negotiation mode. It's been a while, I but recall virtio-serial
chardev in guest not having a very straight-forward way of flushing out
data from the vring after a new client connects on the host side, so
new clients had a chance of reading left-over garbage from previous
client sessions. Or maybe it was open/close/open on the guest/chardev
side that didn't cause the flush... anyway:

This is why guest-sync was there, so you could verify the stream was
in sync with a given "session ID" before continuing. But that doesn't
help much if the stream is in some garbage state that parser can't
recover from...

This is why guest-sync-delimited was introduced; it inserts a 0xFF
sential value (invalid for any normal QMP stream) prior to response that
a client can scan for to flush the stream. Similarly, clients are
supposed to precede guest-sync/guest-sync-delimited so QGA to get stuck
trying to parse a partial read from an earlier client that is 'eating' a
new request from a new client connection. I don't think these are really
issues with vsock (or the other transports QGA accepts), but AFAIK
Windows is still mostly reliant on virtio-serial, so these are probably
still needed.

So QGA has sort of always had its own hand-shake, ideally via
guest-sync-delimited. So if this new negotiation mechanism could
build off of that, rather than introducing something on top, that would
be ideal. Unfortunately it's naming isn't great for what's being done
here, but 'synchronize' is sorta in the ball-park at least...
> 
> Is there a way to add capability negotiation to qemu-ga without breaking
> existing clients?  We obviously have to make it optional.
> 
> The obvious idea "make qmp_capabilities optional" doesn't work, because
> the client needs to receive the greeting before sending
> qmp_capabilities, to learn what capabilities are on offer.
> 
> This leads to...
> 
> > What about something like this:
> >
> > Add a new "request-negotiation" command to qemu-guest-agent 7.0.0.
> >
> > [Modern client to unknown server]
> > 1. A modern client connects to a server of unknown version, and
> > without waiting, issues the "request-negotiation" command.
> > 2. An old server will reply with CommandNotFound. We are done negotiating.
> > 3. A modern server will reply with the greeting in the traditional
> > format, but as a reply object (to preserve "execute" semantics.)
> > 4. The modern client will now issue qmp-capabilities as normal.
> > 5. The server replies with success or failure as normal.
> > 6. Connection is fully established.
> >
> > [Old client to unknown server]
> > 1. An old client connects to an unknown version server.
> > 2. A command is issued some time later.
> >   2a. The server is old, the command worked as anticipated.
> >   2b. The server is new, the command fails with CommandNotFound and
> > urges the use of 'request-negotiation'.
> 
> A new server could accept the command, too.  This way, negotiation
> remains optional, unlike in "normal" QMP.  Old clients don't negotiate,
> and get default capabilities.

...so what if we had guest-sync/guest-sync-delimited start returning
capabilities and version fields rather than a new request-negotiation
command? If they aren't present we know the server is too old, and don't
have to rely on CommandNotFound to determine that.

If they are present, then qmp_capabilities can be issued to renegotiate
capabilities. (I agree with Markus on enabling default capabilities by
default as it's done now so backward-compatibility with older clients
is maintained, or maybe an explicit flag to QGA to require negotiation,
but only if there's a good use-case for requiring negotiation in some
cases)

> 
> > Compatibility matrix summary:
> > Old client on old server: Works just fine, as always.
> > Old client on new server: Will fail; the new server requires the
> > negotiation step to be performed. This is a tractable problem.
> > POSSIBLY we need to send some kind of "warning event" for two versions
> > before making it genuinely mandatory. Also tractable.
> 
> With optional negotiation, this works fine, too.
> 
> > New client on old server: Works, albeit with a single failed execute
> > command now in the log file.
> > New client on new server: Works, though handshaking is now permanently
> > a little chattier than with any other QMP server.
> >
> > ***The QMP spec will need to be updated*** to state: the asynchronous
> > greeting is mandatory on all QMP implementations, EXCEPT for the
> > qemu-guest-agent, which for historical reasons, uses an alternate
> > handshaking process, ...
> >
> > Compatibility concerns:
> > - We must never remove the 'request-negotiation' command from QGA,
> > forever-and-ever, unless we also make a new error class for
> > "NegotiationRequired" that's distinct from "CommandNotFound", but
> > that's more divergence. Supporting the negotiation request command
> > forever-and-ever is probably fine.
> 
> Yup.
> 
> > - QGA is now officially on a different flavor of QMP protocol. You
> > still need to know in advance if you are connecting to QGA or anything
> > else. That's still a little sad, but maybe that's just simply an
> > impossible goal.
> >
> > Bonus:
> > - If an execution ID is used when sending "request-negotiation", we
> > know that the server is at least version 4.0.0 if it responds to us
> > using that ID. A modern client can then easily distinguish between
> > pre-4.0, post-4.0 and post-7.0 servers. It's a useful probe.

Is that in reference to the optional 'id' field that can be passed to
QMP requests? Don't see the harm in that, and QGA should pass them back
intact.

> 
> Mike, thoughts?
> 

Reply via email to