On Wed, Feb 16, 2022 at 10:12:36AM +0100, Markus Armbruster wrote: > Michael Roth <michael.r...@amd.com> writes: > > > On Mon, Feb 14, 2022 at 03:14:37PM +0100, Markus Armbruster wrote: > >> Cc: the qemu-ga maintainer > >> > >> John Snow <js...@redhat.com> writes: > >> > >> > [Moving our discussion upstream, because it stopped being brief and > >> > simple.] > > > > Hi John, Markus, > > > >> > >> Motivation: qemu-ga doesn't do capability negotiation as specified in > >> docs/interop/qmp-spec.txt. > >> > >> Reminder: qmp-spec.txt specifies the server shall send a greeting > >> containing the capabilities on offer. The client shall send a > >> qmp_capabilities command before any other command. > >> > >> We can't just fix qemu-ga to comply, because it would break existing > >> clients. > >> > >> We could document its behavior in qmp-spec.txt. Easy enough, but also > >> kind of sad. > > > > I'm not sure we could've ever done it QMP-style with the initial > > greeting/negotiation mode. It's been a while, I but recall virtio-serial > > chardev in guest not having a very straight-forward way of flushing out > > data from the vring after a new client connects on the host side, so > > new clients had a chance of reading left-over garbage from previous > > client sessions. Or maybe it was open/close/open on the guest/chardev > > side that didn't cause the flush... anyway: > > > > This is why guest-sync was there, so you could verify the stream was > > in sync with a given "session ID" before continuing. But that doesn't > > help much if the stream is in some garbage state that parser can't > > recover from... > > > > This is why guest-sync-delimited was introduced; it inserts a 0xFF > > sential value (invalid for any normal QMP stream) prior to response that > > a client can scan for to flush the stream. Similarly, clients are > > supposed to precede guest-sync/guest-sync-delimited so QGA to get stuck > > trying to parse a partial read from an earlier client that is 'eating' a > > new request from a new client connection. I don't think these are really > > issues with vsock (or the other transports QGA accepts), but AFAIK > > Windows is still mostly reliant on virtio-serial, so these are probably > > still needed. > > I believe you're right about the reason being virtio-serial. I > documented it that way in commit 72e9e569d0 "docs/interop/qmp-spec: How > to force known good parser state". > > 2.6 Forcing the JSON parser into known-good state > ------------------------------------------------- > > Incomplete or invalid input can leave the server's JSON parser in a > state where it can't parse additional commands. To get it back into > known-good state, the client should provoke a lexical error. > > The cleanest way to do that is sending an ASCII control character > other than '\t' (horizontal tab), '\r' (carriage return), or '\n' (new > line). > > Sadly, older versions of QEMU can fail to flag this as an error. If a > client needs to deal with them, it should send a 0xFF byte. > > 2.7 QGA Synchronization > ----------------------- > > When a client connects to QGA over a transport lacking proper > connection semantics such as virtio-serial, QGA may have read partial > input from a previous client. The client needs to force QGA's parser > into known-good state using the previous section's technique. > Moreover, the client may receive output a previous client didn't read. > To help with skipping that output, QGA provides the > 'guest-sync-delimited' command. Refer to its documentation for > details. > > 0xFF is invalid UTF-8, which is kind of icky. We should've used a > proper control character like EOT (end of transmission) from the start. > Water under the bridge. > > guest-sync has another design flaw: an unread command reply consisting > of just an integer can be confused with guest-sync's reply. Unlikely as > long as guest-sync's @id argument is chosen at random, as its > documentation demands. > > guest-sync could be deprecated, I guess.
Yes, should probably be deprecated in favor of guest-sync-delimited. I left it for clients that really don't want to dig into the transport layer to search for 0xFF, but still want at least some ability to re-sync. > > The @id argument of guest-sync and guest-sync-delimited feels kind of > redundant with the command object's @id member. Except QGA didn't > conform to the QMP spec until commit 4eaca8de26 "qmp: common 'id' > handling & make QGA conform to QMP spec" (v4.0.0). More water under the > bridge. > > Note that there's no need for all this when the transport provides > proper connection semantics. Clients relying on connection semantics > work fine even when they don't bother with this syncing stuff. Do such > clients exist? We probably don't know. May or may not matter. True, I think only virtio-serial and maybe isa-serial require the sync. I was hoping virtio-vsock might quickly become the default, then most clients would no longer need guest-sync*, but Windows still seems to be reliant on virtio-serial (AFAIK). > > > So QGA has sort of always had its own hand-shake, ideally via > > guest-sync-delimited. So if this new negotiation mechanism could > > build off of that, rather than introducing something on top, that would > > be ideal. Unfortunately it's naming isn't great for what's being done > > here, but 'synchronize' is sorta in the ball-park at least... > > Fair point. > > >> Is there a way to add capability negotiation to qemu-ga without breaking > >> existing clients? We obviously have to make it optional. > >> > >> The obvious idea "make qmp_capabilities optional" doesn't work, because > >> the client needs to receive the greeting before sending > >> qmp_capabilities, to learn what capabilities are on offer. > >> > >> This leads to... > >> > >> > What about something like this: > >> > > >> > Add a new "request-negotiation" command to qemu-guest-agent 7.0.0. > >> > > >> > [Modern client to unknown server] > >> > 1. A modern client connects to a server of unknown version, and > >> > without waiting, issues the "request-negotiation" command. > >> > 2. An old server will reply with CommandNotFound. We are done > >> > negotiating. > >> > 3. A modern server will reply with the greeting in the traditional > >> > format, but as a reply object (to preserve "execute" semantics.) > >> > 4. The modern client will now issue qmp-capabilities as normal. > >> > 5. The server replies with success or failure as normal. > >> > 6. Connection is fully established. > >> > > >> > [Old client to unknown server] > >> > 1. An old client connects to an unknown version server. > >> > 2. A command is issued some time later. > >> > 2a. The server is old, the command worked as anticipated. > >> > 2b. The server is new, the command fails with CommandNotFound and > >> > urges the use of 'request-negotiation'. > >> > >> A new server could accept the command, too. This way, negotiation > >> remains optional, unlike in "normal" QMP. Old clients don't negotiate, > >> and get default capabilities. > > > > ...so what if we had guest-sync/guest-sync-delimited start returning > > capabilities and version fields rather than a new request-negotiation > > command? If they aren't present we know the server is too old, and don't > > have to rely on CommandNotFound to determine that. > > Both guest-sync and guest-sync-delimited have a design flaw that defeats > such a straighforward extension: 'returns': 'int'. There is no way to > return more data compatibly. Ugh, I misread the scheme and thought it was a struct with a single field... I knew that seemed to good to be true. > > We could add an optional flag to guest-sync-delimited to make it return > an object. But I think we'd be better off with a new command. New cmd is probably for the best then, since hopefully guest-sync-delimited can become irrelevant in the future, and the possibility of a failed sync command on old clients (due to new param) getting mixed in with the recovery logic for a failed negotiation/capability probe, would probably make the whole interface a bit too unwieldy. > >> > - QGA is now officially on a different flavor of QMP protocol. You > >> > still need to know in advance if you are connecting to QGA or anything > >> > else. That's still a little sad, but maybe that's just simply an > >> > impossible goal. > >> > > >> > Bonus: > >> > - If an execution ID is used when sending "request-negotiation", we > >> > know that the server is at least version 4.0.0 if it responds to us > >> > using that ID. A modern client can then easily distinguish between > >> > pre-4.0, post-4.0 and post-7.0 servers. It's a useful probe. > > > > Is that in reference to the optional 'id' field that can be passed to > > QMP requests? Don't see the harm in that, and QGA should pass them back > > intact. > > I think it does since commit 4eaca8de26 "qmp: common 'id' handling & > make QGA conform to QMP spec" (v4.0.0). Ah, right, sort of remember now. Thanks for the pointer. -Mike