Re: A brief look at deprecating our JSON extensions over RFC 8259
Peter Maydell writes: > On Tue, 23 Feb 2021 at 09:33, Markus Armbruster wrote: >> Misunderstanding: our JSON interpolation feature is *not* string >> interpolation! It interpolates *objects* into the QObject built by the >> parser. > > Given that it's basically undocumented except in a scattered > handful of comments inside the qjson parser implementation, it's > not too surprising that people misunderstand it :-) Yes, that's fair. I added a fair amount of commentary, but it's heavily geared towards maintainers, not users. > (One surprising > feature: the parser takes ownership of the object that you pass it > via the '%p' interpolation, and will qobject_unref() it.) Yes, %p takes over the reference.
Re: A brief look at deprecating our JSON extensions over RFC 8259
Paolo Bonzini writes: > On 23/02/21 10:06, Markus Armbruster wrote: >>> Markus, did you rebuild the qtests after disabling single-quoted >>> strings? "make check-qtest-x86_64" would have rebuilt them, but I'm >>> confused by the results. >> I ran "make check" and looked at the failures: >> Still confused? > > Yes. What's the patch that you used to disable the single quotes, and > why didn't the patched parser choke on > > response = qmp("{ 'execute': 'qom-get', 'arguments': { 'path': %s, " >"'property': 'temperature' } }", id); > > ? My bad! I neglected to mention that I tied "disable single-quoted strings" to "interpolation is off" for my experiment. This is a lazy, half-assed approximation of "internal interface". Here's the experimental patch. commit 57138b9d4188dd8ce1814237fe53f7131bbb3f45 Author: Markus Armbruster Date: Mon Feb 22 17:04:10 2021 +0100 qobject: Tie single quote extension to interpolation WIP This makes several tests fail or hang. Some are hacked up. diff --git a/qobject/json-parser.c b/qobject/json-parser.c index 008b326fb8..c1ddc65d96 100644 --- a/qobject/json-parser.c +++ b/qobject/json-parser.c @@ -150,9 +150,6 @@ static QString *parse_string(JSONParserContext *ctxt, JSONToken *token) case '"': g_string_append_c(str, '"'); break; -case '\'': -g_string_append_c(str, '\''); -break; case '\\': g_string_append_c(str, '\\'); break; @@ -201,6 +198,12 @@ static QString *parse_string(JSONParserContext *ctxt, JSONToken *token) } g_string_append(str, utf8_buf); break; +case '\'': +if (ctxt->ap) { +g_string_append_c(str, '\''); +break; +} +/* fall through */ default: parse_error(ctxt, token, "invalid escape sequence in string"); goto out; diff --git a/qobject/json-streamer.c b/qobject/json-streamer.c index b93d97b995..3d4d3b484e 100644 --- a/qobject/json-streamer.c +++ b/qobject/json-streamer.c @@ -49,6 +49,11 @@ void json_message_process_token(JSONLexer *lexer, GString *input, case JSON_RSQUARE: parser->bracket_count--; break; +case JSON_STRING: +if (input->str[0] == '\"' || parser->ap) { +break; +} +/* fall through */ case JSON_ERROR: error_setg(&err, "JSON parse error, stray '%s'", input->str); goto out_emit;
Re: A brief look at deprecating our JSON extensions over RFC 8259
On Tue, 23 Feb 2021 at 09:33, Markus Armbruster wrote: > Misunderstanding: our JSON interpolation feature is *not* string > interpolation! It interpolates *objects* into the QObject built by the > parser. Given that it's basically undocumented except in a scattered handful of comments inside the qjson parser implementation, it's not too surprising that people misunderstand it :-) (One surprising feature: the parser takes ownership of the object that you pass it via the '%p' interpolation, and will qobject_unref() it.) -- PMM
Re: A brief look at deprecating our JSON extensions over RFC 8259
On 23/02/21 10:06, Markus Armbruster wrote: Markus, did you rebuild the qtests after disabling single-quoted strings? "make check-qtest-x86_64" would have rebuilt them, but I'm confused by the results. I ran "make check" and looked at the failures: Still confused? Yes. What's the patch that you used to disable the single quotes, and why didn't the patched parser choke on response = qmp("{ 'execute': 'qom-get', 'arguments': { 'path': %s, " "'property': 'temperature' } }", id); ? Paolo
Re: A brief look at deprecating our JSON extensions over RFC 8259
Markus Armbruster writes: [...] > A bigger stumbling block for replacement is our need for a streaming > interface: we feed the parser characters, and expect to be called back > when an expression is complete. Another stumbling block: check-qjson.c test case "/literals/string/utf8" and "/literals/string/escaped". Off-the-shelf parsers flunking them would surprise me about as much as the sun going up in the morning. [...]
Re: A brief look at deprecating our JSON extensions over RFC 8259
Daniel P. Berrangé writes: > On Mon, Feb 22, 2021 at 06:47:30PM +0100, Paolo Bonzini wrote: >> On 22/02/21 16:24, Daniel P. Berrangé wrote: >> > This problem isn't unique to QEMU. Any app using JSON from the >> > shell will have the tedium of quote escaping. JSON is incredibly >> > widespread and no other apps felt it neccessary to introduce single >> > quoting support, because the benefit doesn't outweigh the interop >> > problem it introduces. >> >> The quotes were introduced for C code (and especially qtest), not for the >> shell. We have something like >> >> response = qmp("{ 'execute': 'qom-get', 'arguments': { 'path': %s, " >>"'property': 'temperature' } }", id); >> >> These are sent to QEMU as double-quoted strings (the single-quoted JSON is >> parsed to get interpolation and printed back; commit 563890c7c7, "libqtest: >> escape strings in QMP commands, fix leak", 2014-07-01). However, doing the >> interpolation requires a parser that recognizes the single-quoted strings. > > IMHO this is the wrong solution to the problem. Consider the equivalent > libvirt code that uses a standard JSON library underneath and has a high > level API to serialize args into the command > > qemuMonitorJSONMakeCommand("qom-get", > "s:path", id, >"s:property", "temperature"); > > Of course this example is reasonably easy since it is a flat set of > arguments. Nested args get slightly more complicated, but still always > preferrable to doing string interpolation IMHO. Misunderstanding: our JSON interpolation feature is *not* string interpolation! It interpolates *objects* into the QObject built by the parser. Best explained with an example. The qmp() call above internally builds the following QObject: QDict mapping key "execute" to a QString for "qom-get" key "arguments" to a QDict mapping key "path" to a QString for @id (interpolation!) key "property" to a QString for "temperature" Unlike string interpolation, this is safe for any valid C string @id. If you think like a hacker, you might try shenanigans like "{'execute': '%s'}" You will then find that somebody has thought like a hacker before you[*]. The %-sequences are cleverly chosen to get some protection against common mistakes from the compiler. This interpolation has worked quite well for us, and I have no plans to replace it. Doesn't mean I'm against replacing it, only that I want to see benefits exceeding the costs. A bigger stumbling block for replacement is our need for a streaming interface: we feed the parser characters, and expect to be called back when an expression is complete. [*] Commit 16a4859921 "json: Improve safety of qobject_from_jsonf_nofail() & friends", fixed up in commit bbc0586ced "json: Fix % handling when not interpolating".
Re: A brief look at deprecating our JSON extensions over RFC 8259
Paolo Bonzini writes: > On 22/02/21 16:24, Daniel P. Berrangé wrote: >> This problem isn't unique to QEMU. Any app using JSON from the >> shell will have the tedium of quote escaping. JSON is incredibly >> widespread and no other apps felt it neccessary to introduce single >> quoting support, because the benefit doesn't outweigh the interop >> problem it introduces. > > The quotes were introduced for C code (and especially qtest), not for > the shell. We have something like > > response = qmp("{ 'execute': 'qom-get', 'arguments': { 'path': %s, " >"'property': 'temperature' } }", id); > > These are sent to QEMU as double-quoted strings (the single-quoted > JSON is parsed to get interpolation and printed back; commit > 563890c7c7, "libqtest: escape strings in QMP commands, fix leak", > 2014-07-01). However, doing the interpolation requires a parser that > recognizes the single-quoted strings. Doing interpolation requires a parser that recognizes %-sequences. Single quote support isn't *required*, but quite desirable to let us avoid leaning toothpick syndrome (LTS). Example: compare the above to response = qmp("{ \"execute\": \"qom-get\", \"arguments\": { \"path\": %s, " "\"property\": \"temperature\" } }", id); We kept the interpolation extension out of the external interfaces, but not the single quotes. > Markus, did you rebuild the qtests after disabling single-quoted > strings? "make check-qtest-x86_64" would have rebuilt them, but I'm > confused by the results. I ran "make check" and looked at the failures: * check-qjson.c - escaped_string() covers \'. Naturally, this fails. - escaped_string() and utf8_string() try every string twice, first in double quotes, then in single quotes. Naturally, the latter fails. - string_with_quotes() tests unquoted single quote in double-quoted string, and unquoted double quote in single-quoted string. Naturally, the latter fails. - large_dict() and simple_whitespace() use single quotes to avoid LTS. This is the test my "The unit test testing the JSON parser is of course excused" referred to. * test-qobject-input-visitor.c * qtest/qmp-test.c More LTS avoidance. This is "The remaining qtest and the unit test could perhaps be dismissed as atypical use of QEMU from C." * tests/qemu-iotests/ Unlike the tests above, these use *external* interfaces. In shell, we need to use double quotes to get parameter expansion. We then use single quotes to avoid LTS. The Python code has less excuse, I think. Still confused?
Re: A brief look at deprecating our JSON extensions over RFC 8259
On Mon, Feb 22, 2021 at 19:22:49 +0100, Peter Krempa wrote: > On Mon, Feb 22, 2021 at 18:42:00 +0100, Paolo Bonzini wrote: > > On 22/02/21 15:57, Markus Armbruster wrote: > > > * The block layer's pseudo-protocol "json:" (which can get embedded in > > >image headers) > > > > If it gets embedded in image headers, I don't think we'll be able to > > deprecate it ever. We'd need to keep a converter for old images, at which > > point it's simpler to keep the extensions. > > The converter or better 'fixer' actually doesn't need to be able to > interpret the old string, just accept a new. IOW it's more of a > documentation problem, because qemu-img can already do that since it's > able to write invalid JSON without interpreting it: [...] I forgot to add that such strings would be user-originated in the first place. The qemu-generated one are (presumably) correct JSON.
Re: A brief look at deprecating our JSON extensions over RFC 8259
On Mon, Feb 22, 2021 at 18:42:00 +0100, Paolo Bonzini wrote: > On 22/02/21 15:57, Markus Armbruster wrote: > > * The block layer's pseudo-protocol "json:" (which can get embedded in > >image headers) > > If it gets embedded in image headers, I don't think we'll be able to > deprecate it ever. We'd need to keep a converter for old images, at which > point it's simpler to keep the extensions. The converter or better 'fixer' actually doesn't need to be able to interpret the old string, just accept a new. IOW it's more of a documentation problem, because qemu-img can already do that since it's able to write invalid JSON without interpreting it: $ qemu-img rebase -f qcow2 -F qcow2 -b 'json:{' -u /tmp/ble.qcow2 $ qemu-img info /tmp/ble.qcow2 image: /tmp/ble.qcow2 file format: qcow2 virtual size: 10 MiB (10485760 bytes) disk size: 196 KiB cluster_size: 65536 backing file: json:{ backing file format: qcow2 Format specific information: compat: 1.1 compression type: zlib lazy refcounts: false refcount bits: 16 corrupt: false
Re: A brief look at deprecating our JSON extensions over RFC 8259
On 22/02/21 18:54, Daniel P. Berrangé wrote: These are sent to QEMU as double-quoted strings (the single-quoted JSON is parsed to get interpolation and printed back; commit 563890c7c7, "libqtest: escape strings in QMP commands, fix leak", 2014-07-01). However, doing the interpolation requires a parser that recognizes the single-quoted strings. IMHO this is the wrong solution to the problem. Consider the equivalent libvirt code that uses a standard JSON library underneath and has a high level API to serialize args into the command qemuMonitorJSONMakeCommand("qom-get", "s:path", id, "s:property", "temperature"); Of course this example is reasonably easy since it is a flat set of arguments. Nested args get slightly more complicated, but still always preferrable to doing string interpolation IMHO. I don't disagree. I'm just stating why I wanted a clarification from Markus. Paolo
Re: A brief look at deprecating our JSON extensions over RFC 8259
On Mon, Feb 22, 2021 at 06:42:00PM +0100, Paolo Bonzini wrote: > On 22/02/21 15:57, Markus Armbruster wrote: > > * The block layer's pseudo-protocol "json:" (which can get embedded in > >image headers) > > If it gets embedded in image headers, I don't think we'll be able to > deprecate it ever. We'd need to keep a converter for old images, at which > point it's simpler to keep the extensions. Even if we can only use a standard JSON parser for QMP + QGA, that's a already a significant net win long term IMHO. Both of those are security critical components and also areas where we might want different language impls as we increasingly use a multi-process model, so avoiding use of extensins is good. Even for the block layer, we don't neccessarily need to keep compat at runtime. It could be sufficient to have an extended deprecation period and then provide an offline helper script to re-write the qcow2 backing store field to use " instead of ' assuming we actually get real world pushback from people who really have used ' - we don't know if there are such people yet. We can do deprecation in a multi stage process - deprecation for everything, then block it for QMP + QGA after 2 cycles, while still allowing it for qcow2, and eventually block for qcow2 3 years later or something like that. IOW, I wouldn't give up on trying to deprecate it. Regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
Re: A brief look at deprecating our JSON extensions over RFC 8259
On Mon, Feb 22, 2021 at 06:47:30PM +0100, Paolo Bonzini wrote: > On 22/02/21 16:24, Daniel P. Berrangé wrote: > > This problem isn't unique to QEMU. Any app using JSON from the > > shell will have the tedium of quote escaping. JSON is incredibly > > widespread and no other apps felt it neccessary to introduce single > > quoting support, because the benefit doesn't outweigh the interop > > problem it introduces. > > The quotes were introduced for C code (and especially qtest), not for the > shell. We have something like > > response = qmp("{ 'execute': 'qom-get', 'arguments': { 'path': %s, " >"'property': 'temperature' } }", id); > > These are sent to QEMU as double-quoted strings (the single-quoted JSON is > parsed to get interpolation and printed back; commit 563890c7c7, "libqtest: > escape strings in QMP commands, fix leak", 2014-07-01). However, doing the > interpolation requires a parser that recognizes the single-quoted strings. IMHO this is the wrong solution to the problem. Consider the equivalent libvirt code that uses a standard JSON library underneath and has a high level API to serialize args into the command qemuMonitorJSONMakeCommand("qom-get", "s:path", id, "s:property", "temperature"); Of course this example is reasonably easy since it is a flat set of arguments. Nested args get slightly more complicated, but still always preferrable to doing string interpolation IMHO. Regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
Re: A brief look at deprecating our JSON extensions over RFC 8259
On 22/02/21 16:24, Daniel P. Berrangé wrote: This problem isn't unique to QEMU. Any app using JSON from the shell will have the tedium of quote escaping. JSON is incredibly widespread and no other apps felt it neccessary to introduce single quoting support, because the benefit doesn't outweigh the interop problem it introduces. The quotes were introduced for C code (and especially qtest), not for the shell. We have something like response = qmp("{ 'execute': 'qom-get', 'arguments': { 'path': %s, " "'property': 'temperature' } }", id); These are sent to QEMU as double-quoted strings (the single-quoted JSON is parsed to get interpolation and printed back; commit 563890c7c7, "libqtest: escape strings in QMP commands, fix leak", 2014-07-01). However, doing the interpolation requires a parser that recognizes the single-quoted strings. Markus, did you rebuild the qtests after disabling single-quoted strings? "make check-qtest-x86_64" would have rebuilt them, but I'm confused by the results. Paolo
Re: A brief look at deprecating our JSON extensions over RFC 8259
On 22/02/21 15:57, Markus Armbruster wrote: * The block layer's pseudo-protocol "json:" (which can get embedded in image headers) If it gets embedded in image headers, I don't think we'll be able to deprecate it ever. We'd need to keep a converter for old images, at which point it's simpler to keep the extensions. Paolo
Re: A brief look at deprecating our JSON extensions over RFC 8259
On Mon, 22 Feb 2021 at 17:27, Daniel P. Berrangé wrote: > > IMHO we should deprecate and eventually remove single quotes +1 If a JSON cannot be directly processed by the standard JavaScript parser, it should not be used. Regards, Liviu -- Sent from my iPad via Gmail.
Re: A brief look at deprecating our JSON extensions over RFC 8259
On Mon, Feb 22, 2021 at 03:57:22PM +0100, Markus Armbruster wrote: > We use JSON in several external interfaces: > > * QMP > > * The guest agent's QMP > > * QAPIfied command line options when the option argument starts with > '{' > > * The block layer's pseudo-protocol "json:" (which can get embedded in > image headers) > > I *think* that's all. > > The JSON parser we use for these interfaces supports extensions over RFC > 8259. Quoting json-lexer.c: > > - Extra escape sequence in strings: > 0x27 (apostrophe) is recognized after escape, too > > - Single-quoted strings: > Like double-quoted strings, except they're delimited by %x27 > (apostrophe) instead of %x22 (quotation mark), and can't contain > unescaped apostrophe, but can contain unescaped quotation mark. > > - Interpolation, if enabled: > The lexer accepts %[A-Za-z0-9]*, and leaves rejecting invalid > ones to the parser. > > Ignore interpolation; it's never enabled at external interfaces. > > This leaves single-quotes strings and the escape sequence to go with > them. > > I disabled them as an experiment. Some 20 iotests, a qtest and two unit > tests explode. > > The unit test testing the JSON parser is of course excused. > > The remaining qtest and the unit test could perhaps be dismissed as > atypical use of QEMU from C. The iotests less so, I think. > > I looked at some iotest failures, and quickly found single-quoted > strings used with all external interfaces except for qemu-ga's QMP. > > We could certainly tidy up the tests to stick to standard JSON. > However, the prevalence of single-quoted strings in iotests makes me > suspect that they are being used in the field as well. Deprecating the > extension is likely more trouble than it's worth. The shell based iotests use single quotes primarily because they're being written in a language which lacks the concept of libraries and and so all JSON is constructed by string substitution. Using single quotes is convenient to avoid continually escaping double quotes. For any other language constructing JSON documents through string substitution is insanity, because they all have JSON libraries available which let you construct JSON documents progamatically without risk of introducing security flaws through malicious substitutions. This problem isn't unique to QEMU. Any app using JSON from the shell will have the tedium of quote escaping. JSON is incredibly widespread and no other apps felt it neccessary to introduce single quoting support, because the benefit doesn't outweigh the interop problem it introduces. > Opinions? IMHO we should deprecate and eventually remove single quotes. We should expect mgmt apps to be using a JSON library to talk to QEMU in general if they are using QMP. Sure some may be using shell, but I'd expect that to be relatively few. Adapting is tedious but not especially hard. It would be nice at some point in future to have the option of using a standard JSON library in part or all of QEMU. Especially if we ever want to be able to have parts of QEMU written in non-C language, we don't want to re-invent a custom JSON parser as the first step, for back compatibility. Regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
Re: A brief look at deprecating our JSON extensions over RFC 8259
On Mon, Feb 22, 2021 at 15:57:22 +0100, Markus Armbruster wrote: > We use JSON in several external interfaces: > > * QMP > > * The guest agent's QMP > > * QAPIfied command line options when the option argument starts with > '{' > > * The block layer's pseudo-protocol "json:" (which can get embedded in > image headers) > > I *think* that's all. > > The JSON parser we use for these interfaces supports extensions over RFC > 8259. Quoting json-lexer.c: > > - Extra escape sequence in strings: > 0x27 (apostrophe) is recognized after escape, too > > - Single-quoted strings: > Like double-quoted strings, except they're delimited by %x27 > (apostrophe) instead of %x22 (quotation mark), and can't contain > unescaped apostrophe, but can contain unescaped quotation mark. [...] > We could certainly tidy up the tests to stick to standard JSON. > However, the prevalence of single-quoted strings in iotests makes me > suspect that they are being used in the field as well. Deprecating the > extension is likely more trouble than it's worth. > > Opinions? Any user of QEMU through libvirt will not use any of the extensions even in cases such as QMP command pasthrough (virsh qemu-monitor-command) and the 'json:' pseudo-protocol, as libvirt parses the provided JSON to add sequencing for QMP passthrough, and for image chain detection in case of 'json:'. Since libvirt's JSON library (yajl) doesn't support any of those extensions users are forced to not use them. So on behalf of libvirt, we'd be fine with deprecation/removal of those.