Re: A brief look at deprecating our JSON extensions over RFC 8259

2021-02-23 Thread Markus Armbruster
Peter Maydell  writes:

> On Tue, 23 Feb 2021 at 09:33, Markus Armbruster  wrote:
>> Misunderstanding: our JSON interpolation feature is *not* string
>> interpolation!  It interpolates *objects* into the QObject built by the
>> parser.
>
> Given that it's basically undocumented except in a scattered
> handful of comments inside the qjson parser implementation, it's
> not too surprising that people misunderstand it :-)

Yes, that's fair.

I added a fair amount of commentary, but it's heavily geared towards
maintainers, not users.

> (One surprising
> feature: the parser takes ownership of the object that you pass it
> via the '%p' interpolation, and will qobject_unref() it.)

Yes, %p takes over the reference.




Re: A brief look at deprecating our JSON extensions over RFC 8259

2021-02-23 Thread Markus Armbruster
Paolo Bonzini  writes:

> On 23/02/21 10:06, Markus Armbruster wrote:
>>> Markus, did you rebuild the qtests after disabling single-quoted
>>> strings?  "make check-qtest-x86_64" would have rebuilt them, but I'm
>>> confused by the results.
>> I ran "make check" and looked at the failures:
>> Still confused?
>
> Yes.  What's the patch that you used to disable the single quotes, and
> why didn't the patched parser choke on
>
> response = qmp("{ 'execute': 'qom-get', 'arguments': { 'path': %s, "
>"'property': 'temperature' } }", id);
>
> ?

My bad!  I neglected to mention that I tied "disable single-quoted
strings" to "interpolation is off" for my experiment.  This is a lazy,
half-assed approximation of "internal interface".

Here's the experimental patch.


commit 57138b9d4188dd8ce1814237fe53f7131bbb3f45
Author: Markus Armbruster 
Date:   Mon Feb 22 17:04:10 2021 +0100

qobject: Tie single quote extension to interpolation WIP

This makes several tests fail or hang.  Some are hacked up.

diff --git a/qobject/json-parser.c b/qobject/json-parser.c
index 008b326fb8..c1ddc65d96 100644
--- a/qobject/json-parser.c
+++ b/qobject/json-parser.c
@@ -150,9 +150,6 @@ static QString *parse_string(JSONParserContext *ctxt, 
JSONToken *token)
 case '"':
 g_string_append_c(str, '"');
 break;
-case '\'':
-g_string_append_c(str, '\'');
-break;
 case '\\':
 g_string_append_c(str, '\\');
 break;
@@ -201,6 +198,12 @@ static QString *parse_string(JSONParserContext *ctxt, 
JSONToken *token)
 }
 g_string_append(str, utf8_buf);
 break;
+case '\'':
+if (ctxt->ap) {
+g_string_append_c(str, '\'');
+break;
+}
+/* fall through */
 default:
 parse_error(ctxt, token, "invalid escape sequence in string");
 goto out;
diff --git a/qobject/json-streamer.c b/qobject/json-streamer.c
index b93d97b995..3d4d3b484e 100644
--- a/qobject/json-streamer.c
+++ b/qobject/json-streamer.c
@@ -49,6 +49,11 @@ void json_message_process_token(JSONLexer *lexer, GString 
*input,
 case JSON_RSQUARE:
 parser->bracket_count--;
 break;
+case JSON_STRING:
+if (input->str[0] == '\"' || parser->ap) {
+break;
+}
+/* fall through */
 case JSON_ERROR:
 error_setg(, "JSON parse error, stray '%s'", input->str);
 goto out_emit;




Re: A brief look at deprecating our JSON extensions over RFC 8259

2021-02-23 Thread Peter Maydell
On Tue, 23 Feb 2021 at 09:33, Markus Armbruster  wrote:
> Misunderstanding: our JSON interpolation feature is *not* string
> interpolation!  It interpolates *objects* into the QObject built by the
> parser.

Given that it's basically undocumented except in a scattered
handful of comments inside the qjson parser implementation, it's
not too surprising that people misunderstand it :-) (One surprising
feature: the parser takes ownership of the object that you pass it
via the '%p' interpolation, and will qobject_unref() it.)

-- PMM



Re: A brief look at deprecating our JSON extensions over RFC 8259

2021-02-23 Thread Paolo Bonzini

On 23/02/21 10:06, Markus Armbruster wrote:

Markus, did you rebuild the qtests after disabling single-quoted
strings?  "make check-qtest-x86_64" would have rebuilt them, but I'm
confused by the results.

I ran "make check" and looked at the failures:

Still confused?


Yes.  What's the patch that you used to disable the single quotes, and 
why didn't the patched parser choke on


response = qmp("{ 'execute': 'qom-get', 'arguments': { 'path': %s, "
   "'property': 'temperature' } }", id);

?

Paolo




Re: A brief look at deprecating our JSON extensions over RFC 8259

2021-02-23 Thread Markus Armbruster
Markus Armbruster  writes:

[...]
> A bigger stumbling block for replacement is our need for a streaming
> interface: we feed the parser characters, and expect to be called back
> when an expression is complete.

Another stumbling block: check-qjson.c test case "/literals/string/utf8"
and "/literals/string/escaped".  Off-the-shelf parsers flunking them
would surprise me about as much as the sun going up in the morning.

[...]




Re: A brief look at deprecating our JSON extensions over RFC 8259

2021-02-23 Thread Markus Armbruster
Daniel P. Berrangé  writes:

> On Mon, Feb 22, 2021 at 06:47:30PM +0100, Paolo Bonzini wrote:
>> On 22/02/21 16:24, Daniel P. Berrangé wrote:
>> > This problem isn't unique to QEMU. Any app using JSON from the
>> > shell will have the tedium of quote escaping. JSON is incredibly
>> > widespread and no other apps felt it neccessary to introduce single
>> > quoting support, because the benefit doesn't outweigh the interop
>> > problem it introduces.
>> 
>> The quotes were introduced for C code (and especially qtest), not for the
>> shell.  We have something like
>> 
>> response = qmp("{ 'execute': 'qom-get', 'arguments': { 'path': %s, "
>>"'property': 'temperature' } }", id);
>> 
>> These are sent to QEMU as double-quoted strings (the single-quoted JSON is
>> parsed to get interpolation and printed back; commit 563890c7c7, "libqtest:
>> escape strings in QMP commands, fix leak", 2014-07-01). However, doing the
>> interpolation requires a parser that recognizes the single-quoted strings.
>
> IMHO this is the wrong solution to the problem.  Consider the equivalent
> libvirt code that uses a standard JSON library underneath and has a high
> level API to serialize args into the command
>
>   qemuMonitorJSONMakeCommand("qom-get",
>  "s:path", id,
>"s:property", "temperature");
>
> Of course this example is reasonably easy since it is a flat set of
> arguments. Nested args get slightly more complicated, but still always
> preferrable to doing string interpolation IMHO.

Misunderstanding: our JSON interpolation feature is *not* string
interpolation!  It interpolates *objects* into the QObject built by the
parser.

Best explained with an example.  The qmp() call above internally builds
the following QObject:

QDict mapping
key "execute" to a QString for "qom-get"
key "arguments" to a QDict mapping
key "path" to a QString for @id (interpolation!)
key "property" to a QString for "temperature"

Unlike string interpolation, this is safe for any valid C string @id.

If you think like a hacker, you might try shenanigans like

"{'execute': '%s'}"

You will then find that somebody has thought like a hacker before
you[*].

The %-sequences are cleverly chosen to get some protection against
common mistakes from the compiler.

This interpolation has worked quite well for us, and I have no plans to
replace it.  Doesn't mean I'm against replacing it, only that I want to
see benefits exceeding the costs.

A bigger stumbling block for replacement is our need for a streaming
interface: we feed the parser characters, and expect to be called back
when an expression is complete.


[*] Commit 16a4859921 "json: Improve safety of
qobject_from_jsonf_nofail() & friends", fixed up in commit bbc0586ced
"json: Fix % handling when not interpolating".




Re: A brief look at deprecating our JSON extensions over RFC 8259

2021-02-23 Thread Markus Armbruster
Paolo Bonzini  writes:

> On 22/02/21 16:24, Daniel P. Berrangé wrote:
>> This problem isn't unique to QEMU. Any app using JSON from the
>> shell will have the tedium of quote escaping. JSON is incredibly
>> widespread and no other apps felt it neccessary to introduce single
>> quoting support, because the benefit doesn't outweigh the interop
>> problem it introduces.
>
> The quotes were introduced for C code (and especially qtest), not for
> the shell.  We have something like
>
> response = qmp("{ 'execute': 'qom-get', 'arguments': { 'path': %s, "
>"'property': 'temperature' } }", id);
>
> These are sent to QEMU as double-quoted strings (the single-quoted
> JSON is parsed to get interpolation and printed back; commit
> 563890c7c7, "libqtest: escape strings in QMP commands, fix leak",
> 2014-07-01). However, doing the interpolation requires a parser that
> recognizes the single-quoted strings.

Doing interpolation requires a parser that recognizes %-sequences.
Single quote support isn't *required*, but quite desirable to let us
avoid leaning toothpick syndrome (LTS).

Example: compare the above to

  response = qmp("{ \"execute\": \"qom-get\", \"arguments\": { \"path\": 
%s, "
 "\"property\": \"temperature\" } }", id);

We kept the interpolation extension out of the external interfaces, but
not the single quotes.

> Markus, did you rebuild the qtests after disabling single-quoted
> strings?  "make check-qtest-x86_64" would have rebuilt them, but I'm 
> confused by the results.

I ran "make check" and looked at the failures:

* check-qjson.c

  - escaped_string() covers \'.  Naturally, this fails.

  - escaped_string() and utf8_string() try every string twice, first in
double quotes, then in single quotes.  Naturally, the latter fails.

  - string_with_quotes() tests unquoted single quote in double-quoted
string, and unquoted double quote in single-quoted string.
Naturally, the latter fails.

  - large_dict() and simple_whitespace() use single quotes to avoid LTS.

  This is the test my "The unit test testing the JSON parser is of
  course excused" referred to.

* test-qobject-input-visitor.c
* qtest/qmp-test.c

  More LTS avoidance.

  This is "The remaining qtest and the unit test could perhaps be
  dismissed as atypical use of QEMU from C."

* tests/qemu-iotests/

  Unlike the tests above, these use *external* interfaces.

  In shell, we need to use double quotes to get parameter expansion.  We
  then use single quotes to avoid LTS.

  The Python code has less excuse, I think.

Still confused?




Re: A brief look at deprecating our JSON extensions over RFC 8259

2021-02-22 Thread Peter Krempa
On Mon, Feb 22, 2021 at 19:22:49 +0100, Peter Krempa wrote:
> On Mon, Feb 22, 2021 at 18:42:00 +0100, Paolo Bonzini wrote:
> > On 22/02/21 15:57, Markus Armbruster wrote:
> > > * The block layer's pseudo-protocol "json:" (which can get embedded in
> > >image headers)
> > 
> > If it gets embedded in image headers, I don't think we'll be able to
> > deprecate it ever.  We'd need to keep a converter for old images, at which
> > point it's simpler to keep the extensions.
> 
> The converter or better 'fixer' actually doesn't need to be able to
> interpret the old string, just accept a new. IOW it's more of a
> documentation problem, because qemu-img can already do that since it's
> able to write invalid JSON without interpreting it:

[...]

I forgot to add that such strings would be user-originated in the first
place. The qemu-generated one are (presumably) correct JSON.




Re: A brief look at deprecating our JSON extensions over RFC 8259

2021-02-22 Thread Peter Krempa
On Mon, Feb 22, 2021 at 18:42:00 +0100, Paolo Bonzini wrote:
> On 22/02/21 15:57, Markus Armbruster wrote:
> > * The block layer's pseudo-protocol "json:" (which can get embedded in
> >image headers)
> 
> If it gets embedded in image headers, I don't think we'll be able to
> deprecate it ever.  We'd need to keep a converter for old images, at which
> point it's simpler to keep the extensions.

The converter or better 'fixer' actually doesn't need to be able to
interpret the old string, just accept a new. IOW it's more of a
documentation problem, because qemu-img can already do that since it's
able to write invalid JSON without interpreting it:

$ qemu-img rebase -f qcow2 -F qcow2 -b 'json:{' -u /tmp/ble.qcow2
$ qemu-img info /tmp/ble.qcow2
image: /tmp/ble.qcow2
file format: qcow2
virtual size: 10 MiB (10485760 bytes)
disk size: 196 KiB
cluster_size: 65536
backing file: json:{
backing file format: qcow2
Format specific information:
compat: 1.1
compression type: zlib
lazy refcounts: false
refcount bits: 16
corrupt: false




Re: A brief look at deprecating our JSON extensions over RFC 8259

2021-02-22 Thread Paolo Bonzini

On 22/02/21 18:54, Daniel P. Berrangé wrote:

These are sent to QEMU as double-quoted strings (the single-quoted JSON is
parsed to get interpolation and printed back; commit 563890c7c7, "libqtest:
escape strings in QMP commands, fix leak", 2014-07-01). However, doing the
interpolation requires a parser that recognizes the single-quoted strings.


IMHO this is the wrong solution to the problem.  Consider the equivalent
libvirt code that uses a standard JSON library underneath and has a high
level API to serialize args into the command

   qemuMonitorJSONMakeCommand("qom-get",
  "s:path", id,
 "s:property", "temperature");

Of course this example is reasonably easy since it is a flat set of
arguments. Nested args get slightly more complicated, but still always
preferrable to doing string interpolation IMHO.


I don't disagree. I'm just stating why I wanted a clarification from Markus.

Paolo




Re: A brief look at deprecating our JSON extensions over RFC 8259

2021-02-22 Thread Daniel P . Berrangé
On Mon, Feb 22, 2021 at 06:42:00PM +0100, Paolo Bonzini wrote:
> On 22/02/21 15:57, Markus Armbruster wrote:
> > * The block layer's pseudo-protocol "json:" (which can get embedded in
> >image headers)
> 
> If it gets embedded in image headers, I don't think we'll be able to
> deprecate it ever.  We'd need to keep a converter for old images, at which
> point it's simpler to keep the extensions.

Even if we can only use a standard JSON parser for QMP + QGA, that's a
already a significant net win long term IMHO. Both of those are security
critical components and also areas where we might want different language
impls as we increasingly use a multi-process model, so avoiding use of
extensins is good.

Even for the block layer, we don't neccessarily need to keep compat at
runtime. It could be sufficient to have an extended deprecation period
and then provide an offline helper script to re-write the qcow2 backing
store field to use " instead of ' assuming we actually get real
world pushback from people who really have used ' - we don't know if
there are such people yet.

We can do deprecation in a multi stage process - deprecation for everything,
then block it for QMP + QGA after 2 cycles, while still allowing it for qcow2,
and  eventually block for qcow2 3 years later or something like that.

IOW, I wouldn't give up on trying to deprecate it.



Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: A brief look at deprecating our JSON extensions over RFC 8259

2021-02-22 Thread Daniel P . Berrangé
On Mon, Feb 22, 2021 at 06:47:30PM +0100, Paolo Bonzini wrote:
> On 22/02/21 16:24, Daniel P. Berrangé wrote:
> > This problem isn't unique to QEMU. Any app using JSON from the
> > shell will have the tedium of quote escaping. JSON is incredibly
> > widespread and no other apps felt it neccessary to introduce single
> > quoting support, because the benefit doesn't outweigh the interop
> > problem it introduces.
> 
> The quotes were introduced for C code (and especially qtest), not for the
> shell.  We have something like
> 
> response = qmp("{ 'execute': 'qom-get', 'arguments': { 'path': %s, "
>"'property': 'temperature' } }", id);
> 
> These are sent to QEMU as double-quoted strings (the single-quoted JSON is
> parsed to get interpolation and printed back; commit 563890c7c7, "libqtest:
> escape strings in QMP commands, fix leak", 2014-07-01). However, doing the
> interpolation requires a parser that recognizes the single-quoted strings.

IMHO this is the wrong solution to the problem.  Consider the equivalent
libvirt code that uses a standard JSON library underneath and has a high
level API to serialize args into the command

  qemuMonitorJSONMakeCommand("qom-get",
 "s:path", id,
 "s:property", "temperature");

Of course this example is reasonably easy since it is a flat set of
arguments. Nested args get slightly more complicated, but still always
preferrable to doing string interpolation IMHO.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: A brief look at deprecating our JSON extensions over RFC 8259

2021-02-22 Thread Paolo Bonzini

On 22/02/21 16:24, Daniel P. Berrangé wrote:

This problem isn't unique to QEMU. Any app using JSON from the
shell will have the tedium of quote escaping. JSON is incredibly
widespread and no other apps felt it neccessary to introduce single
quoting support, because the benefit doesn't outweigh the interop
problem it introduces.


The quotes were introduced for C code (and especially qtest), not for 
the shell.  We have something like


response = qmp("{ 'execute': 'qom-get', 'arguments': { 'path': %s, "
   "'property': 'temperature' } }", id);

These are sent to QEMU as double-quoted strings (the single-quoted JSON 
is parsed to get interpolation and printed back; commit 563890c7c7, 
"libqtest: escape strings in QMP commands, fix leak", 2014-07-01). 
However, doing the interpolation requires a parser that recognizes the 
single-quoted strings.


Markus, did you rebuild the qtests after disabling single-quoted 
strings?  "make check-qtest-x86_64" would have rebuilt them, but I'm 
confused by the results.


Paolo




Re: A brief look at deprecating our JSON extensions over RFC 8259

2021-02-22 Thread Paolo Bonzini

On 22/02/21 15:57, Markus Armbruster wrote:

* The block layer's pseudo-protocol "json:" (which can get embedded in
   image headers)


If it gets embedded in image headers, I don't think we'll be able to 
deprecate it ever.  We'd need to keep a converter for old images, at 
which point it's simpler to keep the extensions.


Paolo




Re: A brief look at deprecating our JSON extensions over RFC 8259

2021-02-22 Thread Liviu Ionescu
On Mon, 22 Feb 2021 at 17:27, Daniel P. Berrangé 
wrote:

>
> IMHO we should deprecate and eventually remove single quotes


+1

If a JSON cannot be directly processed by the standard JavaScript parser,
it should not be used.


Regards,

Liviu

-- 
Sent from my iPad via Gmail.


Re: A brief look at deprecating our JSON extensions over RFC 8259

2021-02-22 Thread Daniel P . Berrangé
On Mon, Feb 22, 2021 at 03:57:22PM +0100, Markus Armbruster wrote:
> We use JSON in several external interfaces:
> 
> * QMP
> 
> * The guest agent's QMP
> 
> * QAPIfied command line options when the option argument starts with
>   '{'
> 
> * The block layer's pseudo-protocol "json:" (which can get embedded in
>   image headers)
> 
> I *think* that's all.
> 
> The JSON parser we use for these interfaces supports extensions over RFC
> 8259.  Quoting json-lexer.c:
> 
> - Extra escape sequence in strings:
>   0x27 (apostrophe) is recognized after escape, too
> 
> - Single-quoted strings:
>   Like double-quoted strings, except they're delimited by %x27
>   (apostrophe) instead of %x22 (quotation mark), and can't contain
>   unescaped apostrophe, but can contain unescaped quotation mark.
> 
> - Interpolation, if enabled:
>   The lexer accepts %[A-Za-z0-9]*, and leaves rejecting invalid
>   ones to the parser.
> 
> Ignore interpolation; it's never enabled at external interfaces.
> 
> This leaves single-quotes strings and the escape sequence to go with
> them.
> 
> I disabled them as an experiment.  Some 20 iotests, a qtest and two unit
> tests explode.
> 
> The unit test testing the JSON parser is of course excused.
> 
> The remaining qtest and the unit test could perhaps be dismissed as
> atypical use of QEMU from C.  The iotests less so, I think.
> 
> I looked at some iotest failures, and quickly found single-quoted
> strings used with all external interfaces except for qemu-ga's QMP.
> 
> We could certainly tidy up the tests to stick to standard JSON.
> However, the prevalence of single-quoted strings in iotests makes me
> suspect that they are being used in the field as well.  Deprecating the
> extension is likely more trouble than it's worth.

The shell based iotests use single quotes primarily because they're
being written in a language which lacks the concept of libraries and
and so all JSON is constructed by string substitution. Using single
quotes is convenient to avoid continually escaping double quotes.

For any other language constructing JSON documents through string
substitution is insanity, because they all have JSON libraries
available which let you construct JSON documents progamatically
without risk of introducing security flaws through malicious
substitutions.

This problem isn't unique to QEMU. Any app using JSON from the
shell will have the tedium of quote escaping. JSON is incredibly
widespread and no other apps felt it neccessary to introduce single
quoting support, because the benefit doesn't outweigh the interop
problem it introduces.

> Opinions?

IMHO we should deprecate and eventually remove single quotes. We
should expect mgmt apps to be using a JSON library to talk to QEMU
in general if they are using QMP. Sure some may be using shell, but
I'd expect that to be relatively few. Adapting is tedious but not
especially hard.

It would be nice at some point in future to have the option of using
a standard JSON library in part or all of QEMU. Especially if we
ever want to be able to have parts of QEMU written in non-C language,
we don't want to re-invent a custom JSON parser as the first step,
for back compatibility.


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: A brief look at deprecating our JSON extensions over RFC 8259

2021-02-22 Thread Peter Krempa
On Mon, Feb 22, 2021 at 15:57:22 +0100, Markus Armbruster wrote:
> We use JSON in several external interfaces:
> 
> * QMP
> 
> * The guest agent's QMP
> 
> * QAPIfied command line options when the option argument starts with
>   '{'
> 
> * The block layer's pseudo-protocol "json:" (which can get embedded in
>   image headers)
> 
> I *think* that's all.
> 
> The JSON parser we use for these interfaces supports extensions over RFC
> 8259.  Quoting json-lexer.c:
> 
> - Extra escape sequence in strings:
>   0x27 (apostrophe) is recognized after escape, too
> 
> - Single-quoted strings:
>   Like double-quoted strings, except they're delimited by %x27
>   (apostrophe) instead of %x22 (quotation mark), and can't contain
>   unescaped apostrophe, but can contain unescaped quotation mark.

[...]

> We could certainly tidy up the tests to stick to standard JSON.
> However, the prevalence of single-quoted strings in iotests makes me
> suspect that they are being used in the field as well.  Deprecating the
> extension is likely more trouble than it's worth.
> 
> Opinions?

Any user of QEMU through libvirt will not use any of the extensions even
in cases such as QMP command pasthrough (virsh qemu-monitor-command) and
the 'json:' pseudo-protocol, as libvirt parses the provided JSON to add
sequencing for QMP passthrough, and for image chain detection in case of
'json:'. Since libvirt's JSON library (yajl) doesn't support any of
those extensions users are forced to not use them.

So on behalf of libvirt, we'd be fine with deprecation/removal of those.