Eric Blake <ebl...@redhat.com> writes: > On 11/02/2015 01:40 AM, Markus Armbruster wrote: >> Eric Blake <ebl...@redhat.com> writes: >> >>> Loaded question in response to >>> https://lists.gnu.org/archive/html/qemu-devel/2015-10/msg06988.html, but >> >> Discussion of our parser's enormous appetite for wasting RAM. Fixable, >> but it's work, and it's not its only defect. >> > >>> On the other hand, one of the "features" of qemu's hand-rolled json >>> parser is the ability to do qobject_from_jsonf("{'foo':%s}", "bar") >>> (that is, we extended JSON with our notion of single-quoted strings, and >>> with our notion of %s and similar escape sequences for piecing together >>> multiple inputs into a single input stream without having to first >>> g_strdup_printf our pieces into a single string). I don't know if >>> libyajl lets us add extensions to the language it parses. >> >> Actually two separate extensions: >> >> * Single-quoted strings >> >> The extension's purpose is avoiding quotes in C. Example: >> >> "{ 'execute': 'migrate_set_speed', 'arguments': { 'value': 10 } }" >> >> is easier to read than >> >> "{ \"execute\": \"migrate_set_speed\", \"arguments\": { \"value\": 10 >> } }" >> >> Most actual uses are in tests. > > Except that we have documented that QMP clients may use it; and indeed: > > $ printf "{'execute':'qmp_capabilities'}\n{'execute':'quit'}" | \ > ./x86_64-softmmu/qemu-system-x86_64 -nographic -nodefaults -qmp stdio > {"QMP": {"version": {"qemu": {"micro": 50, "minor": 4, "major": 2}, > "package": ""}, "capabilities": []}} > {"return": {}} > {"return": {}} > {"timestamp": {"seconds": 1446478389, "microseconds": 265886}, "event": > "SHUTDOWN"}
Unnecessary mistake :( > So that is an absolute must for whatever parser we choose. My first > glance at libyajl is that it does NOT currently allow single quotes (not > even with any of its existing options), so we'd have to pre-process > input before feeding it to yajl. I'm not sure reusing an existing parser makes much sense if we need to preprocess anyway. After all, parsing JSON isn't exactly a moon shot, at least if you know what you're doing. >> JSON5, a proposed extension to JSON also supports single-quoted >> strings. So does Python. > > It would be an interesting task to see if yajl would accept patches to > parse JSON5 as additional options (for example, yajl already has options > on whether to diagnose or ignore UTF8 errors, and on whether to allow /* > */ javascript comments). But then we'd be requiring an > as-yet-unreleased version of libyajl rather than being able to rely on > what distros already have, at least for a few years; or we'd have to > carry new-enough yajl as a git submodule within qemu.git. > > I'll ask on the yajl mailing lists, to get a feel for community > receptiveness, before attempting to actually write patches on that front. Makes sense. >> * Optional interpolation >> >> If you pass a va_list to the parser, it replaces certain escape >> sequences by values from that va_list. The escape sequences are a >> subset of printf conversion specifiers, to enable compile-time >> checking with __attribute__((format...)). >> >> We used to rely on this for creating "rich" error objects, but those >> are long gone (commit df1e608). Perhaps two dozen uses are left. > > The testsuite is probably the biggest user, but we still have uses > throughout the code base. > > Based on my look at libyajl, I think we CAN get away with stream > interpolation - yajl maintains state such that you do NOT have to feed > it the entire stream in one go. So this one is not insurmountable; we > could rewrite our qobject_from_jsonf() to make multiple calls into yajl > without having to pre-format a single string. Sounds complicated... >> We could convert them to use "normal" interpolation, i.e. first format >> the arguments as JSON, then interpolate with g_strdup_printf() or >> similar, then parse. Formatting only to parse it right back is >> inelegant. Also inefficient, but that probably doesn't matter here. > > Or indeed, this is still an option. > >> >> The current interface could be retained as convenience function to >> interpolate and feed the result to the JSOn parser. >> >> Alternatively, we could build the QObject manually instead. More >> efficient than either kind of interpolation, but a good deal less >> readable. > > At any rate, it is certainly less of a show-stopper when compared to the > need for supporting single-quoted strings. > >> Got one more, actually a pitfall, not an extension: >> >> * Representation of JSON numbers >> >> RFC 7159 doesn't specify how numbers are to be represented. >> >> Many JSON implementations represent any JSON number as double. This >> can represent signed integers up to 53 bits exactly. >> >> Our parser represents numbers as int64_t when possible, else as >> double. >> >> Unlike the extensions above, this one is essential: any parser that >> can't do it is useless for us. Can yajl do it? > > Yes; the yajl parser has 4 modes of parse operation based on which of > three callbacks you implement: double-only (yajl_double), long long-only > (yajl_integer), double-and-int (both yajl_double and yajl_integer, not > sure which one has precedence if input satisfies both formats), or > custom number (yajl_number, which is given a string, and you turn it > into the format of your choice). Likewise, the yajl formatter has 2 > functions: yajl_gen_double() (formats doubles) and yajl_gen_number() > (you provide literal strings formatted how you like). Good. >> More extensions or pitfalls might be lurking in our parser. Therefore, >> replacing our parser by a suitable library is not without risk. I guess >> we could do it over a full development cycle. No way for 2.5. >> > > Absolutely not for 2.5; we've missed softfreeze, and this would count as > a new feature. > >> If we decide to attempt replacing it in the next development cycle, we >> might want to refrain from fixing its bugs except for true show >> stoppers. >>