Paolo Bonzini <[email protected]> writes:

> On 6/12/26 16:21, Markus Armbruster wrote:
>> Paolo Bonzini <[email protected]> writes:
>> 
>>> In order to avoid stashing all the tokens corresponding to a JSON value,
>>> embed the parsing stack and state machine in JSONParser.  This is more
>>> efficient and allows for more prompt error recovery; it also does not
>>> make the code substantially larger than the current recursive descent
>>> parser, though the state machine is probably a bit harder to follow.
>>>
>>> The stack consists of QLists and QDicts corresponding to open
>>> brackets and braces, plus optionally a QString with the current
>>> key on top of each QDict.
>>>
>>> After each value is parsed, it is added to the top array or dictionary
>>> or, if the stack is empty, json_parser_feed returns the complete
>>> QObject.
>>>
>>> For now, json-streamer.c keeps tracking the tokens up until braces
>>> and brackets are balanced, and then shoves the whole queue of tokens
>>> into the push parser.  The only logic change is that JSON_END_OF_INPUT
>>> always triggers the emptying of the queue; the parser takes notice and
>>> checks that there is nothing on the stack.  Not using brace_count
>>> and bracket_count for this is the first step towards improved separation
>>> of concerns between json-parser.c and json-streamer.c.
>>>
>>> Signed-off-by: Paolo Bonzini <[email protected]>
>>> ---
>>>   include/qobject/json-parser.h |   6 +
>>>   qobject/json-parser-int.h     |   5 +-
>>>   qobject/json-parser.c         | 551 ++++++++++++++++++++--------------
>>>   qobject/json-streamer.c       |  21 +-
>>>   4 files changed, 345 insertions(+), 238 deletions(-)
>>>
>>> diff --git a/include/qobject/json-parser.h b/include/qobject/json-parser.h
>>> index 7345a9bd5cb..05346fa816b 100644
>>> --- a/include/qobject/json-parser.h
>>> +++ b/include/qobject/json-parser.h
>>> @@ -20,6 +20,12 @@ typedef struct JSONLexer {
>>>      int x, y;
>>>  } JSONLexer;
>>>  +typedef struct JSONParserContext {
>>> +    Error *err;
>>> +    GQueue *stack;
>>> +    va_list *ap;
>>> +} JSONParserContext;
>>> +
>>>  typedef struct JSONMessageParser {
>>>      void (*emit)(void *opaque, QObject *json, Error *err);
>>>      void *opaque;
>>> diff --git a/qobject/json-parser-int.h b/qobject/json-parser-int.h
>>> index 8c01f236276..1f435cb8eb2 100644
>>> --- a/qobject/json-parser-int.h
>>> +++ b/qobject/json-parser-int.h
>>> @@ -49,6 +49,9 @@ void json_message_process_token(JSONLexer *lexer, GString 
>>> *input,
>>>  /* json-parser.c */
>>>  JSONToken *json_token(JSONTokenType type, int x, int y, GString *tokstr);
>>> -QObject *json_parser_parse(GQueue *tokens, va_list *ap, Error **errp);
>>> +void json_parser_init(JSONParserContext *ctxt, va_list *ap);
>>> +void json_parser_reset(JSONParserContext *ctxt);
>>> +QObject *json_parser_feed(JSONParserContext *ctxt, const JSONToken *token, 
>>> Error **errp);
>>> +void json_parser_destroy(JSONParserContext *ctxt);
>>>
>>>  #endif
>>> diff --git a/qobject/json-parser.c b/qobject/json-parser.c
>>> index f6622b82b0a..3b5edc5bae4 100644
>>> --- a/qobject/json-parser.c
>>> +++ b/qobject/json-parser.c
>>> @@ -31,12 +31,105 @@ struct JSONToken {
>>>      char str[];
>>>  };
>>> -typedef struct JSONParserContext {
>>> -    Error *err;
>>> -    JSONToken *current;
>>> -    GQueue *buf;
>>> -    va_list *ap;
>>> -} JSONParserContext;
>>> +/*
>>> + * The JSON parser is a push parser, returning to the caller after every
>>> + * token.
>>
>> The thing that returns after every token is json_parser_feed(), right?
>> Detail not mentioned here: the value it returns.  Leaving that to
>> json_parser_feed()'s contract feels fine, but pointing from here to
>> there could be useful.
>
> "returning a completed top-level object, an error, or NULL (if the object is 
> incomplete and no error happened) after every token"?

I like it!

>>> + * // The initial state is BEFORE_VALUE.
>>> + * input :=  value         -> END_OF_VALUE -> return parsed value
>>> + *           END_OF_INPUT  -> check stack is empty
>>
>> How can the stack *not* be empty here?
>
> Right, this is not END_OF_INPUT in the middle of the stream.  Will delete.
>>> + * // entered on BEFORE_KEY, with TOS being a QDict
>>> + * dict_pairs := (STRING | INTERP) -> push QString -> END_OF_KEY
>>> + *         ':'             -> BEFORE_VALUE
>>> + *         value           -> pop QString + add pair to QDict -> 
>>> END_OF_VALUE
>>> + *         ('}'            -> pop completed QDict -> END_OF_VALUE
>>> + *          | ','          -> BEFORE_KEY
>>> + *            dict_pairs)  -> END_OF_VALUE
>>> + */
>>
>> This is useful.
>>
>> It doesn't mention how we do parse errors.  Leaving that to
>> json_parser_feed()'s contract feels fine.
>
> Right---parse errors are out of the scope because recovery happens in 
> json-streamer.c.
>
> I can add a note for this and everything else, thanks for the review! 
> Rewrites are not the most enticing form of thing to receive, or the most 
> polite to send.
>
> Paolo

In all fairness, I had moaned about this parser more than once,
e.g. "it's half-assed: it's a push lexer wed to a pull parser with
parenthesis counting."


Reply via email to