On 6/12/26 16:21, Markus Armbruster wrote:
Paolo Bonzini <[email protected]> writes:

In order to avoid stashing all the tokens corresponding to a JSON value,
embed the parsing stack and state machine in JSONParser.  This is more
efficient and allows for more prompt error recovery; it also does not
make the code substantially larger than the current recursive descent
parser, though the state machine is probably a bit harder to follow.

The stack consists of QLists and QDicts corresponding to open
brackets and braces, plus optionally a QString with the current
key on top of each QDict.

After each value is parsed, it is added to the top array or dictionary
or, if the stack is empty, json_parser_feed returns the complete
QObject.

For now, json-streamer.c keeps tracking the tokens up until braces
and brackets are balanced, and then shoves the whole queue of tokens
into the push parser.  The only logic change is that JSON_END_OF_INPUT
always triggers the emptying of the queue; the parser takes notice and
checks that there is nothing on the stack.  Not using brace_count
and bracket_count for this is the first step towards improved separation
of concerns between json-parser.c and json-streamer.c.

Signed-off-by: Paolo Bonzini <[email protected]>
---
  include/qobject/json-parser.h |   6 +
  qobject/json-parser-int.h     |   5 +-
  qobject/json-parser.c         | 551 ++++++++++++++++++++--------------
  qobject/json-streamer.c       |  21 +-
  4 files changed, 345 insertions(+), 238 deletions(-)

diff --git a/include/qobject/json-parser.h b/include/qobject/json-parser.h
index 7345a9bd5cb..05346fa816b 100644
--- a/include/qobject/json-parser.h
+++ b/include/qobject/json-parser.h
@@ -20,6 +20,12 @@ typedef struct JSONLexer {
      int x, y;
  } JSONLexer;
+typedef struct JSONParserContext {
+    Error *err;
+    GQueue *stack;
+    va_list *ap;
+} JSONParserContext;
+
  typedef struct JSONMessageParser {
      void (*emit)(void *opaque, QObject *json, Error *err);
      void *opaque;
diff --git a/qobject/json-parser-int.h b/qobject/json-parser-int.h
index 8c01f236276..1f435cb8eb2 100644
--- a/qobject/json-parser-int.h
+++ b/qobject/json-parser-int.h
@@ -49,6 +49,9 @@ void json_message_process_token(JSONLexer *lexer, GString 
*input,
/* json-parser.c */
  JSONToken *json_token(JSONTokenType type, int x, int y, GString *tokstr);
-QObject *json_parser_parse(GQueue *tokens, va_list *ap, Error **errp);
+void json_parser_init(JSONParserContext *ctxt, va_list *ap);
+void json_parser_reset(JSONParserContext *ctxt);
+QObject *json_parser_feed(JSONParserContext *ctxt, const JSONToken *token, 
Error **errp);
+void json_parser_destroy(JSONParserContext *ctxt);
#endif
diff --git a/qobject/json-parser.c b/qobject/json-parser.c
index f6622b82b0a..3b5edc5bae4 100644
--- a/qobject/json-parser.c
+++ b/qobject/json-parser.c
@@ -31,12 +31,105 @@ struct JSONToken {
      char str[];
  };
-typedef struct JSONParserContext {
-    Error *err;
-    JSONToken *current;
-    GQueue *buf;
-    va_list *ap;
-} JSONParserContext;
+/*
+ * The JSON parser is a push parser, returning to the caller after every
+ * token.

The thing that returns after every token is json_parser_feed(), right?

Detail not mentioned here: the value it returns.  Leaving that to
json_parser_feed()'s contract feels fine, but pointing from here to
there could be useful.

"returning a completed top-level object, an error, or NULL (if the object is incomplete and no error happened) after every token"?
+ * // The initial state is BEFORE_VALUE.
+ * input :=  value         -> END_OF_VALUE -> return parsed value
+ *           END_OF_INPUT  -> check stack is empty

How can the stack *not* be empty here?

Right, this is not END_OF_INPUT in the middle of the stream.  Will delete.
+ * // entered on BEFORE_KEY, with TOS being a QDict
+ * dict_pairs := (STRING | INTERP) -> push QString -> END_OF_KEY
+ *         ':'             -> BEFORE_VALUE
+ *         value           -> pop QString + add pair to QDict -> END_OF_VALUE
+ *         ('}'            -> pop completed QDict -> END_OF_VALUE
+ *          | ','          -> BEFORE_KEY
+ *            dict_pairs)  -> END_OF_VALUE
+ */

This is useful.

It doesn't mention how we do parse errors.  Leaving that to
json_parser_feed()'s contract feels fine.

Right---parse errors are out of the scope because recovery happens in json-streamer.c.

I can add a note for this and everything else, thanks for the review! Rewrites are not the most enticing form of thing to receive, or the most polite to send.

Paolo


Reply via email to