github-actions[bot] commented on code in PR #63309:
URL: https://github.com/apache/doris/pull/63309#discussion_r3252159303
##########
be/src/util/jsonb_parser_simd.h:
##########
@@ -93,6 +93,29 @@ struct JsonbParser {
simdjson::padded_string json_str {pch, len};
simdjson::ondemand::document doc =
simdjson_parser.iterate(json_str);
+ auto is_json_whitespace = [](char c) {
+ return c == ' ' || c == '\t' || c == '\n' || c == '\r';
+ };
+ const char* json_begin = json_str.data();
+ const char* json_end = json_str.data() + len;
+ while (json_begin < json_end && is_json_whitespace(*json_begin)) {
+ ++json_begin;
+ }
+ while (json_end > json_begin && is_json_whitespace(*(json_end - 1))) {
+ --json_end;
+ }
+
+ std::string_view raw_json;
+ simdjson::error_code raw_res = doc.raw_json().get(raw_json);
Review Comment:
This `raw_json()` call consumes the entire document, and the `doc.rewind()` that
follows makes the code parse and serialize the same JSON a second time.
`JsonbParser::parse` is shared by `json_valid`, `jsonb_parse`, casts to JSONB,
and other ingestion paths, so large object/array inputs now pay an extra full
simdjson traversal just to detect trailing content. The later code already
consumes the root value and checks `doc.at_end()`, the top-level scalar getters
use simdjson's root APIs, which disallow trailing content, and the new
`is_null()` check covers the `not`/partial-null case. Please drop the
whole-document `raw_json()` pre-pass and enforce the remaining scalar/null
validation in the existing parse pass so that JSONB parsing stays single-pass.
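
To illustrate the single-pass shape being asked for, here is a rough,
standard-library-only sketch. It is not simdjson code and not the Doris
implementation: `scan_root_value` and `has_single_root` are hypothetical names,
and the scanner only balances brackets and skips strings, so it does not
validate JSON. The point is the order of operations: consume the root value
once, then check that only whitespace remains, with no pre-pass and no rewind.

```cpp
#include <cstddef>
#include <string_view>

// Same four-character whitespace set as the lambda in the diff.
inline bool is_json_whitespace(char c) {
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

inline std::size_t skip_whitespace(std::string_view s, std::size_t pos) {
    while (pos < s.size() && is_json_whitespace(s[pos])) {
        ++pos;
    }
    return pos;
}

constexpr std::size_t kNoValue = std::string_view::npos;

// Hypothetical stand-in for "consume the root value": returns the offset one
// past the root value starting at `pos`, or kNoValue if the input ends early.
// Structural scan only; a real parser validates the value while consuming it.
inline std::size_t scan_root_value(std::string_view s, std::size_t pos) {
    if (pos >= s.size()) {
        return kNoValue;
    }
    const char first = s[pos];
    if (first != '"' && first != '{' && first != '[') {
        // Scalar token (number, true, false, null): runs to whitespace or end.
        while (pos < s.size() && !is_json_whitespace(s[pos])) {
            ++pos;
        }
        return pos;
    }
    int depth = 0;
    bool in_string = false;
    for (; pos < s.size(); ++pos) {
        const char c = s[pos];
        if (in_string) {
            if (c == '\\') {
                ++pos;  // skip the escaped character
            } else if (c == '"') {
                in_string = false;
                if (depth == 0) {
                    return pos + 1;  // the root value was a string
                }
            }
        } else if (c == '"') {
            in_string = true;
        } else if (c == '{' || c == '[') {
            ++depth;
        } else if (c == '}' || c == ']') {
            if (--depth == 0) {
                return pos + 1;  // the root container just closed
            }
        }
    }
    return kNoValue;  // unterminated string or container
}

// Single pass: consume the root once, then require that only whitespace
// follows it (the role played by the existing `doc.at_end()` check).
inline bool has_single_root(std::string_view s) {
    const std::size_t end = scan_root_value(s, skip_whitespace(s, 0));
    if (end == kNoValue) {
        return false;
    }
    return skip_whitespace(s, end) == s.size();
}
```

With this shape, an input such as `{"a":1} x` is rejected by the check after
the root value, and large containers are traversed only once.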
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]