morrySnow opened a new pull request, #63255:
URL: https://github.com/apache/doris/pull/63255

   ## Summary
   
   Fixes DORIS-25576.
   
   Nereids `JsonLiteral` and legacy `analysis.JsonLiteral` silently accepted 
lone UTF-16 surrogates (e.g. `'"\uD800"'::JSONB`) because Jackson and Gson both 
parse such inputs without error by default. RFC 8259 §8.2 notes that the JSON 
grammar permits unpaired surrogates, but warns that such strings are not 
interoperable and that software receiving them behaves unpredictably. Silent 
acceptance causes data-correctness issues: the invalid value is stored in BE 
and surfaces as errors only during export, cross-system transfer, or UTF-8 
serialization.
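
   As a rough illustration of why such a value only fails late, the following 
JVM-side sketch (not code from this PR) shows that a lone surrogate has no 
valid UTF-8 encoding: a strict `CharsetEncoder` rejects it, while 
`String.getBytes` silently substitutes a replacement byte. The class and method 
names are hypothetical, and how BE itself serializes strings is not shown here.

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Doris.
final class LoneSurrogateUtf8Demo {

    // Returns true if the string can be encoded as UTF-8 without replacement.
    static boolean encodesStrictly(String s) {
        try {
            // A fresh encoder reports malformed input instead of replacing it.
            StandardCharsets.UTF_8.newEncoder().encode(CharBuffer.wrap(s));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String lone = "\uD800";       // unpaired high surrogate
        String pair = "\uD83D\uDE00"; // valid surrogate pair
        System.out.println("lone encodes strictly: " + encodesStrictly(lone));
        System.out.println("pair encodes strictly: " + encodesStrictly(pair));
        // Lenient encoding succeeds, but the original code unit is lost.
        System.out.println("lenient byte count: "
                + lone.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

   The lenient path is the risky one: the write succeeds, but the stored bytes 
no longer represent what the user typed.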
   
   ## What problem does this PR solve?
   
   Issue Number: close #DORIS-25576
   
   Problem Summary: Add a recursive `validateNoLoneSurrogate` post-parse walk 
in both `JsonLiteral` constructors that throws `AnalysisException` immediately 
for any string node containing a lone high or low surrogate.
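
   The post-parse walk described above could look roughly like the sketch 
below. It is not the PR's code: plain `String`/`List`/`Map` values stand in for 
Jackson's `JsonNode` and Gson's `JsonElement`, `IllegalArgumentException` 
stands in for `AnalysisException`, and the class name `LoneSurrogateCheck` and 
helper `hasLoneSurrogate` are hypothetical. Checking object keys as well as 
values is an assumption of this sketch.

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the validation added to JsonLiteral.
final class LoneSurrogateCheck {

    // True if s has a high surrogate not followed by a low surrogate,
    // or a low surrogate not preceded by a high surrogate.
    static boolean hasLoneSurrogate(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
                    i++; // skip the low half of a valid pair
                } else {
                    return true; // lone high surrogate
                }
            } else if (Character.isLowSurrogate(c)) {
                return true; // lone low surrogate
            }
        }
        return false;
    }

    // Recursively walks a parsed value and rejects the first offending string.
    static void validateNoLoneSurrogate(Object node) {
        if (node instanceof String) {
            if (hasLoneSurrogate((String) node)) {
                // The PR throws AnalysisException here.
                throw new IllegalArgumentException("JSON string contains a lone surrogate");
            }
        } else if (node instanceof List) {
            for (Object child : (List<?>) node) {
                validateNoLoneSurrogate(child);
            }
        } else if (node instanceof Map) {
            for (Map.Entry<?, ?> e : ((Map<?, ?>) node).entrySet()) {
                validateNoLoneSurrogate(e.getKey()); // assumption: keys are checked too
                validateNoLoneSurrogate(e.getValue());
            }
        }
        // Numbers, booleans, and null cannot contain surrogates.
    }

    public static void main(String[] args) {
        validateNoLoneSurrogate(Map.of("emoji", "\uD83D\uDE00")); // valid pair passes
        try {
            validateNoLoneSurrogate(List.of("ok", Map.of("k", "\uD800")));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

   Running the check after parsing, rather than scanning the raw literal text, 
means escaped (`\uD800`) and unescaped forms are handled the same way, since 
both have already been decoded into Java strings.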
   
   ### Changes
   - `fe/fe-core/.../nereids/.../JsonLiteral.java`: add 
`validateNoLoneSurrogate(JsonNode)` called after Jackson parsing
   - `fe/fe-catalog/.../analysis/JsonLiteral.java`: add 
`validateNoLoneSurrogate(JsonElement)` called after Gson parsing
   - `fe/fe-core/src/test/.../JsonLiteralTest.java`: unit tests covering 
lone-high, lone-low, nested, and valid surrogate-pair cases
   
   ## Release note
   
   JSONB literal expressions now reject strings containing lone UTF-16 
surrogates (e.g. `'"\uD800"'::JSONB`) with an `AnalysisException`, following 
the interoperability guidance in RFC 8259 §8.2.
   
   ## Check List (For Author)
   
   - Test: Unit Test (`JsonLiteralTest` — lone-surrogate rejection + valid 
surrogate-pair acceptance)
   - Behavior changed: Yes — lone surrogates in JSONB literals now throw 
AnalysisException instead of being silently accepted
   - Does this need documentation: No


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

