Zouxxyy opened a new pull request, #12038:
URL: https://github.com/apache/gluten/pull/12038

   ## What changes are proposed in this pull request?
   
   When `json_tuple` is rewritten into `get_json_object` calls by `PullOutGenerateProjectHelper.pullOutPreProject`, the generated JSON path used bare bracket notation `$[key]` (no quotes). This works in Velox/simdjson, but when the expression falls back to Spark JVM execution (e.g., `get_json_object` is blacklisted or fails validation), Spark's `JsonPathParser` cannot parse `$[key]`, so `get_json_object` returns NULL.
   
   This PR changes the generated path format from `$[key]` to `$['key']` 
(single-quoted bracket notation), which is accepted by both Velox/simdjson and 
Spark JVM's JsonPathParser.
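   The format change can be illustrated with a minimal Python sketch. The helpers `old_path` and `new_path` are hypothetical; the actual rewrite is Scala code in `PullOutGenerateProjectHelper.pullOutPreProject`. The sketch does no escaping, so it assumes keys contain no single quote.

```python
# Hypothetical helpers illustrating the path format change; the real
# rewrite is Scala code in PullOutGenerateProjectHelper.pullOutPreProject.


def old_path(key: str) -> str:
    # Previous format: bare bracket notation, e.g. $[a.b]
    return f"$[{key}]"


def new_path(key: str) -> str:
    # New format: single-quoted bracket notation, e.g. $['a.b'].
    # Assumes the key contains no single quote (no escaping is done).
    return f"$['{key}']"


for key in ["a", "a.b"]:
    print(f"{key}: {old_path(key)} -> {new_path(key)}")
```

   Bracket notation (rather than `$.key`) is what keeps a dot inside a key such as `a.b` from being read as a path separator; the quotes are the only difference between the two formats.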
   
   **Root cause:**
   - Velox's `JsonPathNormalizer` normalizes `$['key']` → `$[key]` internally 
for simdjson, so both forms work.
   - For object fields, Spark's JVM `JsonPathParser` accepts only `$.name` or `$['name']`; it rejects `$[name]` (bare brackets without quotes), so `get_json_object` returns NULL immediately.
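   The parsing difference can be approximated with a simplified model of how a one-level child path is read. This is an assumption-based sketch, not Spark's implementation: it assumes `.name` forbids `.` and `[` in the name, `['name']` forbids a single quote, and any other form (including bare `[name]`) is rejected. `parse_single_key` is a hypothetical helper and does not model nested paths or array indexes.

```python
import re
from typing import Optional

# Simplified, assumption-based model of one-level child paths in Spark's
# JsonPathParser (not Spark's actual code):
#   $.name    -> name may not contain '.' or '['
#   $['name'] -> name may not contain a single quote
# Anything else (including bare $[name]) maps to None here, standing in for
# the NULL that get_json_object returns when the path cannot be parsed.
# Nested paths such as $.a.b are out of scope and also yield None.
_DOT_CHILD = re.compile(r"\$\.([^.\[]+)")
_QUOTED_CHILD = re.compile(r"\$\['([^']+)'\]")


def parse_single_key(path: str) -> Optional[str]:
    """Return the field name if this model accepts `path`, else None."""
    for pattern in (_DOT_CHILD, _QUOTED_CHILD):
        match = pattern.fullmatch(path)
        if match:
            return match.group(1)
    return None


for path in ["$.name", "$['a.b']", "$[a.b]"]:
    print(path, "->", parse_single_key(path))
```

   Under this model, `$['a.b']` yields the key `a.b`, whereas `$[a.b]` is rejected, matching the fallback behavior described above.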
   
   ## How was this patch tested?
   
   Added unit tests in `MiscOperatorSuite` covering:
   - Basic single key extraction with fallback
   - Dot-containing field names (e.g., `a.b`), the core scenario for bracket notation
   - Multiple keys extraction
   - Non-existent keys returning null
   - Mix of existing and non-existing keys
   - NULL JSON input handling
   
   ## Was this patch authored or co-authored using generative AI tooling?
   
   Yes
   
   Generated-by: Qoder


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

