Zheng Shao created SPARK-47670:
----------------------------------

             Summary: Multiple calls to GET_JSON_OBJECT with the same JSON str 
should parse it just one time
                 Key: SPARK-47670
                 URL: https://issues.apache.org/jira/browse/SPARK-47670
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 3.5.2
            Reporter: Zheng Shao


For a query like the following:

{{SELECT}}
{{  GET_JSON_OBJECT(json_col, '$.a.b'),}}
{{  GET_JSON_OBJECT(json_col, '$.a.c')}}
{{FROM t}}

Spark SQL currently generates a plan that parses `json_col` twice, once for each GET_JSON_OBJECT call.

Ideally, Spark SQL should parse `json_col` only once. The optimizer should 
detect the common JSON parsing across the calls and rewrite the plan to parse 
the JSON once, extract each requested path from the parsed result, and flatten 
the values back into the output columns.
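
The intended rewrite can be pictured with a small standalone sketch. This is not Spark code and does not reflect how the optimizer or GET_JSON_OBJECT is implemented. The helper names `resolve_path` and `extract_shared` are hypothetical, only simple dotted paths such as '$.a.b' are handled, and values come back as Python objects rather than the strings GET_JSON_OBJECT returns. It only shows the idea of parsing a row's JSON once and evaluating every requested path against the shared result:

```python
import json


def resolve_path(parsed, path):
    """Resolve a simple '$.a.b' style path against an already-parsed document.

    Hypothetical helper: supports only dotted object keys, no arrays or wildcards.
    """
    keys = [k for k in path.lstrip("$").split(".") if k]
    node = parsed
    for key in keys:
        if not isinstance(node, dict) or key not in node:
            return None  # missing path yields NULL, as in SQL
        node = node[key]
    return node


def extract_shared(json_str, paths):
    """Parse json_str exactly once, then evaluate every path on the shared result."""
    try:
        parsed = json.loads(json_str)  # the single parse shared by all paths
    except (TypeError, ValueError):
        return [None] * len(paths)  # malformed or NULL input: every column is NULL
    return [resolve_path(parsed, p) for p in paths]


row = '{"a": {"b": 1, "c": "x"}}'
print(extract_shared(row, ["$.a.b", "$.a.c"]))  # [1, 'x']
```

Here the two paths from the query above correspond to one `json.loads` call per row instead of two, and the returned list plays the role of the flattened output columns.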

An alternative is to support the ":" notation (JSON path) found in other 
systems, where the query optimizer automatically shares a single JSON parse 
across all extractions from the same column.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
