Ted Yu created SPARK-34017:
------------------------------

             Summary: Pass json column information via pruneColumns()
                 Key: SPARK-34017
                 URL: https://issues.apache.org/jira/browse/SPARK-34017
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.0.1
            Reporter: Ted Yu



Currently PushDownUtils#pruneColumns only passes root fields to 
SupportsPushDownRequiredColumns implementation(s).

2021-01-05 19:36:07,437 (Time-limited test) [DEBUG - org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] nested schema projection List(id#33, address#34, phone#36, get_json_object(phone#36, $.code) AS get_json_object(phone, $.code)#37)
2021-01-05 19:36:07,438 (Time-limited test) [DEBUG - org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] nested schema StructType(StructField(id,IntegerType,false), StructField(address,StringType,true), StructField(phone,StringType,true))

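The first line shows the projection list and the second line shows the pruned schema.

We can see that get_json_object(phone#36, $.code) is filtered out: the pruned schema only contains the root column phone (StringType), and nothing records that this expression retrieves field 'code' from the phone JSON column.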

We should allow JSON column information (e.g. the fields extracted by get_json_object) to be passed via pruneColumns().
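As a rough illustration of the information that is lost, here is a minimal, self-contained Java sketch. It does not use Spark: Expr, ColumnRef, GetJsonObject, rootColumns and jsonPaths are hypothetical stand-ins invented for this example, not Catalyst or DataSource V2 classes. rootColumns approximates what pruneColumns() effectively receives today (the real SupportsPushDownRequiredColumns.pruneColumns takes a StructType of the required root fields), while jsonPaths collects the per-column JSON paths that this proposal would additionally pass along.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class JsonPathCollector {
    // Hypothetical, simplified stand-ins for Catalyst expressions; these are NOT Spark classes.
    public interface Expr {}

    public static final class ColumnRef implements Expr {
        final String name;
        public ColumnRef(String name) { this.name = name; }
    }

    public static final class GetJsonObject implements Expr {
        final ColumnRef column;
        final String path;
        public GetJsonObject(ColumnRef column, String path) {
            this.column = column;
            this.path = path;
        }
    }

    /** Root column names referenced by the projections: roughly what pruneColumns() sees today. */
    public static Set<String> rootColumns(List<Expr> projections) {
        Set<String> roots = new LinkedHashSet<>();
        for (Expr e : projections) {
            if (e instanceof ColumnRef) {
                roots.add(((ColumnRef) e).name);
            } else if (e instanceof GetJsonObject) {
                // get_json_object only contributes its root column; the path is discarded here.
                roots.add(((GetJsonObject) e).column.name);
            }
        }
        return roots;
    }

    /** JSON paths read from each column via get_json_object: the information currently dropped. */
    public static Map<String, Set<String>> jsonPaths(List<Expr> projections) {
        Map<String, Set<String>> paths = new LinkedHashMap<>();
        for (Expr e : projections) {
            if (e instanceof GetJsonObject) {
                GetJsonObject g = (GetJsonObject) e;
                paths.computeIfAbsent(g.column.name, k -> new LinkedHashSet<>()).add(g.path);
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        ColumnRef phone = new ColumnRef("phone");
        // Mirrors the projection list from the debug log in this issue.
        List<Expr> projections = new ArrayList<>();
        projections.add(new ColumnRef("id"));
        projections.add(new ColumnRef("address"));
        projections.add(phone);
        projections.add(new GetJsonObject(phone, "$.code"));

        System.out.println("root columns: " + rootColumns(projections));
        System.out.println("json paths:   " + jsonPaths(projections));
    }
}
```

Run against the projection list from the debug log, it prints the root columns [id, address, phone] and the JSON paths {phone=[$.code]}. How a data source would actually receive such path information is not specified in this issue.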



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
