[ https://issues.apache.org/jira/browse/SPARK-34017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ted Yu updated SPARK-34017:
---------------------------
    Description: 
Currently, PushDownUtils#pruneColumns passes only root fields to SupportsPushDownRequiredColumns implementations.

{code}
2021-01-05 19:36:07,437 (Time-limited test) [DEBUG - org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] nested schema projection List(id#33, address#34, phone#36, get_json_object(phone#36, $.code) AS get_json_object(phone, $.code)#37)
2021-01-05 19:36:07,438 (Time-limited test) [DEBUG - org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] nested schema StructType(StructField(id,IntegerType,false), StructField(address,StringType,true), StructField(phone,StringType,true))
{code}

The first line shows the projections and the second line shows the pruned schema. We can see that get_json_object(phone#36, $.code) is filtered out: the pruned schema contains only the root column phone, so the information that this expression reads field 'code' from the phone JSON column is lost.

We should allow JSON column information to be passed via pruneColumns().
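As a rough illustration of what passing JSON column information could look like, the sketch below walks a projection list shaped like the one in the log and collects, per root column, the JSON paths requested through get_json_object. It is a minimal sketch under stated assumptions: Col, GetJsonObject, Alias, RequiredColumns and JsonPruning are hypothetical, simplified stand-ins, not Spark's Catalyst expression classes and not part of the SupportsPushDownRequiredColumns API.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-ins for Catalyst expressions (NOT Spark's classes).
sealed trait Expr
final case class Col(name: String) extends Expr
final case class GetJsonObject(json: Expr, path: String) extends Expr
final case class Alias(child: Expr, name: String) extends Expr

// Hypothetical payload a connector could receive in addition to the pruned root schema.
final case class RequiredColumns(roots: List[String], jsonPaths: Map[String, Set[String]])

object JsonPruning {
  // Collects root column names in first-seen order, plus the JSON paths
  // that get_json_object reads directly from each root column.
  def requiredColumns(projections: List[Expr]): RequiredColumns = {
    val roots = mutable.LinkedHashSet.empty[String]
    val paths = mutable.Map.empty[String, Set[String]]

    def visit(e: Expr): Unit = e match {
      case Col(n) =>
        roots += n
      case GetJsonObject(Col(n), p) =>
        // JSON path applied directly to a root column: remember it.
        roots += n
        paths(n) = paths.getOrElse(n, Set.empty[String]) + p
      case GetJsonObject(child, _) =>
        // Path over a derived expression: only the underlying roots are recorded.
        visit(child)
      case Alias(child, _) =>
        visit(child)
    }

    projections.foreach(visit)
    RequiredColumns(roots.toList, paths.toMap)
  }
}

// Usage, mirroring the projection list from the debug log.
val required = JsonPruning.requiredColumns(List(
  Col("id"), Col("address"), Col("phone"),
  Alias(GetJsonObject(Col("phone"), "$.code"), "get_json_object(phone, $.code)")
))
println(required)
```

Note that in the logged query the whole phone column is also projected (phone#36), so a connector would still have to return the full column there; path hints like these would only let a source narrow its read when a JSON column is referenced solely through get_json_object.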
> Pass json column information via pruneColumns()
> -----------------------------------------------
>
>                 Key: SPARK-34017
>                 URL: https://issues.apache.org/jira/browse/SPARK-34017
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.1
>            Reporter: Ted Yu
>            Priority: Major