Ted Yu created SPARK-34017:
------------------------------

             Summary: Pass json column information via pruneColumns()
                 Key: SPARK-34017
                 URL: https://issues.apache.org/jira/browse/SPARK-34017
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.0.1
            Reporter: Ted Yu

Currently, PushDownUtils#pruneColumns passes only root fields to SupportsPushDownRequiredColumns implementations. The following debug output illustrates this:

2021-01-05 19:36:07,437 (Time-limited test) [DEBUG - org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] nested schema projection List(id#33, address#34, phone#36, get_json_object(phone#36, $.code) AS get_json_object(phone, $.code)#37)
2021-01-05 19:36:07,438 (Time-limited test) [DEBUG - org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] nested schema StructType(StructField(id,IntegerType,false), StructField(address,StringType,true), StructField(phone,StringType,true))

The first line shows the projections and the second line shows the pruned schema.

We can see that get_json_object(phone#36, $.code) is filtered out: the pruned schema lists phone only as a root StringType column, with no reference to $.code. This expression retrieves the field 'code' from the JSON column phone.

We should allow JSON column information to be passed via pruneColumns().
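
To make the request concrete, here is a minimal sketch of the extra information a data source could receive alongside the root schema. It is an illustration only, not Spark code: JsonPathCollector and its collect() method are hypothetical names, and matching on the printed form of the projections with a regular expression is an assumption made to keep the example free of Spark dependencies. A real change would presumably walk the expression tree instead.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Spark: collects the JSON paths that
// get_json_object() reads from each root column, based on the textual
// form of the projections shown in the debug log.
final class JsonPathCollector {
    // Matches e.g. "get_json_object(phone#36, $.code)":
    // group 1 is the column name, group 2 is the JSON path.
    private static final Pattern GET_JSON_OBJECT =
        Pattern.compile("get_json_object\\((\\w+)#\\d+,\\s*(\\$[^)\\s]*)\\)");

    private JsonPathCollector() {}

    // Returns a map from root column name to the set of JSON paths read from it.
    static Map<String, Set<String>> collect(List<String> projections) {
        Map<String, Set<String>> pathsByColumn = new TreeMap<>();
        for (String projection : projections) {
            Matcher m = GET_JSON_OBJECT.matcher(projection);
            while (m.find()) {
                pathsByColumn
                    .computeIfAbsent(m.group(1), k -> new TreeSet<>())
                    .add(m.group(2));
            }
        }
        return pathsByColumn;
    }

    public static void main(String[] args) {
        // Projections copied from the debug output in this issue.
        List<String> projections = Arrays.asList(
            "id#33",
            "address#34",
            "phone#36",
            "get_json_object(phone#36, $.code) AS get_json_object(phone, $.code)#37");
        System.out.println(collect(projections)); // prints {phone=[$.code]}
    }
}
```

With the projections from the log, the only entry is phone with the path $.code. This is the piece of information that is lost today when only the root StructType reaches pruneColumns().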