[ https://issues.apache.org/jira/browse/SPARK-34017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17259423#comment-17259423 ]
Ted Yu commented on SPARK-34017:
--------------------------------

For PushDownUtils#pruneColumns, I am experimenting with the following:
{code}
      case r: SupportsPushDownRequiredColumns if SQLConf.get.nestedSchemaPruningEnabled =>
        // Matches the string form of a get_json_object(column, path) expression.
        val JSONCapture = "get_json_object\\((.*), *(.*)\\)".r
        val jsonRootFields: ArrayBuffer[RootField] = ArrayBuffer()
        // Visit every node of every projection and record the JSON column accesses.
        projects.foreach { _.foreach { f => f.toString match {
          case JSONCapture(column, field) =>
            jsonRootFields += RootField(StructField(column, f.dataType, f.nullable),
              derivedFromAtt = false, prunedIfAnyChildAccessed = true)
          case _ => logDebug("else " + f)
        }}}
        val rootFields = SchemaPruning.identifyRootFields(projects, filters) ++ jsonRootFields
{code}

> Pass json column information via pruneColumns()
> -----------------------------------------------
>
>                 Key: SPARK-34017
>                 URL: https://issues.apache.org/jira/browse/SPARK-34017
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.1
>            Reporter: Ted Yu
>            Priority: Major
>
> Currently PushDownUtils#pruneColumns only passes root fields to
> SupportsPushDownRequiredColumns implementation(s).
> {code}
> 2021-01-05 19:36:07,437 (Time-limited test) [DEBUG - org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] nested schema projection List(id#33, address#34, phone#36, get_json_object(phone#36, $.code) AS get_json_object(phone, $.code)#37)
> 2021-01-05 19:36:07,438 (Time-limited test) [DEBUG - org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] nested schema StructType(StructField(id,IntegerType,false), StructField(address,StringType,true), StructField(phone,StringType,true))
> {code}
> The first line shows the projections and the second line shows the pruned schema.
> We can see that get_json_object(phone#36, $.code) is filtered out of the pruned
> schema. This expression retrieves the field 'code' from the phone JSON column.
> We should allow JSON column information to be passed via pruneColumns().
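A minimal, Spark-free sketch of what the JSONCapture pattern from the comment extracts, assuming the expression strings look like the ones in the debug log. `JsonCaptureSketch`, `JsonAccess` and `collectJsonAccesses` are hypothetical names used only for illustration; they are not Spark APIs. Because Scala's `Regex` extractor matches the whole string, the aliased form (`... AS ...#37`) does not match, and the captured column keeps its expression ID suffix (`phone#36`).

```scala
object JsonCaptureSketch {
  // Same pattern as in the comment; extractor matching is anchored to the whole string.
  val JSONCapture = "get_json_object\\((.*), *(.*)\\)".r

  // Hypothetical holder for illustration only; not part of Spark.
  final case class JsonAccess(column: String, path: String)

  // Keep only the strings that are exactly a get_json_object(...) call.
  def collectJsonAccesses(exprStrings: Seq[String]): Seq[JsonAccess] =
    exprStrings.collect { case JSONCapture(column, path) => JsonAccess(column, path) }
}
```

A real change would probably need to strip the `#36` suffix before building a StructField, or match on the expression tree instead of on `toString` output; that is a suggestion, not something the issue specifies.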