[ https://issues.apache.org/jira/browse/SPARK-34638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17877595#comment-17877595 ]
Jiri Humpolicek edited comment on SPARK-34638 at 8/29/24 6:03 AM:
------------------------------------------------------------------

[~viirya] Do you think it would be possible to do that? I think it would be a great feature if Spark read only the fields a query actually needs, in a general way. With rich nested structures it could save a huge amount of resources. I found an unresolved improvement for this more general case, filed last year: https://issues.apache.org/jira/browse/SPARK-42879 .

was (Author: yuryn):
[~viirya] Do you think it would be possible to do that? I think it will be great feature when spark reads only necessary fields from query in general way. In case of rich nested structures it could safe huge amount of resources. I found unresolved improvement for this more general case from last year https://issues.apache.org/jira/browse/SPARK-42879 .

> Spark SQL reads unnecessary nested fields (another type of pruning case)
> ------------------------------------------------------------------------
>
>                 Key: SPARK-34638
>                 URL: https://issues.apache.org/jira/browse/SPARK-34638
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.1
>            Reporter: Jiri Humpolicek
>            Assignee: L. C. Hsieh
>            Priority: Major
>             Fix For: 3.2.0
>
>
> Building on [SPARK-29721|https://issues.apache.org/jira/browse/SPARK-29721],
> I found another nested-field pruning case.
> Example:
> 1) Loading data
> {code:scala}
> // Each item carries two nested fields: itemId and itemData.
> val jsonStr = """{
>  "items": [
>    {"itemId": 1, "itemData": "a"},
>    {"itemId": 2, "itemData": "b"}
>  ]
> }"""
> val df = spark.read.json(Seq(jsonStr).toDS)
> df.write.format("parquet").mode("overwrite").saveAsTable("persisted")
> {code}
> 2) Read query with explain
> {code:scala}
> val read = spark.table("persisted")
> spark.conf.set("spark.sql.optimizer.nestedSchemaPruning.enabled", true)
> // Only item.itemId is selected, yet the scan still reads itemData:
> read.select(explode($"items").as("item")).select($"item.itemId").explain(true)
> // ReadSchema: struct<items:array<struct<itemData:string,itemId:bigint>>>
> {code}
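
The `ReadSchema` line in the explain output can also be checked programmatically, which may help when comparing Spark versions. The following is a minimal sketch for `spark-shell`, assuming the `persisted` table from step 1 exists and that the scan is planned as a `FileSourceScanExec` (the usual case for Parquet tables on the V1 file source path). The names `pruned` and `readSchemas` are illustrative and not part of the issue.

```scala
import org.apache.spark.sql.execution.FileSourceScanExec
import org.apache.spark.sql.functions.explode
import spark.implicits._

spark.conf.set("spark.sql.optimizer.nestedSchemaPruning.enabled", true)

// Same query as in step 2: explode the array, then keep a single nested field.
val pruned = spark.table("persisted")
  .select(explode($"items").as("item"))
  .select($"item.itemId")

// Walk the physical plan (before adaptive execution wraps it) and collect
// the schema that each file scan requests from the Parquet reader.
val readSchemas = pruned.queryExecution.sparkPlan.collect {
  case scan: FileSourceScanExec => scan.requiredSchema.catalogString
}

readSchemas.foreach(println)
```

On the affected version, the printed schema would be expected to match the `ReadSchema` reported above, including `itemData`. If pruning applies to this case, `itemData` should no longer appear.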