Jiri Humpolicek created SPARK-34638:
---------------------------------------

             Summary: Spark SQL reads unnecessary nested fields (another type of pruning case)
                 Key: SPARK-34638
                 URL: https://issues.apache.org/jira/browse/SPARK-34638
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.1.1
            Reporter: Jiri Humpolicek


Following up on [SPARK-29721|https://issues.apache.org/jira/browse/SPARK-29721], I found another case where nested fields are not pruned.

Example:

1) Loading data
{code:scala}
// Needed outside spark-shell, for toDS and the $"..." column syntax
import spark.implicits._

val jsonStr = """{
 "items": [
   {"itemId": 1, "itemData": "a"},
   {"itemId": 2, "itemData": "b"}
 ]
}"""
val df = spark.read.json(Seq(jsonStr).toDS)
df.write.format("parquet").mode("overwrite").saveAsTable("persisted")
{code}

2) Read query with explain
{code:scala}
import org.apache.spark.sql.functions.explode

val read = spark.table("persisted")
spark.conf.set("spark.sql.optimizer.nestedSchemaPruning.enabled", true)

read.select(explode($"items").as("item")).select($"item.itemId").explain(true)
// Only itemId is selected, yet the scan still reads itemData:
// ReadSchema: struct<items:array<struct<itemData:string,itemId:bigint>>>
{code}
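
For comparison, here is a minimal sketch of checking the read schema programmatically instead of reading it off the `explain` output. It is meant to be pasted into the same spark-shell session as the steps above. `readSchemas` is a hypothetical helper, not part of Spark. It assumes the table is scanned through the V1 file source (`FileSourceScanExec`) and that adaptive query execution is off, which is the default in 3.1.1. The second query uses the `explode($"items.itemId")` form, which I understand to be the case SPARK-29721 addressed; whether it is pruned on a given build is worth confirming with the printed output.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.execution.FileSourceScanExec
import org.apache.spark.sql.functions.explode
import org.apache.spark.sql.types.StructType
import spark.implicits._

// Hypothetical helper: collect the schema that each file scan in the
// physical plan requests from the data source.
def readSchemas(df: DataFrame): Seq[StructType] =
  df.queryExecution.executedPlan.collect {
    case scan: FileSourceScanExec => scan.requiredSchema
  }

spark.conf.set("spark.sql.optimizer.nestedSchemaPruning.enabled", true)
val read = spark.table("persisted") // table written in step 1

// Form reported in this issue: explode the whole struct, then pick a field.
val explodeThenSelect =
  read.select(explode($"items").as("item")).select($"item.itemId")

// Alternative form: pick the field first, then explode the resulting array.
val selectThenExplode =
  read.select(explode($"items.itemId").as("itemId"))

// Print the requested schemas so the two forms can be compared side by side.
readSchemas(explodeThenSelect).foreach(s => println(s.catalogString))
readSchemas(selectThenExplode).foreach(s => println(s.catalogString))
```

If the second form prints a schema without `itemData`, it can serve as a workaround until this case is handled by the optimizer.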