[ https://issues.apache.org/jira/browse/SPARK-39806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ala Luszczak updated SPARK-39806:
---------------------------------
    Description:
There is a problem with a projection we use in `FileScanRDD` to join the metadata row to the row produced by the reader.

https://github.com/apache/spark/blob/e4ca8424474e571d8e137388fe5d54732b68c2f3/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileScanRDD.scala#L128-L133

The issue is that the projection omits partition columns. As a result, the expressions down the line return a malformed row. The errors crash the query, but the exact message can vary (for example: failed assertion on number of fields in the row, accessing field of incorrect type).

This defect affects only readers producing rows (as opposed to batches), and only data sets using dynamic partitioning.

  was:
There is a problem with a projection we use in `FileScanRDD` to join the metadata row to the row produced by the reader.

https://github.com/apache/spark/blob/e4ca8424474e571d8e137388fe5d54732b68c2f3/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileScanRDD.scala#L128-L133

The issue is that the projection omits partition columns. As a result, the expressions down the line return a malformed row. The errors crash the query, but the exact message can vary (for example: failed assertion on number of fields in the row, accessing field of incorrect type).

This defect affects only readers producing rows, and only data sets using dynamic partitioning.

> Queries accessing METADATA struct crash on partitioned tables
> -------------------------------------------------------------
>
>                 Key: SPARK-39806
>                 URL: https://issues.apache.org/jira/browse/SPARK-39806
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Ala Luszczak
>            Priority: Major
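The failure mode described in this issue can be illustrated with a small sketch in plain Python. It is a toy model, not Spark code: rows are tuples, a projection is a list of ordinals into the joined row, and every name in it (`project`, the column names, the sample values) is hypothetical. It also assumes that the faulty projection lists data and metadata columns only, which is one way to read "the projection omits partition columns".

```python
# Toy model only: rows are plain tuples and a "projection" is a list of
# ordinals into the joined row. Nothing here is Spark code.

DATA_COLS = ["id", "value"]      # hypothetical columns stored in the file
PARTITION_COLS = ["year"]        # hypothetical partition column
METADATA_COLS = ["file_name"]    # hypothetical stand-in for the metadata struct

# Schema that downstream expressions expect to receive.
expected_schema = DATA_COLS + PARTITION_COLS + METADATA_COLS


def project(ordinals, row):
    """Build a new row from the fields of `row` at the given positions."""
    return tuple(row[i] for i in ordinals)


# A row-based reader returns data columns followed by partition values.
reader_row = (1, "a", 2022)
metadata_row = ("part-0000.parquet",)
joined_row = reader_row + metadata_row

n_data, n_part, n_meta = len(DATA_COLS), len(PARTITION_COLS), len(METADATA_COLS)

# Assumed shape of the defect: the projection lists data and metadata
# columns only, so the partition column never reaches the output row.
buggy_ordinals = list(range(n_data)) + [n_data + n_part + i for i in range(n_meta)]
buggy_row = project(buggy_ordinals, joined_row)

# A projection that covers every column keeps the row aligned with the schema.
fixed_ordinals = list(range(n_data + n_part + n_meta))
fixed_row = project(fixed_ordinals, joined_row)

# Symptom 1: the row has fewer fields than the schema expects.
field_count_mismatch = len(buggy_row) != len(expected_schema)

# Symptom 2: the slot where `year` (an int) is expected holds a string.
year_slot = expected_schema.index("year")
wrong_type_in_year_slot = not isinstance(buggy_row[year_slot], int)
```

In this model the buggy row has three fields where four are expected, and the position that should hold the integer partition value holds the file name instead. Those two conditions correspond to the two example errors quoted in the description: a failed assertion on the number of fields, and access to a field of the wrong type.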