[jira] [Updated] (SPARK-39806) Queries accessing METADATA struct crash on partitioned tables

Ala Luszczak (Jira) Mon, 18 Jul 2022 01:00:04 -0700


     [ 
https://issues.apache.org/jira/browse/SPARK-39806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Ala Luszczak updated SPARK-39806:
---------------------------------
    Description: 
There is a problem with a projection we use in `FileScanRDD` to join the 
metadata row to the row produced by the reader.

https://github.com/apache/spark/blob/e4ca8424474e571d8e137388fe5d54732b68c2f3/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileScanRDD.scala#L128-L133

The issue is that the projection omits partition columns. As a result, the 
expressions down the line return a malformed row. The resulting errors crash 
the query, but the exact message varies (for example: a failed assertion on 
the number of fields in the row, or access to a field of the wrong type).

This defect affects only readers producing rows (as opposed to batches), and 
only data sets using dynamic partitioning.
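
To make the failure mode concrete, here is a minimal sketch in plain Python. 
It is a toy model, not Spark code: tuples stand in for internal rows, all 
function and field names are hypothetical, and it assumes the reader row 
lists data columns first and partition values after them.

```python
# Toy model only: plain tuples stand in for Spark's internal rows.
# Every name below is hypothetical and does not appear in FileScanRDD.

DATA_FIELDS = ["id", "value"]      # columns stored in the file itself
PARTITION_FIELDS = ["date"]        # columns derived from the partition path
METADATA_FIELDS = ["_metadata"]    # the per-file metadata struct

SCHEMA = DATA_FIELDS + PARTITION_FIELDS + METADATA_FIELDS


def join_omitting_partitions(reader_row, metadata_row):
    """Models the defect: only the data columns survive the projection."""
    return reader_row[:len(DATA_FIELDS)] + metadata_row


def join_keeping_partitions(reader_row, metadata_row):
    """Models the intended behaviour: every reader column is kept."""
    return reader_row + metadata_row


def read_field(row, name):
    """Downstream access by ordinal, guarded by a row-width assertion."""
    assert len(row) == len(SCHEMA), (
        f"expected {len(SCHEMA)} fields, got {len(row)}"
    )
    return row[SCHEMA.index(name)]


reader_row = (1, "a", "2022-07-18")        # data columns, then partition value
metadata_row = ({"file_name": "part-0"},)  # a single struct-valued field

good = join_keeping_partitions(reader_row, metadata_row)
bad = join_omitting_partitions(reader_row, metadata_row)
```

In this model `read_field(good, "date")` returns the partition value, while 
`read_field(bad, "date")` trips the width assertion. Without that assertion, 
ordinal 2 of `bad` would hold the metadata struct where a date string is 
expected, which corresponds to the "wrong type" variant of the error.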

  was:
There is a problem with a projection we use in `FileScanRDD` to join the 
metadata row to the row produced by the reader.

https://github.com/apache/spark/blob/e4ca8424474e571d8e137388fe5d54732b68c2f3/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileScanRDD.scala#L128-L133

The issue is that the projection omits partition columns. As a result, the 
expressions down the line return a malformed row. The errors crash the query, 
but the exact message can vary (for example: failed assertion on number of 
fields in the row, accessing field of incorrect type).

This defect affects only readers producing rows, and only data sets using 
dynamic partitioning.


> Queries accessing METADATA struct crash on partitioned tables
> -------------------------------------------------------------
>
>                 Key: SPARK-39806
>                 URL: https://issues.apache.org/jira/browse/SPARK-39806
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Ala Luszczak
>            Priority: Major
>
> There is a problem with a projection we use in `FileScanRDD` to join the 
> metadata row to the row produced by the reader.
> https://github.com/apache/spark/blob/e4ca8424474e571d8e137388fe5d54732b68c2f3/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileScanRDD.scala#L128-L133
> The issue is that the projection omits partition columns. As a result, the 
> expressions down the line return a malformed row. The resulting errors crash 
> the query, but the exact message varies (for example: a failed assertion on 
> the number of fields in the row, or access to a field of the wrong type).
> This defect affects only readers producing rows (as opposed to batches), and 
> only data sets using dynamic partitioning.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
