[ 
https://issues.apache.org/jira/browse/SPARK-46056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cosmin Dumitru updated SPARK-46056:
-----------------------------------
    Description: 
The scenario is a bit more complicated than the title suggests, but it's not 
that far-fetched. 
 # Write a Parquet file with one column
 # Evolve the schema: add a new column with a DecimalType wide enough that it 
doesn't fit in a long, and give it a default value
 # Try to read the file with the new schema
 # NPE (a reproduction sketch follows the list)

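A minimal reproduction sketch, assuming a spark-shell session on 3.4/3.5 with 
the default-column feature enabled; the table name, precision, and default 
literal are made up for illustration:

{code:scala}
// Step 1: write a Parquet table with a single column.
spark.sql("CREATE TABLE t (id INT) USING parquet")
spark.sql("INSERT INTO t VALUES (1)")

// Step 2: evolve the schema with a decimal column whose precision (24)
// exceeds the 18 digits that fit in a 64-bit unscaled long, plus a default.
spark.sql(
  "ALTER TABLE t ADD COLUMNS (d DECIMAL(24, 2) DEFAULT 1234567890123456789012.34)")

// Steps 3-4: reading the old file through the new schema makes the
// vectorized reader fill in the default value, which hits the NPE.
spark.sql("SELECT * FROM t").show()
{code}
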
The issue lies in how the column vector stores DecimalType values: it 
incorrectly assumes they fit in a long and tries to write them to the 
associated long array.

https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java#L724
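
For context (illustration only, not code from the reader): the unscaled value 
of such a decimal does not fit in a long, so the long-array write cannot 
succeed, presumably because the long buffer is never allocated for 
byte-array-backed decimals in the first place:

{code:scala}
// Unscaled value of the DECIMAL(24, 2) literal 1234567890123456789012.34:
val unscaled = BigInt("123456789012345678901234")
println(unscaled.isValidLong) // false: too wide for the vector's long array
{code}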

> Vectorized parquet reader throws NPE when reading files with DecimalType 
> default values
> ---------------------------------------------------------------------------------------
>
>                 Key: SPARK-46056
>                 URL: https://issues.apache.org/jira/browse/SPARK-46056
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.4.0, 3.5.0
>            Reporter: Cosmin Dumitru
>            Priority: Major
>
> The scenario is a bit more complicated than the title suggests, but it's not 
> that far-fetched. 
>  # Write a Parquet file with one column
>  # Evolve the schema: add a new column with a DecimalType wide enough that 
> it doesn't fit in a long, and give it a default value
>  # Try to read the file with the new schema
>  # NPE 
> The issue lies in how the column vector stores DecimalType values: it 
> incorrectly assumes they fit in a long and tries to write them to the 
> associated long array.
> https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java#L724



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
