[ https://issues.apache.org/jira/browse/SPARK-46056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Cosmin Dumitru updated SPARK-46056:
-----------------------------------
    Description:
The scenario is a bit more complicated than the title suggests, but it is not that far-fetched:
# Write a Parquet file with one column.
# Evolve the schema by adding a new column whose DecimalType is too wide to fit in a long (precision greater than 18) and that has a default value.
# Try to read the file with the new schema.
# The read fails with a NullPointerException (NPE).

The issue lies in how the column vector stores DecimalTypes: it incorrectly assumes that they fit in a long and tries to write the value to the associated long array, which is not allocated for decimals that wide.

https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java#L724

  was:
The scenario is a bit more complicated than the title suggests, but it is not that far-fetched:
# Write a Parquet file with one column.
# Evolve the schema by adding a new column whose DecimalType is too wide to fit in a long (precision greater than 18) and that has a default value.
# Try to read the file with the new schema.
# The read fails with a NullPointerException (NPE).

The issue lies in how the column vector stores DecimalTypes: it incorrectly assumes that they fit in a long and tries to write the value to the associated long array, which is not allocated for decimals that wide.

> Vectorized parquet reader throws NPE when reading files with DecimalType default values
> ---------------------------------------------------------------------------------------
>
>                 Key: SPARK-46056
>                 URL: https://issues.apache.org/jira/browse/SPARK-46056
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.4.0, 3.5.0
>            Reporter: Cosmin Dumitru
>            Priority: Major
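The failure mode can be illustrated with a small stand-alone Java model. This is a sketch under stated assumptions, not Spark code: `ToyDecimalVector`, `fillBuggy` and `fillFixed` are hypothetical names, and the thresholds of 9 and 18 digits mirror Spark's `Decimal.MAX_INT_DIGITS` and `Decimal.MAX_LONG_DIGITS`. The toy vector allocates only the backing array that matches the column's precision. `fillBuggy` routes every decimal wider than an int to the long array, as the report describes, while `fillFixed` is one possible precision-aware variant and not necessarily the patch that was merged.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Arrays;

// Hypothetical toy model (not Spark's API) of filling a missing column
// with its default value in a vectorized reader.
final class DecimalDefaultSketch {
    static final int MAX_INT_DIGITS = 9;   // mirrors Decimal.MAX_INT_DIGITS
    static final int MAX_LONG_DIGITS = 18; // mirrors Decimal.MAX_LONG_DIGITS

    static final class ToyDecimalVector {
        final int precision;
        final int scale;
        // Only the backing store matching the precision is allocated; the others stay null.
        final int[] intData;
        final long[] longData;
        final byte[][] byteData;

        ToyDecimalVector(int precision, int scale, int capacity) {
            this.precision = precision;
            this.scale = scale;
            intData = precision <= MAX_INT_DIGITS ? new int[capacity] : null;
            longData = (precision > MAX_INT_DIGITS && precision <= MAX_LONG_DIGITS)
                ? new long[capacity] : null;
            byteData = precision > MAX_LONG_DIGITS ? new byte[capacity][] : null;
        }

        // Buggy fill: handles the int case, then assumes everything else fits in a long.
        void fillBuggy(BigDecimal value) {
            BigInteger unscaled = value.setScale(scale).unscaledValue();
            if (precision <= MAX_INT_DIGITS) {
                Arrays.fill(intData, unscaled.intValue());
            } else {
                // For precision > 18 the constructor left longData null,
                // so Arrays.fill throws NullPointerException here.
                Arrays.fill(longData, unscaled.longValue());
            }
        }

        // Precision-aware fill: writes to the same store the constructor allocated.
        void fillFixed(BigDecimal value) {
            BigInteger unscaled = value.setScale(scale).unscaledValue();
            if (precision <= MAX_INT_DIGITS) {
                Arrays.fill(intData, unscaled.intValueExact());
            } else if (precision <= MAX_LONG_DIGITS) {
                Arrays.fill(longData, unscaled.longValueExact());
            } else {
                // Wide decimals keep the two's-complement bytes of the unscaled value.
                Arrays.fill(byteData, unscaled.toByteArray());
            }
        }

        // Reads a row back from whichever store is allocated.
        BigDecimal get(int row) {
            BigInteger unscaled;
            if (intData != null) {
                unscaled = BigInteger.valueOf(intData[row]);
            } else if (longData != null) {
                unscaled = BigInteger.valueOf(longData[row]);
            } else {
                unscaled = new BigInteger(byteData[row]);
            }
            return new BigDecimal(unscaled, scale);
        }
    }

    public static void main(String[] args) {
        // Default value for a DECIMAL(38, 2) column: far more than 18 digits.
        BigDecimal wideDefault = new BigDecimal("123456789012345678901234567890.12");
        ToyDecimalVector wide = new ToyDecimalVector(38, 2, 4);
        try {
            wide.fillBuggy(wideDefault);
            System.out.println("buggy fill succeeded");
        } catch (NullPointerException e) {
            System.out.println("buggy fill threw NullPointerException"); // printed
        }
        wide.fillFixed(wideDefault);
        // prints "fixed fill, row 3: 123456789012345678901234567890.12"
        System.out.println("fixed fill, row 3: " + wide.get(3));
    }
}
```

In this model a `DECIMAL(12, 3)` column goes through `fillBuggy` without error because its long array exists; only precisions above 18 hit the null array, which matches the report's claim that the problem appears once the decimal no longer fits in a long.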