[ https://issues.apache.org/jira/browse/SPARK-36983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

mike updated SPARK-36983:
-------------------------
    Attachment: file2.parquet
                file1.parquet

> ignoreCorruptFiles does not work when schema change from int to string
> ----------------------------------------------------------------------
>
>                 Key: SPARK-36983
>                 URL: https://issues.apache.org/jira/browse/SPARK-36983
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.8, 3.1.2
>            Reporter: mike
>            Priority: Major
>         Attachments: file1.parquet, file2.parquet
>
>
> Precondition:
> Folder A contains two Parquet files:
>  * File 1: has several columns, one of which is column X with data type Int.
>  * File 2: same schema as File 1, except that column X has data type String.
> Steps:
>  * Read file 1 to obtain its schema.
>  * Read folder A using the schema of file 1.
> Expected: The read succeeds and file 2 is ignored, because the data type of
> column X changed to String.
> Actual: File 2 does not appear to be ignored, and the read fails with:
>  `WARN TaskSetManager: Lost task 0.0 in stage 2.0 (TID 2) (192.168.1.78 
> executor driver): java.lang.UnsupportedOperationException: 
> org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainBinaryDictionary
>  at org.apache.parquet.column.Dictionary.decodeToInt(Dictionary.java:45)`
>  
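The behavior the reporter expects can be modeled as a toy sketch in plain Python. Everything in it is hypothetical: `plan_read`, the schema dictionaries, and the extra `id` column are illustrative only and are not Spark APIs. In Spark the corresponding switch is the `spark.sql.files.ignoreCorruptFiles` setting, and whether a type mismatch should count as a "corrupt" file is exactly the point this issue raises; the model simply encodes the reporter's expectation.

```python
from typing import Dict, List, Tuple

# Hypothetical model: each "file" is represented only by its schema,
# a mapping from column name to a type name such as "int" or "string".
Schema = Dict[str, str]


def plan_read(
    files: Dict[str, Schema],
    read_schema: Schema,
    ignore_corrupt_files: bool,
) -> Tuple[List[str], List[str]]:
    """Return (files_read, files_skipped) under the behavior the reporter expects.

    A file is treated as incompatible when any column it shares with the read
    schema has a different type. With ignore_corrupt_files=True such files are
    skipped; otherwise the read fails, loosely mirroring the error in the report.
    """
    read, skipped = [], []
    for name, schema in sorted(files.items()):
        # Columns present in both schemas whose types disagree.
        mismatched = [
            col for col, typ in read_schema.items()
            if col in schema and schema[col] != typ
        ]
        if not mismatched:
            read.append(name)
        elif ignore_corrupt_files:
            skipped.append(name)
        else:
            raise TypeError(
                f"{name}: column(s) {mismatched} do not match the read schema"
            )
    return read, skipped


if __name__ == "__main__":
    # Hypothetical contents of folder A; "id" stands in for the other columns.
    folder_a = {
        "file1.parquet": {"id": "int", "X": "int"},
        "file2.parquet": {"id": "int", "X": "string"},
    }
    # Step 1: take the schema of file 1; step 2: read folder A with it.
    schema_of_file1 = folder_a["file1.parquet"]
    print(plan_read(folder_a, schema_of_file1, ignore_corrupt_files=True))
    # prints (['file1.parquet'], ['file2.parquet'])
```

Under this model, file 2 lands in the skipped list instead of failing the task, which is the "Expected" outcome above; the reported stack trace shows Spark instead reaching the decode of column X and throwing.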



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
