[ https://issues.apache.org/jira/browse/SPARK-36983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
mike updated SPARK-36983:
-------------------------
    Attachment: file2.parquet
                file1.parquet

> ignoreCorruptFiles does not work when schema change from int to string
> ----------------------------------------------------------------------
>
>                 Key: SPARK-36983
>                 URL: https://issues.apache.org/jira/browse/SPARK-36983
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.8, 3.1.2
>            Reporter: mike
>            Priority: Major
>         Attachments: file1.parquet, file2.parquet
>
>
> Precondition:
> Folder A contains two Parquet files:
> * File 1: has several columns, one of which is column X with data type Int.
> * File 2: same schema as File 1, except that column X has data type String.
>
> Steps:
> Read file 1 to get its schema.
> Read folder A with the schema of file 1.
>
> Expected: The read succeeds, and file 2 is ignored because the data type
> of column X changed to String.
>
> Actual: File 2 does not appear to be ignored, and the read fails with
> this error:
> `WARN TaskSetManager: Lost task 0.0 in stage 2.0 (TID 2) (192.168.1.78
> executor driver): java.lang.UnsupportedOperationException:
> org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainBinaryDictionary
> at org.apache.parquet.column.Dictionary.decodeToInt(Dictionary.java:45)`

--
This message was sent by Atlassian Jira
(v8.3.4#803005)