[ https://issues.apache.org/jira/browse/SPARK-19082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wenchen Fan resolved SPARK-19082.
---------------------------------
       Resolution: Fixed
         Assignee: Liang-Chi Hsieh
    Fix Version/s: 2.2.0
                   2.1.1

> The config ignoreCorruptFiles doesn't work for Parquet
> ------------------------------------------------------
>
>                 Key: SPARK-19082
>                 URL: https://issues.apache.org/jira/browse/SPARK-19082
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Liang-Chi Hsieh
>            Assignee: Liang-Chi Hsieh
>             Fix For: 2.1.1, 2.2.0
>
>
> We have a config {{spark.sql.files.ignoreCorruptFiles}} which can be used to
> ignore corrupt files when reading files in SQL. Currently the
> {{ignoreCorruptFiles}} config has two issues that prevent it from working
> for Parquet:
> 1. We only ignore corrupt files in {{FileScanRDD}}. However, we start
> reading those files earlier, when inferring the data schema from them. For
> corrupt files, we can't read the schema, so the program fails. A related
> issue was reported at
> http://apache-spark-developers-list.1001551.n3.nabble.com/Skip-Corrupted-Parquet-blocks-footer-tc20418.html
> 2. In {{FileScanRDD}}, we assume that we only begin to read the files when
> starting to consume the iterator. However, the files may be read before
> that. In this case, the {{ignoreCorruptFiles}} config doesn't work either.
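
The two issues above can be illustrated with a small, self-contained Python sketch. This is not Spark's code: the toy file model, `CorruptFileError`, `read_footer`, `open_rows`, `infer_schema` and `scan` are all hypothetical names made up for illustration. It only shows the general idea that the "ignore corrupt files" guard has to cover schema inference and the opening of a reader, not just the consumption of its iterator. Taking the schema from the first readable file is a simplifying assumption.

```python
from typing import Dict, Iterator, List, Optional, Tuple

# Toy model (assumption): a "file" is (schema, rows), or None when it is corrupt.
ToyFile = Optional[Tuple[List[str], List[tuple]]]


class CorruptFileError(Exception):
    """Hypothetical stand-in for the error a reader raises on a corrupt file."""


def read_footer(path: str, files: Dict[str, ToyFile]) -> List[str]:
    """Return the schema of a file; fails immediately if the file is corrupt."""
    entry = files[path]
    if entry is None:
        raise CorruptFileError(path)
    return entry[0]


def open_rows(path: str, files: Dict[str, ToyFile]) -> Iterator[tuple]:
    """Eager reader: touches the file when opened, before any row is consumed."""
    read_footer(path, files)  # raises here, not on the first next()
    entry = files[path]
    return iter(entry[1])


def infer_schema(paths: List[str], files: Dict[str, ToyFile],
                 ignore_corrupt_files: bool) -> List[str]:
    """Issue 1: schema inference reads files too, so it needs the same guard."""
    for path in paths:
        try:
            return read_footer(path, files)
        except CorruptFileError:
            if not ignore_corrupt_files:
                raise
    return []  # no readable file found


def scan(paths: List[str], files: Dict[str, ToyFile],
         ignore_corrupt_files: bool) -> Iterator[tuple]:
    """Issue 2: guard opening the reader as well as consuming its iterator."""
    for path in paths:
        try:
            yield from open_rows(path, files)
        except CorruptFileError:
            if not ignore_corrupt_files:
                raise
            # Otherwise skip the corrupt file and continue with the next one.
```

With the flag enabled, both `infer_schema` and `scan` skip the corrupt entry; with it disabled, the error propagates to the caller.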