After checking the code, I think there are a few issues with the
ignoreCorruptFiles config, so it can't actually be used with Parquet files
right now.

I opened a JIRA https://issues.apache.org/jira/browse/SPARK-19082 and also
submitted a PR for it.
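For reference, a minimal sketch of how the config is expected to behave once the fix lands (paths, app name, and the session setup below are illustrative, not from the report):

```scala
// Sketch only: intended behavior of spark.sql.files.ignoreCorruptFiles
// once it is honored for Parquet. Paths are hypothetical.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("IgnoreCorruptFilesSketch")
  .config("spark.sql.files.ignoreCorruptFiles", "true")
  .getOrCreate()

// With the flag honored, corrupt Parquet files should be skipped
// instead of failing the whole read with
// "expected magic number at tail [80, 65, 82, 49]".
val df = spark.read.parquet(
  "/data/tempparquetdata/corruptblock.0",
  "/data/tempparquetdata/data1.parquet")
df.show()
```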


khyati wrote
> Hi Reynold Xin,
> 
> In spark 2.1.0,
> I tried setting spark.sql.files.ignoreCorruptFiles = true by using
> commands,
> 
> val sqlContext =new org.apache.spark.sql.hive.HiveContext(sc)
> 
> sqlContext.setConf("spark.sql.files.ignoreCorruptFiles","true") /
> sqlContext.sql("set spark.sql.files.ignoreCorruptFiles=true")
> 
> but I am still getting an error while reading Parquet files using 
> val newDataDF =
> sqlContext.read.parquet("/data/tempparquetdata/corruptblock.0","/data/tempparquetdata/data1.parquet")
> 
> Error: ERROR executor.Executor: Exception in task 0.0 in stage 4.0 (TID 4)
> java.io.IOException: Could not read footer: java.lang.RuntimeException:
> hdfs://192.168.1.53:9000/data/tempparquetdata/corruptblock.0 is not a
> Parquet file. expected magic number at tail [80, 65, 82, 49] but found
> [65, 82, 49, 10]
>       at
> org.apache.parquet.hadoop.ParquetFileReader.readAllFootersInParallel(ParquetFileReader.java:248)
> 
> 
> Please let me know if I am missing anything.

-----
Liang-Chi Hsieh | @viirya 
Spark Technology Center 
http://www.spark.tc/ 
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
