Re: How to skip corrupted avro files

2015-05-05 Thread Imran Rashid
You might be interested in https://issues.apache.org/jira/browse/SPARK-6593 and the discussion around the PRs. This is probably more complicated than what you are looking for, but you could copy the code for HadoopReliableRDD in the PR into your own code and use it, without having to wait for the

Re: How to skip corrupted avro files

2015-05-05 Thread Shing Hing Man
Thanks for the info!

Shing

On Tuesday, 5 May 2015, 15:11, Imran Rashid iras...@cloudera.com wrote:

You might be interested in https://issues.apache.org/jira/browse/SPARK-6593 and the discussion around the PRs. This is probably more complicated than what you are looking for, but

How to skip corrupted avro files

2015-05-03 Thread Shing Hing Man
Hi,

I am using Spark 1.3.1 to read a directory of about 2000 avro files. The avro files are from a third party and a few of them are corrupted.

    val path = {myDirectory of avro files}
    val sparkConf = new SparkConf().setAppName("avroDemo").setMaster("local")
    val sc = new SparkContext(sparkConf)
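One simple way to approach this kind of problem, separate from the HadoopReliableRDD code discussed in SPARK-6593, is to check each file before handing the directory to Spark and keep only the files that can be read end to end. The following is a minimal sketch of that idea, not code from the thread. The names `SkipCorruptFiles`, `partitionReadable` and `validate` are hypothetical, and it assumes the files are reachable from the driver as local `java.io.File` paths.

```scala
import java.io.File
import scala.util.{Failure, Success, Try}

// Hypothetical helper (not part of Spark or spark-avro).
object SkipCorruptFiles {

  /** Runs `validate` on every file and splits the input into the files that
    * validated cleanly and the files that threw, paired with the exception.
    * `validate` is supplied by the caller and should throw on a corrupt file.
    */
  def partitionReadable(files: Seq[File])(
      validate: File => Unit): (Seq[File], Seq[(File, Throwable)]) = {
    // Try captures non-fatal exceptions, so one bad file does not abort the scan.
    val results = files.map(f => f -> Try(validate(f)))
    val good = results.collect { case (f, Success(_)) => f }
    val bad = results.collect { case (f, Failure(e)) => f -> e }
    (good, bad)
  }
}
```

For Avro, `validate` could open the file with Avro's `DataFileReader` and iterate over all records, so that a damaged block surfaces as an exception. The readable files could then be loaded and the rejected ones logged. Note that this reads every file twice (once to validate, once to load), which may be acceptable for about 2000 files but would need rethinking for much larger inputs.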