Thanks for responding. I tried using the newAPIHadoopFile method and got an IOException with the message "Not a data file".
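For reference, as far as I can tell that message comes from Avro's DataFileStream, which throws "Not a data file." when the first four bytes of the file don't match the Avro container-file magic ('O', 'b', 'j', 1). A quick local check along these lines (just a sketch; the path is a placeholder, not my actual file) can tell you whether the file is an Avro container file at all:

```scala
import java.io.{DataInputStream, EOFException, FileInputStream}

// Avro container files begin with the 4-byte magic: 'O', 'b', 'j', 1.
// If this returns false, AvroKeyInputFormat will fail with "Not a data file."
def isAvroContainerFile(path: String): Boolean = {
  val in = new DataInputStream(new FileInputStream(path))
  try {
    val header = new Array[Byte](4)
    in.readFully(header) // throws EOFException if the file is shorter than 4 bytes
    header.sameElements(Array[Byte]('O'.toByte, 'b'.toByte, 'j'.toByte, 1.toByte))
  } catch {
    case _: EOFException => false
  } finally {
    in.close()
  }
}
```

A Hadoop SequenceFile containing Avro-serialized records starts with "SEQ", not the Avro magic, so it would fail this check even though it holds Avro data.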
If anyone has an example of this working I'd appreciate your input or examples. What I entered at the REPL and what I got back are below:

val myAvroSequenceFile = sc.newAPIHadoopFile("hdfs://<my url>",
  classOf[AvroKeyInputFormat[GenericRecord]],
  classOf[AvroKey[GenericRecord]],
  classOf[NullWritable])

scala> myAvroSequenceFile.first()
14/07/18 17:02:38 INFO FileInputFormat: Total input paths to process : 1
14/07/18 17:02:38 INFO SparkContext: Starting job: first at <console>:19
14/07/18 17:02:38 INFO DAGScheduler: Got job 0 (first at <console>:19) with 1 output partitions (allowLocal=true)
14/07/18 17:02:38 INFO DAGScheduler: Final stage: Stage 0(first at <console>:19)
14/07/18 17:02:38 INFO DAGScheduler: Parents of final stage: List()
14/07/18 17:02:38 INFO DAGScheduler: Missing parents: List()
14/07/18 17:02:38 INFO DAGScheduler: Computing the requested partition locally
14/07/18 17:02:38 INFO NewHadoopRDD: Input split: hdfs:<my url>
14/07/18 17:02:38 WARN AvroKeyInputFormat: Reader schema was not set. Use AvroJob.setInputKeySchema() if desired.
14/07/18 17:02:38 INFO AvroKeyInputFormat: Using a reader schema equal to the writer schema.
14/07/18 17:02:38 INFO DAGScheduler: Failed to run first at <console>:19
org.apache.spark.SparkDriverExecutionException: Execution error
        at org.apache.spark.scheduler.DAGScheduler.runLocallyWithinThread(DAGScheduler.scala:585)
        at org.apache.spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:563)
Caused by: java.io.IOException: Not a data file.
        at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:105)
        at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:97)
        at org.apache.avro.mapreduce.AvroRecordReaderBase.createAvroFileReader(AvroRecordReaderBase.java:180)
        at org.apache.avro.mapreduce.AvroRecordReaderBase.initialize(AvroRecordReaderBase.java:90)
        at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:114)
        at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:100)
        at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:62)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:261)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:228)
        at org.apache.spark.scheduler.DAGScheduler.runLocallyWithinThread(DAGScheduler.scala:578)
        ... 1 more

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/NullPointerException-when-reading-Avro-Sequence-files-tp10201p10204.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.