Hi, I'm trying to load Avro files from HDFS but keep getting an NPE. I'm calling newAPIHadoopFile with AvroKeyValueInputFormat. Does anyone have any clue? The stack trace is below.
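For reference, a minimal sketch of roughly how the job is set up — the application name, path, and GenericRecord key/value types here are placeholders for illustration, not the exact job:

```scala
import org.apache.avro.generic.GenericRecord
import org.apache.avro.mapred.{AvroKey, AvroValue}
import org.apache.avro.mapreduce.AvroKeyValueInputFormat
import org.apache.spark.{SparkConf, SparkContext}

object AvroLoadSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("avro-load"))

    // AvroKeyValueInputFormat[K, V] yields (AvroKey[K], AvroValue[V]) pairs;
    // the path and record types below are placeholders.
    val rdd = sc.newAPIHadoopFile[
      AvroKey[GenericRecord],
      AvroValue[GenericRecord],
      AvroKeyValueInputFormat[GenericRecord, GenericRecord]
    ]("hdfs:///path/to/input")

    // count() forces the read; the tasks fail with the NPE during this action
    println(rdd.count())
    sc.stop()
  }
}
```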
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 4 in stage 0.0 failed 4 times, most recent failure: Lost task 4.3 in stage 0.0 (TID 11, xyz.abc.com): java.lang.NullPointerException
    at org.apache.avro.Schema.getAliases(Schema.java:1415)
    at org.apache.avro.Schema.getAliases(Schema.java:1429)
    at org.apache.avro.Schema.applyAliases(Schema.java:1340)
    at org.apache.avro.generic.GenericDatumReader.getResolver(GenericDatumReader.java:125)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:140)
    at org.apache.avro.file.DataFileStream.next(DataFileStream.java:233)
    at org.apache.avro.mapreduce.AvroRecordReaderBase.nextKeyValue(AvroRecordReaderBase.java:118)
    at org.apache.avro.mapreduce.AvroKeyValueRecordReader.nextKeyValue(AvroKeyValueRecordReader.java:62)
    at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:143)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1626)
    at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1099)
    at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1099)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1767)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1767)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)

Thanks,
Jinyuan (Jack) Zhou