Error when running multiple unit tests that extend DataFrameSuiteBase

2016-09-23 Thread Jinyuan Zhou
I created two test cases that extend FlatSpec with DataFrameSuiteBase, but I
get errors when I run sbt test. I was able to run each of them separately. My
test cases do use sqlContext to read files. Here is the exception stack.
Judging from the exception, I may need to unregister the RpcEndpoint after
each test run.
[info] Exception encountered when attempting to run a suite with class name: MyTestSuit *** ABORTED ***
[info]   java.lang.IllegalArgumentException: There is already an RpcEndpoint called LocalSchedulerBackendEndpoint
[info]   at org.apache.spark.rpc.netty.Dispatcher.registerRpcEndpoint(Dispatcher.scala:66)
[info]   at org.apache.spark.rpc.netty.NettyRpcEnv.setupEndpoint(NettyRpcEnv.scala:129)
[info]   at org.apache.spark.scheduler.local.LocalSchedulerBackend.start(LocalSchedulerBackend.scala:127)
[info]   at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:149)
[info]   at org.apache.spark.SparkContext.<init>(SparkContext.scala:500)
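
For reference, here is a minimal sketch of what my two suites look like
(class names, file path, and assertions are made up for illustration; the
real suites read application-specific files):

import com.holdenkarau.spark.testing.DataFrameSuiteBase
import org.scalatest.FlatSpec

// Illustrative first suite.
class FirstSuite extends FlatSpec with DataFrameSuiteBase {
  "the first suite" should "read a sample file" in {
    val df = sqlContext.read.json("src/test/resources/sample.json")
    assert(df.count() >= 0)
  }
}

// A second, near-identical suite; running both via sbt test triggers the error.
class SecondSuite extends FlatSpec with DataFrameSuiteBase {
  "the second suite" should "read a sample file" in {
    val df = sqlContext.read.json("src/test/resources/sample.json")
    assert(df.count() >= 0)
  }
}

One thing I am wondering: sbt runs suites in parallel by default, so both
suites may be trying to start a local SparkContext at the same time. If that
is the cause, disabling parallel test execution might help (just my guess, I
have not verified it):

// build.sbt: run test suites sequentially so only one local
// SparkContext is started at a time (my guess at a workaround).
parallelExecution in Test := false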


NPE when using AvroKeyValueInputFormat with newAPIHadoopFile

2015-12-15 Thread Jinyuan Zhou
Hi, I tried to load Avro files from HDFS but keep getting an NPE. I am using
AvroKeyValueInputFormat with the newAPIHadoopFile method. Does anyone have
any clue? Here is the stack trace:

Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 4 in stage 0.0 failed 4 times, most recent failure: Lost task 4.3 in stage 0.0 (TID 11, xyz.abc.com): java.lang.NullPointerException
    at org.apache.avro.Schema.getAliases(Schema.java:1415)
    at org.apache.avro.Schema.getAliases(Schema.java:1429)
    at org.apache.avro.Schema.applyAliases(Schema.java:1340)
    at org.apache.avro.generic.GenericDatumReader.getResolver(GenericDatumReader.java:125)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:140)
    at org.apache.avro.file.DataFileStream.next(DataFileStream.java:233)
    at org.apache.avro.mapreduce.AvroRecordReaderBase.nextKeyValue(AvroRecordReaderBase.java:118)
    at org.apache.avro.mapreduce.AvroKeyValueRecordReader.nextKeyValue(AvroKeyValueRecordReader.java:62)
    at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:143)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1626)
    at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1099)
    at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1099)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1767)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1767)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
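
For reference, the loading code looks roughly like this (the path and
schemas are made up for illustration; in particular I am not sure whether
the reader schemas need to be set explicitly, so the AvroJob calls below are
an assumption on my part):

import org.apache.avro.Schema
import org.apache.avro.generic.GenericRecord
import org.apache.avro.mapred.{AvroKey, AvroValue}
import org.apache.avro.mapreduce.{AvroJob, AvroKeyValueInputFormat}
import org.apache.hadoop.mapreduce.Job
import org.apache.spark.{SparkConf, SparkContext}

object AvroLoad {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("avro-load"))

    val job = Job.getInstance(sc.hadoopConfiguration)
    // Illustrative reader schemas; my real ones are parsed from .avsc files.
    AvroJob.setInputKeySchema(job, Schema.create(Schema.Type.STRING))
    AvroJob.setInputValueSchema(job,
      new Schema.Parser().parse(new java.io.File("value.avsc")))

    val rdd = sc.newAPIHadoopFile(
      "hdfs:///data/input/*.avro", // illustrative path
      classOf[AvroKeyValueInputFormat[CharSequence, GenericRecord]],
      classOf[AvroKey[CharSequence]],
      classOf[AvroValue[GenericRecord]],
      job.getConfiguration)

    println(rdd.count()) // the count() that fails in the trace above
    sc.stop()
  }
}

My hunch (not confirmed) is that the NPE in Schema.applyAliases comes from
the record reader seeing a null reader schema when the input key/value
schemas are not set in the configuration, which is why I added the AvroJob
calls above.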


Thanks,

Jack
Jinyuan (Jack) Zhou