I am trying to process events from a Flume Avro sink, but I keep getting the
same error. I am running everything locally, using Flume's avro-client to
send events, with the following commands to start the job and the client. It
looks like it should be a configuration problem, since it is a
NoSuchMethodError, but everything appears to be in place.
Job command:
spark-submit --master local[2] --class com.streaming.SparkFlume
./target/scala-2.10/streaming.jar localhost
Client command:
flume-ng avro-client --conf $FLUME_BASE/conf -H localhost -p -F
/etc/passwd
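For context, the job is roughly the following shape. This is a minimal sketch, not the exact contents of SparkFlume.scala: it assumes the standard push-based `FlumeUtils.createStream` receiver from `spark-streaming-flume` (the port, batch interval, and the exact map/filter logic here are illustrative, though the log's BlockRDD → MappedRDD → FilteredRDD chain matches a map followed by a filter):

```scala
// Hypothetical sketch of the streaming job; assumes the Spark 1.x
// spark-streaming-flume API. Port 9988 and the 1-second batch
// interval are placeholders, not values from the real job.
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.flume.FlumeUtils

object SparkFlume {
  def main(args: Array[String]): Unit = {
    val host = args(0) // "localhost" in the spark-submit command above
    val conf = new SparkConf().setAppName("SparkFlume")
    val ssc  = new StreamingContext(conf, Seconds(1))

    // Push-based receiver: Flume's Avro sink connects to this host/port.
    val events = FlumeUtils.createStream(ssc, host, 9988)

    // Decode each Flume event body, drop empties, print the rest.
    events.map(e => new String(e.event.getBody.array()))
          .filter(_.nonEmpty)
          .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

The jar is an sbt assembly built for Scala 2.10, as in the spark-submit command above.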
The error:
15/03/13 16:55:10 INFO ReceiverTracker: Stream 0 received 0 blocks
15/03/13 16:55:10 INFO JobScheduler: Added jobs for time 142628011 ms
15/03/13 16:55:10 INFO JobScheduler: Starting job streaming job
142628011 ms.0 from job set of time 142628011 ms
15/03/13 16:55:10 INFO DefaultExecutionContext: Starting job:
CallSite(getCallSite at
DStream.scala:294,org.apache.spark.SparkContext.getCallSite(SparkContext.scala:1077)
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:294)
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:288)
scala.Option.orElse(Option.scala:257)
org.apache.spark.streaming.dstream.DStream.getOrCompute(DStream.scala:285)
org.apache.spark.streaming.dstream.ForEachDStream.generateJob(ForEachDStream.scala:38)
org.apache.spark.streaming.DStreamGraph$$anonfun$1.apply(DStreamGraph.scala:115)
org.apache.spark.streaming.DStreamGraph$$anonfun$1.apply(DStreamGraph.scala:115)
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
org.apache.spark.streaming.DStreamGraph.generateJobs(DStreamGraph.scala:115)
org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$2.apply(JobGenerator.scala:221)
org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$2.apply(JobGenerator.scala:221)
scala.util.Try$.apply(Try.scala:161)
org.apache.spark.streaming.scheduler.JobGenerator.generateJobs(JobGenerator.scala:221)
org.apache.spark.streaming.scheduler.JobGenerator.org$apache$spark$streaming$scheduler$JobGenerator$$processEvent(JobGenerator.scala:165))
15/03/13 16:55:10 INFO DefaultExecutionContext: Job finished:
CallSite(getCallSite at
DStream.scala:294,org.apache.spark.SparkContext.getCallSite(SparkContext.scala:1077)
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:294)
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:288)
scala.Option.orElse(Option.scala:257)
org.apache.spark.streaming.dstream.DStream.getOrCompute(DStream.scala:285)
org.apache.spark.streaming.dstream.ForEachDStream.generateJob(ForEachDStream.scala:38)
org.apache.spark.streaming.DStreamGraph$$anonfun$1.apply(DStreamGraph.scala:115)
org.apache.spark.streaming.DStreamGraph$$anonfun$1.apply(DStreamGraph.scala:115)
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
org.apache.spark.streaming.DStreamGraph.generateJobs(DStreamGraph.scala:115)
org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$2.apply(JobGenerator.scala:221)
org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$2.apply(JobGenerator.scala:221)
scala.util.Try$.apply(Try.scala:161)
org.apache.spark.streaming.scheduler.JobGenerator.generateJobs(JobGenerator.scala:221)
org.apache.spark.streaming.scheduler.JobGenerator.org$apache$spark$streaming$scheduler$JobGenerator$$processEvent(JobGenerator.scala:165)),
took 3.9692E-5 s
15/03/13 16:55:10 INFO JobScheduler: Finished job streaming job
142628011 ms.0 from job set of time 142628011 ms
15/03/13 16:55:10 INFO JobScheduler: Total delay: 0.008 s for time
142628011 ms (execution: 0.003 s)
15/03/13 16:55:10 INFO FilteredRDD: Removing RDD 12 from persistence list
15/03/13 16:55:10 INFO BlockManager: Removing RDD 12
15/03/13 16:55:10 INFO MappedRDD: Removing RDD 11 from persistence list
15/03/13 16:55:10 INFO BlockManager: Removing RDD 11
15/03/13 16:55:10 INFO BlockRDD: Removing RDD 10 from persistence list
15/03/13 16:55:10 INFO BlockManager: Removing RDD 10
15/03/13 16:55:10 INFO FlumeInputDStream: Removing blocks of RDD
BlockRDD[10] at createStream at SparkFlume.scala:45 of time 142628011 ms
15/03/13 16:55:11 INFO