A quick Google search for that exception says it occurs when there is an
error during static initialization, i.e. a static initializer or the
initializer of a static field throws. It could be an issue related to where
dissection is defined. Maybe try putting the function in a separate object
that is unrelated to the Main class, which may have other static
initialization stuff? It's hard to say much more without more context.
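
A minimal sketch of that idea, not the actual Main from this thread (which is not shown). The names BrokenMain, Dissector and Demo are hypothetical, and the failing val only simulates whatever initialization might be going wrong on the executors. Note that ExceptionInInitializerError is an Error, so a `case e: Exception` handler does not catch it; getCause returns the underlying failure.

```scala
// Hypothetical stand-in for a large Main object whose initializer fails.
object BrokenMain {
  // Simulates initialization state that cannot be set up on this JVM.
  val config: String = sys.error("initialization failed")

  def dissection(s: String): Seq[String] = Seq(s)
}

// Standalone object with no initialization state of its own, so the first
// call to it on an executor has nothing that can fail while loading.
object Dissector {
  def dissection(s: String): Seq[String] = s.split(" ").toSeq
}

object Demo {
  def main(args: Array[String]): Unit = {
    try {
      // First access to BrokenMain runs its initializer, which throws.
      BrokenMain.dissection("a b")
    } catch {
      // An Error, not an Exception: `case e: Exception` would miss it.
      case e: ExceptionInInitializerError =>
        println("initializer failed, cause: " + e.getCause)
    }

    // The isolated object is unaffected by BrokenMain's failure.
    println(Dissector.dissection("a b"))
  }
}
```

In the streaming job, the closure would then call Dissector.dissection instead of the method defined in Main. If the error persists, the "Caused by" part of the executor log should point at the initializer that actually failed.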
TD
On Tue, Jul 15, 2014 at 2:17 AM, Gianluca Privitera
gianluca.privite...@studio.unibo.it wrote:
Hi,
I’ve got a problem with Spark Streaming and tshark.
While I’m running locally I have no problems with this code, but when I
run it on an EC2 cluster I get the exception shown just under the code.
import scala.collection.mutable.MutableList
import scala.sys.process._

def dissection(s: String): Seq[String] = {
  try {
    // calls hadoop to copy a file from s3 locally
    Process(hadoop command to create ./localcopy.tmp).!
    // calls tshark to transform the s3 file into sequence of strings
    val pb = Process("tshark … localcopy.tmp")
    val returnValue = pb.lines_!.toSeq
    return returnValue
  } catch {
    case e: Exception =>
      System.err.println("ERROR")
      return new MutableList[String]()
  }
}
(line 2051 points to the function “dissection”)
WARN scheduler.TaskSetManager: Loss was due to java.lang.ExceptionInInitializerError
java.lang.ExceptionInInitializerError
at Main$$anonfun$11.apply(Main.scala:2051)
at Main$$anonfun$11.apply(Main.scala:2051)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388)
at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388)
at org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:58)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$1.apply(PairRDDFunctions.scala:96)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$1.apply(PairRDDFunctions.scala:95)
at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:582)
at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:582)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:158)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.Task.run(Task.scala:51)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Has anyone got an idea why that may happen? I’m pretty sure that the
hadoop call works perfectly.
Thanks
Gianluca