Re: How to debug: Runs locally but not on cluster
I've isolated this to a memory issue, but I don't know which parameter I need to tweak. If I sample my samples RDD with 35% of the data, everything runs to completion; with 40%, it fails. In standalone mode, I can run on the full RDD without any problems.

// works
val samples = sc.textFile("s3n://geonames").sample(false, 0.35) // 64MB, 2849439 lines

// fails
val samples = sc.textFile("s3n://geonames").sample(false, 0.4) // 64MB, 2849439 lines

Any ideas?

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-debug-Runs-locally-but-not-on-cluster-tp12081p12091.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
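The 35%-works / 40%-fails probing described above is a manual bisection over the sample fraction. As a minimal sketch only: the same search can be automated, assuming a hypothetical `runJob` predicate that submits the job at a given fraction and reports whether it completed (`FractionSearch` and `maxPassingFraction` are illustrative names, not anything from Spark).

```scala
object FractionSearch {
  // Binary-search the largest sample fraction for which runJob still succeeds.
  // runJob is a hypothetical stand-in for "submit the sampled job, return true on success".
  def maxPassingFraction(runJob: Double => Boolean, steps: Int = 20): Double = {
    var lo = 0.0 // known-good fraction
    var hi = 1.0 // known-bad (or untested) fraction
    for (_ <- 1 to steps) {
      val mid = (lo + hi) / 2
      if (runJob(mid)) lo = mid else hi = mid
    }
    lo
  }

  def main(args: Array[String]): Unit = {
    // stand-in predicate: pretend jobs succeed up to a 37% sample
    println(maxPassingFraction(f => f <= 0.37))
  }
}
```

Each probe is a full job run, so 20 steps is overkill in practice; a handful of iterations pins the threshold closely enough to correlate it with an executor memory or frame-size limit.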
How to debug: Runs locally but not on cluster
Hi all,

I have an issue where I'm able to run my code in standalone mode but not on my cluster. I've isolated it to a few things but am at a loss as to how to debug this. Below is the code. Any suggestions would be much appreciated. Thanks!

1) RDD size is causing the problem. The code below fails as is, but if I swap smallSample in for samples, the code runs end to end both on the cluster and in standalone mode.

2) The error I get is:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 3.0:1 failed 4 times, most recent failure: TID 12 on host ip-10-251-14-74.us-west-2.compute.internal failed for unknown reason
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1044)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1028)

Code:

val smallSample = sc.parallelize(Array("foo word", "bar word", "baz word"))
val samples = sc.textFile("s3n://geonames") // 64MB, 2849439 lines of short strings
val counts = new collection.mutable.HashMap[String, Int].withDefaultValue(0)
samples.toArray.foreach(l => counts(l) += 1)
val result = samples.map(l => (l, counts.get(l)))
result.count

Settings (with or without Kryo doesn't matter):

export SPARK_JAVA_OPTS="-Xms5g -Xmx10g -XX:MaxPermSize=10g"
export SPARK_MEM=10g

spark.akka.frameSize 40
#spark.serializer org.apache.spark.serializer.KryoSerializer
#spark.kryoserializer.buffer.mb 1000
spark.executor.memory 58315m
spark.executor.extraLibraryPath /root/ephemeral-hdfs/lib/native/
spark.executor.extraClassPath /root/ephemeral-hdfs/conf

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-debug-Runs-locally-but-not-on-cluster-tp12081.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
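For context on what the code above computes: it pulls the whole RDD to the driver with toArray, builds a mutable counts map there, and then captures that map in the map closure that ships to every executor. As a point of comparison only (not a claim about the fix), here is the same per-line counting written with plain immutable Scala collections, which is also the shape an RDD-side word count takes; `LineCountSketch` and `lineCounts` are illustrative names.

```scala
object LineCountSketch {
  // Count occurrences of each distinct line, without a shared mutable map:
  // group equal lines together, then take each group's size.
  def lineCounts(lines: Seq[String]): Map[String, Int] =
    lines.groupBy(identity).map { case (line, group) => (line, group.size) }

  def main(args: Array[String]): Unit = {
    val smallSample = Seq("foo word", "bar word", "foo word")
    println(lineCounts(smallSample)("foo word")) // prints 2
  }
}
```

On an RDD the equivalent would be expressed per-partition (e.g. mapping each line to a pair and reducing by key) rather than collecting everything to the driver, which is why the small sample fits but the full file does not.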