Looks like an OOM issue? Have you tried persisting your RDDs with a storage level that allows spilling to disk?
I've seen a lot of similar crashes in a Spark app that reads from HDFS and does joins, e.g. "java.io.IOException: Filesystem closed," "Executor lost," "FetchFailed," etc., with non-deterministic crashes. I've tried persisting RDDs, tuning other params, and verifying that the executor JVMs don't come close to their max allocated memory during operation.

Looking through user@ tonight, there are a ton of email threads with similar crashes and no answers. It looks like a lot of people are struggling with OOMs. Could one of the Spark committers please comment on this thread, or on one of the other unanswered threads with similar crashes? Is this simply how Spark behaves when executors OOM? What can the user do other than increase memory or reduce RDD size? (And how can one deduce how much of either is needed?)

One general workaround for OOMs could be to programmatically break the job input (e.g. from HDFS, or input from #parallelize()) into chunks, and only create/process RDDs related to one chunk at a time. However, this approach has the same limitations as Spark Streaming's micro-batching, without the formal library support. What might be nice is if Spark could try to re-partition after task failures in order to avoid OOMs.

On Fri, Oct 3, 2014 at 2:55 AM, jamborta <jambo...@gmail.com> wrote:

> I have two nodes with 96G RAM, 16 cores; my setup is as follows:
>
> conf = (SparkConf()
>         .setMaster("yarn-cluster")
>         .set("spark.executor.memory", "30G")
>         .set("spark.cores.max", 32)
>         .set("spark.executor.instances", 2)
>         .set("spark.executor.cores", 8)
>         .set("spark.akka.timeout", 10000)
>         .set("spark.akka.askTimeout", 100)
>         .set("spark.akka.frameSize", 500)
>         .set("spark.cleaner.ttl", 86400)
>         .set("spark.task.maxFailures", 16)
>         .set("spark.worker.timeout", 150))
>
> thanks a lot,
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Any-issues-with-repartition-tp13462p15674.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
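P.S. A minimal sketch of the chunked-input workaround I described above. The helper names (chunk_paths, process_chunk) are hypothetical, not a Spark API; only the splitting logic is shown runnable:

```python
# Hypothetical helper: split a list of HDFS file paths into chunks of
# at most n paths each, so only one chunk's RDD is live at a time.
def chunk_paths(paths, n):
    return [paths[i:i + n] for i in range(0, len(paths), n)]

print(chunk_paths(["a", "b", "c"], 2))  # [['a', 'b'], ['c']]

# In a job, each chunk would then be built and fully processed before
# moving to the next, bounding the input resident in executor memory
# (sketch; sc and process_chunk are assumed to exist):
#
#   for chunk in chunk_paths(all_paths, 10):
#       rdd = sc.textFile(",".join(chunk))
#       process_chunk(rdd)   # joins, aggregations, writes
#       rdd.unpersist()
```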
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
> ---------------------------------------------------------------------