Re: Spark fails after 6000s because of akka

2015-12-20 Thread Alexander Pivovarov
The documentation says this setting is used to disable the Akka transport failure detector. Why is the magic number 6000s used, then? To actually disable the heartbeat it should be the maximum possible value, not 6000s. Using magic numbers like 1 hour and 40 min creates issues that are difficult to debug. Most

Re: [Spark SQL] SQLContext getOrCreate incorrect behaviour

2015-12-20 Thread Yin Huai
Hi Jerry, Looks like https://issues.apache.org/jira/browse/SPARK-11739 is for the issue you described. It has been fixed in 1.6. With this change, when you call SQLContext.getOrCreate(sc2), we will first check if sc has been stopped. If so, we will create a new SQLContext using sc2. Thanks, Yin
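A minimal sketch of the fixed behavior described above (my illustration, not code from the thread; assumes Spark 1.6 and a local master):

```
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val conf = new SparkConf().setAppName("getOrCreate-demo").setMaster("local[*]")

val sc = new SparkContext(conf)
val sqlContext = SQLContext.getOrCreate(sc)   // creates and caches an instance
sc.stop()

val sc2 = new SparkContext(conf)
// With SPARK-11739 (1.6), getOrCreate sees that the cached instance's
// SparkContext was stopped and returns a fresh SQLContext backed by sc2;
// in 1.5.x the stale cached instance was returned instead.
val sqlContext2 = SQLContext.getOrCreate(sc2)
assert(sqlContext2.sparkContext eq sc2)
```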

[Spark SQL] SQLContext getOrCreate incorrect behaviour

2015-12-20 Thread Jerry Lam
Hi Spark developers, I found that SQLContext.getOrCreate(sc: SparkContext) does not behave correctly when a different SparkContext is provided:

```
val sc = new SparkContext
val sqlContext = SQLContext.getOrCreate(sc)
sc.stop
...
val sc2 = new SparkContext
val sqlContext2 = SQLContext.getOrCreate(sc2)
```

Re: Spark fails after 6000s because of akka

2015-12-20 Thread Josh Rosen
Would you mind copying this information into a JIRA ticket to make it easier to discover / track? Thanks! On Sun, Dec 20, 2015 at 11:35 AM Alexander Pivovarov wrote: > Usually Spark EMR job fails with the following exception in 1 hour 40 min > - Job cancelled because

Spark fails after 6000s because of akka

2015-12-20 Thread Alexander Pivovarov
I run Spark 1.5.2 on YARN (EMR). I noticed that my long-running jobs always failed after 1 h 40 min (6000s) with the exceptions below. Then I found that Spark has spark.akka.heartbeat.pauses=6000s by default. I changed the settings to the following and it solved my issue.
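The exact settings are cut off in this digest. As a hedged sketch only (the values below are my assumptions, not the ones from the thread), the change would look something like raising the pause far beyond the 6000s default:

```
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical values; the thread's actual settings are truncated above.
val conf = new SparkConf()
  .setAppName("long-running-emr-job")
  // Default in Spark 1.5.x is 6000s (1 h 40 min); raise it past any
  // realistic job length so the transport failure detector never fires.
  .set("spark.akka.heartbeat.pauses", "60000s")
  .set("spark.akka.heartbeat.interval", "1000s")

val sc = new SparkContext(conf)
```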

Re: Spark fails after 6000s because of akka

2015-12-20 Thread Alexander Pivovarov
Or this message: Exception in thread "main" org.apache.spark.SparkException: Job cancelled because SparkContext was shut down at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:703) at

Re: Spark fails after 6000s because of akka

2015-12-20 Thread Alexander Pivovarov
It can also fail with the following message: Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 133 in stage 33.1 failed 4 times, most recent failure: Lost task 133.3 in stage 33.1 (TID 172737, ip-10-0-25-2.ec2.internal): java.io.IOException: Failed

Re: Spark fails after 6000s because of akka

2015-12-20 Thread Alexander Pivovarov
Usually the Spark EMR job fails with the following exception after 1 hour 40 min - Job cancelled because SparkContext was shut down java.util.concurrent.RejectedExecutionException: Task scala.concurrent.impl.CallbackRunnable@2d602a14 rejected from