[jira] [Updated] (SPARK-7563) OutputCommitCoordinator.stop() should only be executed in driver

2015-08-03 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-7563:
-
Assignee: Josh Rosen
Target Version/s:   (was: 1.3.2)
  Labels:   (was: backport-needed)
   Fix Version/s: 1.3.2

 OutputCommitCoordinator.stop() should only be executed in driver
 -----------------------------------------------------------------

 Key: SPARK-7563
 URL: https://issues.apache.org/jira/browse/SPARK-7563
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.3.1
 Environment: Red Hat Enterprise Linux Server release 7.0 (Maipo)
 Spark 1.3.1 Release
Reporter: Hailong Wen
Assignee: Josh Rosen
Priority: Critical
 Fix For: 1.3.2, 1.4.0


 I am from the IBM Platform Symphony team, and we are integrating Spark 1.3.1 with 
 EGO (a resource management product).
 EGO uses a fine-grained dynamic allocation policy, so each Executor exits once 
 all of its tasks are done. When testing *spark-shell*, we found that when the 
 executor of the first job exits, it stops the OutputCommitCoordinator, which 
 causes all subsequent jobs to fail. Details are as follows:
 We got the following error on the executor when submitting a job in *spark-shell* 
 for the second time (the first job submission succeeds):
 {noformat}
 15/05/11 04:02:31 INFO spark.util.AkkaUtils: Connecting to OutputCommitCoordinator: akka.tcp://sparkDriver@whlspark01:50452/user/OutputCommitCoordinator
 Exception in thread "main" akka.actor.ActorNotFound: Actor not found for: ActorSelection[Anchor(akka.tcp://sparkDriver@whlspark01:50452/), Path(/user/OutputCommitCoordinator)]
   at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:65)
   at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:63)
   at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
   at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
   at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
   at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
   at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
   at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
   at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58)
   at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.unbatchedExecute(Future.scala:74)
   at akka.dispatch.BatchingExecutor$class.execute(BatchingExecutor.scala:110)
   at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.execute(Future.scala:73)
   at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
   at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
   at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:267)
   at akka.remote.DefaultMessageDispatcher.dispatch(Endpoint.scala:89)
   at akka.remote.EndpointReader$$anonfun$receive$2.applyOrElse(Endpoint.scala:937)
   at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
   at akka.remote.EndpointActor.aroundReceive(Endpoint.scala:415)
   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
   at akka.actor.ActorCell.invoke(ActorCell.scala:487)
   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
   at akka.dispatch.Mailbox.run(Mailbox.scala:220)
   at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
   at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
   at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
   at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
 {noformat}
 On the driver side, we see a log message indicating that the 
 OutputCommitCoordinator was stopped after the first submission:
 {noformat}
 15/05/11 04:01:23 INFO spark.scheduler.OutputCommitCoordinator$OutputCommitCoordinatorActor: OutputCommitCoordinator stopped!
 {noformat}
 We examined the code of OutputCommitCoordinator and found that executors reuse 
 the actor reference of the driver's OutputCommitCoordinatorActor. So when an 
 executor exits, it eventually calls SparkEnv.stop():
 {noformat}
   private[spark] def stop() {
     isStopped = true
     pythonWorkers.foreach { case (key, worker) => worker.stop() }
     Option(httpFileServer).foreach(_.stop())
     mapOutputTracker.stop()
     shuffleManager.stop()
     broadcastManager.stop()
     blockManager.stop()
     blockManager.master.stop()
     metricsSystem.stop()
     outputCommitCoordinator.stop()  // <--- also runs on executors, stopping the driver's coordinator
     actorSystem.shutdown()
     ..
 {noformat}
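 As the title suggests, the stop should be driver-only. Below is a minimal, illustrative Scala sketch of that idea; the `isDriver` flag and the trimmed-down class are assumptions for illustration only, not the actual Spark patch:
 {noformat}
 // Illustrative sketch only (not the actual Spark change). It assumes a
 // hypothetical `isDriver` flag to show the intended behavior: only the
 // driver-side instance ever stops the shared coordinator.
 class OutputCommitCoordinatorSketch(isDriver: Boolean) {
   private var stopped = false

   def stop(): Unit = {
     // Executors share the driver's coordinator (via an actor ref in 1.3.x),
     // so an exiting executor must not shut it down.
     if (isDriver && !stopped) {
       stopped = true
       println("OutputCommitCoordinator stopped!")
     }
   }
 }

 object SketchDemo extends App {
   val driverSide = new OutputCommitCoordinatorSketch(isDriver = true)
   val executorSide = new OutputCommitCoordinatorSketch(isDriver = false)

   executorSide.stop() // no-op: executor shutdown leaves the driver's coordinator alive
   driverSide.stop()   // stops the coordinator exactly once, on driver shutdown
 }
 {noformat}
 The same guard could equally be placed in SparkEnv.stop() (skipping outputCommitCoordinator.stop() on executors); the point is only that executor shutdown must not tear down driver-side state.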

[jira] [Updated] (SPARK-7563) OutputCommitCoordinator.stop() should only be executed in driver

2015-06-19 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-7563:
-
Target Version/s:   (was: 1.3.2, 1.4.0)


[jira] [Updated] (SPARK-7563) OutputCommitCoordinator.stop() should only be executed in driver

2015-06-19 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-7563:
-
Target Version/s: 1.3.2
   Fix Version/s: 1.4.0


[jira] [Updated] (SPARK-7563) OutputCommitCoordinator.stop() should only be executed in driver

2015-05-22 Thread Josh Rosen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-7563:
--
Labels: backport-needed  (was: )


[jira] [Updated] (SPARK-7563) OutputCommitCoordinator.stop() should only be executed in driver

2015-05-16 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-7563:
-
Fix Version/s: (was: 1.4.0)


[jira] [Updated] (SPARK-7563) OutputCommitCoordinator.stop() should only be executed in driver

2015-05-15 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-7563:
---
Fix Version/s: 1.4.0


[jira] [Updated] (SPARK-7563) OutputCommitCoordinator.stop() should only be executed in driver

2015-05-15 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-7563:
---
Target Version/s: 1.3.2, 1.4.0


[jira] [Updated] (SPARK-7563) OutputCommitCoordinator.stop() should only be executed in driver

2015-05-15 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-7563:
-
Priority: Critical  (was: Major)
