[ https://issues.apache.org/jira/browse/SPARK-643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matei Zaharia resolved SPARK-643.
---------------------------------

    Resolution: Fixed

> Standalone master crashes during actor restart
> ----------------------------------------------
>
>                 Key: SPARK-643
>                 URL: https://issues.apache.org/jira/browse/SPARK-643
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 0.6.1
>            Reporter: Josh Rosen
>            Assignee: Josh Rosen
>
> The standalone master will crash if it restarts due to an exception:
> {code}
> 12/12/15 03:10:47 ERROR master.Master: Job SkewBenchmark wth ID job-20121215031047-0000 failed 11 times.
> spark.SparkException: Job SkewBenchmark wth ID job-20121215031047-0000 failed 11 times.
>     at spark.deploy.master.Master$$anonfun$receive$1.apply(Master.scala:103)
>     at spark.deploy.master.Master$$anonfun$receive$1.apply(Master.scala:62)
>     at akka.actor.Actor$class.apply(Actor.scala:318)
>     at spark.deploy.master.Master.apply(Master.scala:17)
>     at akka.actor.ActorCell.invoke(ActorCell.scala:626)
>     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:197)
>     at akka.dispatch.Mailbox.run(Mailbox.scala:179)
>     at akka.dispatch.ForkJoinExecutorConfigurator$MailboxExecutionTask.exec(AbstractDispatcher.scala:516)
>     at akka.jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:259)
>     at akka.jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:975)
>     at akka.jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1479)
>     at akka.jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
> 12/12/15 03:10:47 INFO master.Master: Starting Spark master at spark://ip-10-226-87-193:7077
> 12/12/15 03:10:47 INFO io.IoWorker: IoWorker thread 'spray-io-worker-1' started
> 12/12/15 03:10:47 ERROR master.Master: Failed to create web UI
> akka.actor.InvalidActorNameException:actor name HttpServer is not unique!
> [05aed000-4665-11e2-b361-12313d316833]
>     at akka.actor.ActorCell.actorOf(ActorCell.scala:392)
>     at akka.actor.LocalActorRefProvider$Guardian$$anonfun$receive$1.liftedTree1$1(ActorRefProvider.scala:394)
>     at akka.actor.LocalActorRefProvider$Guardian$$anonfun$receive$1.apply(ActorRefProvider.scala:394)
>     at akka.actor.LocalActorRefProvider$Guardian$$anonfun$receive$1.apply(ActorRefProvider.scala:392)
>     at akka.actor.Actor$class.apply(Actor.scala:318)
>     at akka.actor.LocalActorRefProvider$Guardian.apply(ActorRefProvider.scala:388)
>     at akka.actor.ActorCell.invoke(ActorCell.scala:626)
>     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:197)
>     at akka.dispatch.Mailbox.run(Mailbox.scala:179)
>     at akka.dispatch.ForkJoinExecutorConfigurator$MailboxExecutionTask.exec(AbstractDispatcher.scala:516)
>     at akka.jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:259)
>     at akka.jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:975)
>     at akka.jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1479)
>     at akka.jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
> {code}
>
> When the Master actor restarts, Akka calls the {{postRestart}} hook. [By default|http://doc.akka.io/docs/akka/snapshot/general/supervision.html#supervision-restart], this calls {{preStart}}. The standalone master's {{preStart}} method tries to start the web UI, but the web UI is already running, so creating its {{HttpServer}} actor fails with the {{InvalidActorNameException}} shown above.
>
> I ran into this after a job failed 11 times (see the log above), which causes the Master to throw a SparkException from its {{receive}} method.
>
> The solution is to implement a custom {{postRestart}} hook that does not call {{preStart}} (and therefore does not try to start the web UI) again.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
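The restart sequence described in the issue can be illustrated without Akka. The following is a minimal, self-contained Scala model, not Spark's or Akka's actual code: `NameRegistry`, `Lifecycle`, `NaiveMaster`, `FixedMaster`, and `RestartDemo` are hypothetical names. `NameRegistry` stands in for the actor system rejecting a second `HttpServer` actor with the same name, and `Lifecycle` mirrors Akka's default behavior in which `postRestart` simply calls `preStart`. Following the stack trace, the model assumes nothing stops the old web UI before the fresh instance runs `postRestart`.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the actor system's name check: registering the
// same name twice fails, like creating a second "HttpServer" actor.
final class NameRegistry {
  private val names = mutable.Set.empty[String]
  def register(name: String): Unit =
    if (!names.add(name)) throw new IllegalStateException(s"actor name $name is not unique!")
}

// Hypothetical model of the two lifecycle hooks involved. As in Akka's
// default, postRestart just delegates to preStart.
trait Lifecycle {
  def preStart(): Unit = ()
  def postRestart(reason: Throwable): Unit = preStart()
}

// Behavior described in the issue: preStart starts the web UI,
// and the default postRestart runs preStart again after a restart.
class NaiveMaster(registry: NameRegistry) extends Lifecycle {
  override def preStart(): Unit = registry.register("HttpServer")
}

// Proposed fix: override postRestart so it does NOT call preStart,
// because the web UI from the first start is still running.
class FixedMaster(registry: NameRegistry) extends Lifecycle {
  override def preStart(): Unit = registry.register("HttpServer")
  override def postRestart(reason: Throwable): Unit = ()
}

object RestartDemo {
  // Starts one instance normally, then simulates a restart by creating a
  // fresh instance and invoking only postRestart on it.
  def simulateRestart(newMaster: NameRegistry => Lifecycle): Either[String, Unit] = {
    val registry = new NameRegistry
    newMaster(registry).preStart()
    val failure = new RuntimeException("job failed 11 times")
    try Right(newMaster(registry).postRestart(failure))
    catch { case e: IllegalStateException => Left(e.getMessage) }
  }

  def main(args: Array[String]): Unit = {
    println(simulateRestart(new NaiveMaster(_))) // prints Left(actor name HttpServer is not unique!)
    println(simulateRestart(new FixedMaster(_))) // prints Right(())
  }
}
```

One trade-off of this approach: overriding `postRestart` with a no-op skips everything in `preStart` on restart, so any other initialization that must be repeated after a restart would have to be called explicitly from the override.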