Hi Tim,

I tested the scenario again with the settings below:
[dcos@agent spark-2.0.2-bin-hadoop2.7]$ cat conf/spark-defaults.conf
spark.deploy.recoveryMode ZOOKEEPER
spark.deploy.zookeeper.url 192.168.111.53:2181
spark.deploy.zookeeper.dir /spark
spark.executor.memory 512M
spark.mesos.principal agent-dev-1

However, the test still failed. After the master restarted, the Spark framework did not re-register. Judging from the Spark framework log, the following method in MesosClusterScheduler was never called:

override def reregistered(driver: SchedulerDriver, masterInfo: MasterInfo): Unit

Did I miss something? Any advice?

Thanks,

Jared, (韦煜)
Software developer
Interested in open source software, big data, Linux

________________________________
From: Timothy Chen <tnac...@gmail.com>
Sent: Friday, March 31, 2017 5:13 AM
To: Yu Wei
Cc: us...@spark.apache.org; dev
Subject: Re: [Spark on mesos] Spark framework not re-registered and lost after mesos master restarted

I think failover isn't enabled for the regular Spark job framework, since we assume jobs are more ephemeral. Adding a setting to the Spark framework that enables failover could be a good idea.

Tim

On Mar 30, 2017, at 10:18 AM, Yu Wei <yu20...@hotmail.com> wrote:

Hi guys,

I encountered a problem with Spark on Mesos.

I set up a Mesos cluster and launched the Spark framework on Mesos successfully.
Then the Mesos master was killed and started again.
However, the Spark framework did not re-register, whereas the Mesos agents did.
I couldn't find any error logs, and MesosClusterDispatcher is still running.

I suspect this is an issue in the Spark framework.
What's your opinion?

Thanks,

Jared, (韦煜)
Software developer
Interested in open source software, big data, Linux