Hi Yu,

As mentioned earlier, the Spark framework currently will not re-register because failover_timeout is not set, and there is no configuration available yet to set it. It's only enabled in MesosClusterScheduler, since that is meant to be an HA framework.
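Concretely, the effect of that (currently unset) failover_timeout can be sketched as follows. This is an illustrative model of the master's behavior, not actual Mesos code — the class, method, and timeout values are made up for the example:

```java
// Sketch of the semantics described above, not actual Mesos code:
// the master keeps a disconnected framework's state around for
// failover_timeout seconds; with the default of 0 the framework
// is removed immediately and can never re-register.

public class FailoverSketch {
    static boolean survivesDisconnect(double failoverTimeoutSecs,
                                      double downtimeSecs) {
        return downtimeSecs < failoverTimeoutSecs;
    }

    public static void main(String[] args) {
        double jobTimeout = 0.0;           // plain Spark job: failover_timeout never set
        double dispatcherTimeout = 3600.0; // hypothetical HA-style framework timeout

        // A 10-second master outage removes the plain job framework,
        // while an HA framework may re-register within its window.
        System.out.println(survivesDisconnect(jobTimeout, 10.0));        // false
        System.out.println(survivesDisconnect(dispatcherTimeout, 10.0)); // true
    }
}
```

This is why MesosClusterDispatcher (via MesosClusterScheduler) survives a master restart while a regular Spark job framework does not.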
We should add that configuration for users who want their Spark frameworks to be able to fail over in case of a Master failover, network disconnect, etc.

Tim

On Thu, Mar 30, 2017 at 8:25 PM, Yu Wei <yu20...@hotmail.com> wrote:
> Hi Tim,
>
> I tested the scenario again with the settings below:
>
> [dcos@agent spark-2.0.2-bin-hadoop2.7]$ cat conf/spark-defaults.conf
> spark.deploy.recoveryMode ZOOKEEPER
> spark.deploy.zookeeper.url 192.168.111.53:2181
> spark.deploy.zookeeper.dir /spark
> spark.executor.memory 512M
> spark.mesos.principal agent-dev-1
>
> However, the case still failed. After the master restarted, the Spark
> framework did not re-register.
> From the Spark framework log, it seemed that the method below in
> MesosClusterScheduler was not called:
>
> override def reregistered(driver: SchedulerDriver, masterInfo: MasterInfo): Unit
>
> Did I miss something? Any advice?
>
> Thanks,
> Jared (韦煜)
> Software developer
> Interested in open source software, big data, Linux
>
> ________________________________
> From: Timothy Chen <tnac...@gmail.com>
> Sent: Friday, March 31, 2017 5:13 AM
> To: Yu Wei
> Cc: us...@spark.apache.org; dev
> Subject: Re: [Spark on mesos] Spark framework not re-registered and lost
> after mesos master restarted
>
> I think failover isn't enabled on regular Spark job frameworks, since we
> assume jobs are more ephemeral.
>
> It could be a good setting to add to the Spark framework to enable failover.
>
> Tim
>
> On Mar 30, 2017, at 10:18 AM, Yu Wei <yu20...@hotmail.com> wrote:
>
> Hi guys,
>
> I encountered a problem with Spark on Mesos.
>
> I set up a Mesos cluster and launched a Spark framework on Mesos successfully.
>
> Then the Mesos master was killed and started again.
>
> However, the Spark framework couldn't re-register again the way a Mesos
> agent does. I also couldn't find any error logs.
>
> And MesosClusterDispatcher is still running there.
>
> I suspect this is a Spark framework issue.
>
> What's your opinion?
>
> Thanks,
> Jared (韦煜)
> Software developer
> Interested in open source software, big data, Linux

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org