I just tested it.

If you start the master again on the original host, the workers on that host
won't respond to it; they stay stale. So there is no heartbeat between the
workers and the master beyond the initial handshake.

The only way to recover is to stop the workers (if they are still running),
restart the master and then restart the workers.
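
For example, on the affected host the sequence would be roughly the one below,
assuming the standard sbin scripts, that SPARK_HOME is set, and a placeholder
master URL:

# stop the stale worker on this host (if it is still running)
$SPARK_HOME/sbin/stop-slave.sh
# make sure no old master process is left, then bring the master back
$SPARK_HOME/sbin/stop-master.sh
$SPARK_HOME/sbin/start-master.sh
# re-register the worker against the restarted master (URL is a placeholder)
$SPARK_HOME/sbin/start-slave.sh spark://<master-host>:7077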


HTH


Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 7 September 2016 at 16:15, Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

> OK, but look at the worker: why is it still showing port 7077? As far as I
> know, that is the port the local master was running on, and that master is
> no longer there.
>
> Dr Mich Talebzadeh
>
>
>
>
>
>
> On 7 September 2016 at 16:05, arlindo santos <sarli...@hotmail.com> wrote:
>
>> Port 7077 is for "client" mode connections to the master. In "cluster"
>> mode it's 6066, which means the driver runs on the Spark cluster on a node
>> Spark chooses. The command I use to deploy my Spark app (including the
>> driver) is below:
>>
>>
>> spark-submit --deploy-mode cluster --master spark://tiplxapp-spk01:6066,tiplxapp-spk02:6066,tiplxapp-spk03:6066 /app/tmx/ngxspark/lib/EX1AppSpark-1.0.13.jar
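>>
>> For comparison, a client-mode submission of the same app (driver running
>> where spark-submit is launched, master port 7077 instead of 6066) would
>> look something like this; treat it as a sketch rather than a command we run:
>>
>> spark-submit --deploy-mode client --master spark://tiplxapp-spk01:7077,tiplxapp-spk02:7077,tiplxapp-spk03:7077 /app/tmx/ngxspark/lib/EX1AppSpark-1.0.13.jar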
>>
>>
>>
>> Yes, you're right. I believe that when the master dies, ZooKeeper detects
>> it and elects a new master node, and spark-submit should carry on. I'm not
>> sure how this leads to the UI believing the app is in a "waiting" state,
>> though.
>>
>>
>> Also, I noticed that when these failovers happen the worker web GUI goes a
>> bit strange and starts reporting over-allocated resources. Look at the
>> cores and memory used:
>>
>>
>> Spark 1.6.0 Worker at 142.201.185.134:7078 (web UI: http://142.201.185.134:18081/)
>>
>>    - ID: worker-20160622152457-142.201.185.134-7078
>>    - Master URL: spark://142.201.185.132:7077
>>    - Cores: 4 (5 Used)
>>    - Memory: 2.7 GB (3.0 GB Used)
>>
>> Back to Master: http://142.201.185.132:18080/
>>
>>
>>
>>
>>
>> ------------------------------
>> From: Mich Talebzadeh <mich.talebza...@gmail.com>
>> Sent: September 7, 2016 2:52 PM
>> To: arlindo santos
>>
>> Cc: user @spark
>> Subject: Re: spark 1.6.0 web console shows running application in a
>> "waiting" status, but it's actually running
>>
>> This is my take.
>>
>> When you issue spark-submit on any node it starts the application GUI on
>> port 4040 by default. Otherwise you can specify the port yourself with
>> --conf "spark.ui.port=<port>".
>>
>> As I understand it, in standalone mode the executors run on the workers,
>> and each worker is started against the master with:
>>
>> $SPARK_HOME/sbin/start-slave.sh spark://<host>:7077
>>
>> Port 7077 there is the master port. If the master dies, those workers lose
>> their connection to port 7077, so I believe they go stale, and spark-submit
>> carries on using the remaining executors on the other workers.
>>
>> So in summary one expects the job to keep running. You start your UI on
>> <HOST>:<port>.
>>
>> One test you can do is to exit the UI and open it again, on the same port,
>> on the host that ZooKeeper has elected as the new master. That should work.
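>>
>> For instance, assuming the master web UI port used here (18080), something
>> like the following should show which node currently holds the ALIVE master;
>> the hostname is a placeholder:
>>
>> # the standalone master web UI exposes its state as JSON at /json
>> curl -s http://<candidate-master-host>:18080/json | grep -i status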
>>
>> HTH
>>
>>
>>
>>
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>>
>>
>>
>> On 7 September 2016 at 15:27, arlindo santos <sarli...@hotmail.com>
>> wrote:
>>
>>> Yes, I refreshed a few times. Running in cluster mode.
>>>
>>> FYI, I can duplicate this easily now. Our setup consists of 3 nodes running
>>> standalone Spark, with a master and a worker on each and ZooKeeper doing
>>> master leader election. If I kill the master on any node, the master role
>>> shifts to another node, and that is when the app state changes to WAITING
>>> in the GUI and never changes back to RUNNING, even though the app really is
>>> still running.
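>>>
>>> For what it's worth, the leader-election side is configured on each master
>>> through SPARK_DAEMON_JAVA_OPTS in spark-env.sh, roughly as below; the
>>> ZooKeeper hosts and dir are placeholders, not our exact values:
>>>
>>> # conf/spark-env.sh on every master node (illustrative values)
>>> export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
>>>   -Dspark.deploy.zookeeper.url=zk1:2181,zk2:2181,zk3:2181 \
>>>   -Dspark.deploy.zookeeper.dir=/spark"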
>>>
>>> From: Mich Talebzadeh
>>> Sent: Wednesday, September 7, 2016 9:50 AM
>>> To: sarlindo
>>> Cc: user @spark
>>> Subject: Re: spark 1.6.0 web console shows running application in a
>>> "waiting" status, but it's actually running
>>>
>>> Have you refreshed the Spark UI page?
>>>
>>> What Mode are you running your Spark app?
>>>
>>> HTH
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>>
>>>
>>>
>>>
>>> On 6 September 2016 at 16:15, sarlindo <sarli...@hotmail.com> wrote:
>>>
>>>> I have 2 questions/issues.
>>>>
>>>> 1. We had the Spark master shut down (reason unknown). We looked at the
>>>> Spark master logs and they simply show the lines below; is there some
>>>> other log I should be looking at to find out why the master went down?
>>>>
>>>> 16/09/05 21:10:00 INFO ClientCnxn: Opening socket connection to server
>>>> tiplxapp-spk02.prd.tse.com/142.201.219.76:2181. Will not attempt to
>>>> authenticate using SASL (unknown error)
>>>> 16/09/05 21:10:00 ERROR Master: Leadership has been revoked -- master
>>>> shutting down.
>>>> 16/09/05 21:10:00 INFO ClientCnxn: Socket connection established,
>>>> initiating
>>>> session, client: /142.201.219.75:56361, server:
>>>> tiplxapp-spk02.prd.tse.com/142.201.219.76:2181
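>>>>
>>>> (The excerpt above is from the standard master daemon log. Assuming the
>>>> usual sbin start scripts, that file lives under a path like the one below;
>>>> the name pattern is illustrative, not copied from our box:)
>>>>
>>>> # daemon log written by sbin/spark-daemon.sh when the master is started
>>>> less $SPARK_HOME/logs/spark-<user>-org.apache.spark.deploy.master.Master-1-<hostname>.out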
>>>>
>>>>
>>>> 2. Spark 1.6.0 web console shows a running application in a "waiting"
>>>> status, but it's actually running. Is this an existing bug?
>>>>
>>>> Screenshot: http://apache-spark-user-list.1001560.n3.nabble.com/file/n27665/33.png
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-1-6-0-web-console-shows-running-application-in-a-waiting-status-but-it-s-acutally-running-tp27665.html
>>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>>>
>>>>
>>>
>>
>
