Assuming you configured Spark to use ZooKeeper for HA, when the master fails 
over to another node, the workers automatically attach themselves to the 
newly elected master, and that part works fine. My issue is that when I go to 
the new master's web GUI, I see all the workers attached just fine (which 
means the failover worked), but the web GUI now thinks the Spark Streaming 
app running on the cluster is in "WAITING" state. That is not the case: the 
app is actually running and processing events.
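
For reference, this is the kind of ZooKeeper recovery configuration I mean (a minimal sketch for spark-env.sh on each master node; the ZooKeeper hosts and znode directory below are placeholders, not our actual values):

# spark-env.sh -- standalone master HA via ZooKeeper (example values)
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
  -Dspark.deploy.zookeeper.url=zk01:2181,zk02:2181,zk03:2181 \
  -Dspark.deploy.zookeeper.dir=/spark"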


Spark Master at spark://10.142.191.154:7077 (Spark 1.6.0, http://ttllxapp-spk03.lab.tsx.com:18080/)

  *   URL: spark://10.142.191.154:7077
  *   REST URL: spark://10.142.191.154:6066 (cluster mode)
  *   Alive Workers: 3
  *   Cores in use: 12 Total, 12 Used
  *   Memory in use: 8.2 GB Total, 4.0 GB Used
  *   Applications: 1 Running, 0 Completed
  *   Drivers: 1 Running, 0 Completed
  *   Status: ALIVE

Workers

Worker Id                                   Address               State   Cores        Memory
worker-20160907122724-10.142.191.154-7078   10.142.191.154:7078   ALIVE   4 (4 Used)   2.7 GB (2.0 GB Used)
worker-20160907122724-10.142.191.159-7078   10.142.191.159:7078   ALIVE   4 (4 Used)   2.7 GB (1024.0 MB Used)
worker-20160907122724-10.142.191.162-7078   10.142.191.162:7078   ALIVE   4 (4 Used)   2.7 GB (1024.0 MB Used)

Running Applications

Application ID            Name      Cores   Memory per Node   Submitted Time        User    State     Duration
app-20160907122851-0000   Ex1Feed   12      1024.0 MB         2016/09/07 12:28:51   spark   WAITING   10 min

(application detail / driver UI: http://10.142.191.154:4040/)




________________________________
From: Mich Talebzadeh <mich.talebza...@gmail.com>
Sent: September 7, 2016 3:42 PM
To: arlindo santos
Cc: user @spark
Subject: Re: spark 1.6.0 web console shows running application in a "waiting" 
status, but it's actually running

I just tested it.

If you restart the master on the original host, the workers on that host won't 
respond; they stay stale. So there seems to be no heartbeat between workers and 
the master beyond the initial handshake.

The only way is to stop the workers (if they are still running), restart the 
master and then restart the workers.
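
A minimal sketch of that sequence, assuming the per-node sbin scripts are in place and the master listens on the default port 7077 (<master-host> is a placeholder):

# on each worker node: stop the worker if it is still running
$SPARK_HOME/sbin/stop-slave.sh
# on the master node: restart the master
$SPARK_HOME/sbin/stop-master.sh
$SPARK_HOME/sbin/start-master.sh
# on each worker node: re-attach to the restarted master
$SPARK_HOME/sbin/start-slave.sh spark://<master-host>:7077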


HTH



Dr Mich Talebzadeh



LinkedIn  
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


Disclaimer: Use it at your own risk. Any and all responsibility for any loss, 
damage or destruction of data or any other property which may arise from 
relying on this email's technical content is explicitly disclaimed. The author 
will in no case be liable for any monetary damages arising from such loss, 
damage or destruction.



On 7 September 2016 at 16:15, Mich Talebzadeh 
<mich.talebza...@gmail.com<mailto:mich.talebza...@gmail.com>> wrote:
OK, but look at the worker: why is it still showing port 7077? As far as I 
know, that port on that host is the one the local master was running on, and 
that master is no longer there.





On 7 September 2016 at 16:05, arlindo santos 
<sarli...@hotmail.com<mailto:sarli...@hotmail.com>> wrote:

Port 7077 is for "client" mode connections to the master. In "cluster" mode 
it's 6066, which means the "driver" runs on the Spark cluster on a node Spark 
chooses. The command I use to deploy my Spark app (including the driver) is 
below:


spark-submit --deploy-mode cluster --master 
spark://tiplxapp-spk01:6066,tiplxapp-spk02:6066,tiplxapp-spk03:6066 
/app/tmx/ngxspark/lib/EX1AppSpark-1.0.13.jar
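
(As an aside, with cluster mode you can also query or kill the driver through the same 6066 REST endpoint via spark-submit; the submission ID below is just an illustrative example, not a real one from our cluster:)

spark-submit --master spark://tiplxapp-spk01:6066 --status driver-20160907122851-0000
spark-submit --master spark://tiplxapp-spk01:6066 --kill driver-20160907122851-0000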



Yes, you're right. I believe when the master dies, ZooKeeper detects that and 
elects a new master node, and spark-submit should carry on. I'm not sure how 
this leads to the UI believing the app is in "waiting" state, though.


Also, I noticed that when these failovers happen, the "worker" web GUI goes a 
bit strange and starts reporting over-allocated resources. Look at the cores 
and memory used:


Spark Worker at 142.201.185.134:7078 (Spark 1.6.0, http://142.201.185.134:18081/)

  *   ID: worker-20160622152457-142.201.185.134-7078
  *   Master URL: spark://142.201.185.132:7077
  *   Cores: 4 (5 Used)
  *   Memory: 2.7 GB (3.0 GB Used)





________________________________
From: Mich Talebzadeh 
<mich.talebza...@gmail.com<mailto:mich.talebza...@gmail.com>>
Sent: September 7, 2016 2:52 PM
To: arlindo santos

Cc: user @spark
Subject: Re: spark 1.6.0 web console shows running application in a "waiting" 
status, but it's actually running

This is my take.

When you issue spark-submit on any node, it starts the application UI on port 
4040 by default. Otherwise you can specify the port yourself with 
--conf "spark.ui.port=<port>".
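
For example (the class name and jar path below are hypothetical):

spark-submit --conf "spark.ui.port=4041" --class com.example.Ex1App /path/to/app.jar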

As I understand it, in standalone mode the executors run on the workers.

$SPARK_HOME/sbin/start-slave.sh spark://<host>:7077

That port 7077 is the master port. If the master dies, those workers lose their 
connection to port 7077, so I believe they go stale. The spark-submit job 
carries on using the remaining executors on the other workers.
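
One thing to check (an assumption on my part, based on the standalone HA documentation): if the workers are started with the full list of masters, they should be able to re-register with whichever master ZooKeeper elects, for example:

$SPARK_HOME/sbin/start-slave.sh spark://tiplxapp-spk01:7077,tiplxapp-spk02:7077,tiplxapp-spk03:7077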

So in summary one expects the job to keep running. You access the UI at <HOST>:<port>.

One test you can do is to exit the UI and open the UI on the host that 
ZooKeeper selected as the new master, on the same port. That should work.

HTH










On 7 September 2016 at 15:27, arlindo santos 
<sarli...@hotmail.com<mailto:sarli...@hotmail.com>> wrote:
Yes, I refreshed a few times. Running in cluster mode.

FYI, I can duplicate this easily now. Our setup consists of 3 nodes running 
standalone Spark, with a master and a worker on each and ZooKeeper doing master 
leader election. If I kill the master on any node, the master shifts to another 
node, and that is when the app state changes to WAITING on the GUI and never 
changes back to RUNNING, even though the app really is running.
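
The way I trigger it (a minimal sketch, assuming the standard sbin scripts and that the master UI runs on port 18080 as it does here):

# on the node currently hosting the ALIVE master
$SPARK_HOME/sbin/stop-master.sh
# wait for ZooKeeper to promote a standby master, then open that node's UI:
#   http://<new-master-host>:18080   -> the app now shows WAITING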

Sent from my BlackBerry 10 smartphone on the Rogers network.
From: Mich Talebzadeh
Sent: Wednesday, September 7, 2016 9:50 AM
To: sarlindo
Cc: user @spark
Subject: Re: spark 1.6.0 web console shows running application in a "waiting" 
status, but it's actually running


Have you refreshed the Spark UI page?

In which mode are you running your Spark app?

HTH





On 6 September 2016 at 16:15, sarlindo 
<sarli...@hotmail.com<mailto:sarli...@hotmail.com>> wrote:
I have 2 questions/issues.

1. We had the spark-master shut down (reason unknown). We looked at the
spark-master logs and they simply show the following; is there some other log I
should be looking at to find out why the master went down?

16/09/05 21:10:00 INFO ClientCnxn: Opening socket connection to server
tiplxapp-spk02.prd.tse.com/142.201.219.76:2181. Will not attempt to
authenticate using SASL (unknown error)
16/09/05 21:10:00 ERROR Master: Leadership has been revoked -- master
shutting down.
16/09/05 21:10:00 INFO ClientCnxn: Socket connection established, initiating
session, client: /142.201.219.75:56361, server:
tiplxapp-spk02.prd.tse.com/142.201.219.76:2181


2. Spark 1.6.0 web console shows a running application in a "waiting"
status, but it's actually running. Is this an existing bug?

[screenshot: http://apache-spark-user-list.1001560.n3.nabble.com/file/n27665/33.png]




