Port 7077 is used for "client" mode connections to the master. In "cluster" mode the REST submission port is 6066, and the driver runs inside the cluster on a node Spark chooses. The command I use to deploy my Spark app (including the driver) is below:
spark-submit --deploy-mode cluster --master spark://tiplxapp-spk01:6066,tiplxapp-spk02:6066,tiplxapp-spk03:6066 /app/tmx/ngxspark/lib/EX1AppSpark-1.0.13.jar

Yes, you're right: I believe that when the master dies, ZooKeeper detects it and elects a new master node, and spark-submit should carry on. I'm not sure how that leads to the UI believing the app is in a "waiting" state, though. Also, I noticed that when these failovers happen, the "worker" web GUI goes a bit strange and starts reporting over-allocated resources. Look at the cores and memory used:

Spark Worker at 142.201.185.134:7078 (Spark 1.6.0, http://142.201.185.134:18081/)
* ID: worker-20160622152457-142.201.185.134-7078
* Master URL: spark://142.201.185.132:7077
* Cores: 4 (5 Used)
* Memory: 2.7 GB (3.0 GB Used)

________________________________
From: Mich Talebzadeh <mich.talebza...@gmail.com>
Sent: September 7, 2016 2:52 PM
To: arlindo santos
Cc: user @spark
Subject: Re: spark 1.6.0 web console shows running application in a "waiting" status, but it's actually running

This is my take. When you issue spark-submit on any node, it starts a GUI on port 4040 by default; otherwise you can specify the port yourself with --conf "spark.ui.port=<port>". As I understand it, in standalone mode executors run on workers:

$SPARK_HOME/sbin/start-slave.sh spark://<host>:7077

Port 7077 is the master port. If the master dies, those workers lose their connection to port 7077, so I believe they go stale, and spark-submit carries on using the remaining executors on the other workers. So in summary, one expects the job to keep running. You start your UI on <HOST>:port. One test you can do is to exit the UI and start it on the host that ZooKeeper elects as the new master, on the same port. That should work.
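[Editor's note] The failover behaviour discussed above depends on the standalone masters being configured for ZooKeeper-based recovery. A minimal spark-env.sh sketch of that configuration, assuming the three node names mentioned in this thread and ZooKeeper's default client port 2181 (the actual quorum addresses are an assumption):

```shell
# Config sketch only (spark-env.sh on every master node).
# Hostnames reuse the node names from this thread; 2181 is ZooKeeper's
# default client port -- adjust both for your quorum.
SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
  -Dspark.deploy.zookeeper.url=tiplxapp-spk01:2181,tiplxapp-spk02:2181,tiplxapp-spk03:2181 \
  -Dspark.deploy.zookeeper.dir=/spark"
```

With this in place on all three nodes, only the elected leader actually serves on 7077/6066, and standby masters take over registered workers and apps after failover; that is why spark-submit can list all three masters in its --master URL.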
HTH

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com

Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.

On 7 September 2016 at 15:27, arlindo santos <sarli...@hotmail.com> wrote:

Yes, refreshed a few times. Running in cluster mode. FYI, I can duplicate this easily now. Our setup consists of 3 nodes running standalone Spark, with a master and a worker on each, and ZooKeeper doing master leader election. If I kill the master on any node, the master role shifts to another node, and that is when the app state changes to "waiting" on the GUI and never changes back to "running", even though it really is running.

Sent from my BlackBerry 10 smartphone on the Rogers network.

From: Mich Talebzadeh
Sent: Wednesday, September 7, 2016 9:50 AM
To: sarlindo
Cc: user @spark
Subject: Re: spark 1.6.0 web console shows running application in a "waiting" status, but it's actually running

Have you refreshed the Spark UI page? What mode are you running your Spark app in?

HTH

Dr Mich Talebzadeh

On 6 September 2016 at 16:15, sarlindo <sarli...@hotmail.com> wrote:

I have 2 questions/issues.

1.
We had the spark-master shut down (reason unknown). We looked at the spark-master logs and they simply show the following; is there some other log I should be looking at to find out why the master went down?

16/09/05 21:10:00 INFO ClientCnxn: Opening socket connection to server tiplxapp-spk02.prd.tse.com/142.201.219.76:2181. Will not attempt to authenticate using SASL (unknown error)
16/09/05 21:10:00 ERROR Master: Leadership has been revoked -- master shutting down.
16/09/05 21:10:00 INFO ClientCnxn: Socket connection established, initiating session, client: /142.201.219.75:56361, server: tiplxapp-spk02.prd.tse.com/142.201.219.76:2181

2. The Spark 1.6.0 web console shows a running application in a "waiting" status, but it's actually running. Is this an existing bug?

<http://apache-spark-user-list.1001560.n3.nabble.com/file/n27665/33.png>

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-1-6-0-web-console-shows-running-application-in-a-waiting-status-but-it-s-acutally-running-tp27665.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
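[Editor's note] On question 1 in the thread: "Leadership has been revoked" is logged when a standalone master loses its ZooKeeper leadership (typically a session timeout caused by a network blip or a long GC pause), and the master then shuts itself down by design. Filtering the master log for leadership and ZooKeeper client events usually tells the story. A small sketch, demonstrated on sample lines like those quoted above; in real use, replace the printf with a cat of the master log file (the path varies by install and is an assumption here):

```shell
# Filter for the events that explain a standalone-master shutdown.
# Demo input mimics the log excerpt in this thread; in practice use e.g.
#   cat $SPARK_HOME/logs/spark-*Master*.out   (path is an assumption)
printf '%s\n' \
  '16/09/05 21:10:00 INFO ClientCnxn: Opening socket connection to server tiplxapp-spk02.prd.tse.com/142.201.219.76:2181' \
  '16/09/05 21:10:00 ERROR Master: Leadership has been revoked -- master shutting down.' \
  '16/09/05 20:59:12 INFO Master: some unrelated line (hypothetical)' \
  | grep -E 'Leadership|ClientCnxn'
# prints only the first two lines: the ZooKeeper client event and the
# leadership revocation; the unrelated line is dropped
```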