Re: Issues in opening UI when running Spark Streaming in YARN

2014-07-07 Thread Andrew Or
I will assume that you are running in yarn-cluster mode. Because the driver
is launched in one of the containers, it doesn't make sense to expose port
4040 on the node that hosts that container. (Imagine if multiple driver
containers were launched on the same node; that would cause a port
collision.) If you're launching Spark from a gateway node that is
physically near your worker nodes, you can simply launch your application
in yarn-client mode, in which case the SparkUI will always be started on
port 4040 on the node where you ran spark-submit. You only sometimes see
the red text because it appears only in the driver containers, not the
executor containers: the SparkUI belongs to the SparkContext, which exists
only on the driver.
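
For example (a sketch; the class and jar names below are hypothetical),
submitting in yarn-client mode keeps the UI on the node you submit from:

  spark-submit --master yarn-client --class com.example.MyStreamingApp my-streaming-app.jar

If you have to stay in yarn-cluster mode, you can instead pin the UI to a fixed
port with the spark.ui.port setting, though that only helps if at most one
driver runs per node (otherwise you get exactly the collision described above):

  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}

  // Minimal sketch of a streaming app that fixes the UI port.
  object MyStreamingApp {  // hypothetical application name
    def main(args: Array[String]): Unit = {
      val conf = new SparkConf()
        .setAppName("MyStreamingApp")
        .set("spark.ui.port", "4040")  // assumption: no other driver on this node
      val ssc = new StreamingContext(conf, Seconds(1))
      // ... define your DStreams here ...
      ssc.start()
      ssc.awaitTermination()
    }
  }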

Andrew


2014-07-07 11:20 GMT-07:00 Yan Fang yanfang...@gmail.com:

 Hi guys,

 Not sure if you have similar issues. I did not find any relevant tickets in
 JIRA. When I deploy Spark Streaming to YARN, I have the following two
 issues:

 1. The UI port is random; it is not the default 4040. I have to look at the
 container's log to find the UI port. Is it supposed to work this way?

 2. Most of the time, the UI does not work. The difference between the logs
 of two runs of the same program is shown below (the parts between asterisks
 were highlighted in red in the original):

 *14/07/03 11:38:50 INFO spark.HttpServer: Starting HTTP Server
 14/07/03 11:38:50 INFO server.Server: jetty-8.y.z-SNAPSHOT
 14/07/03 11:38:50 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:12026
 14/07/03 11:38:51 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 0
 14/07/03 11:38:51 INFO executor.Executor: Running task ID 0...*

 14/07/02 16:55:32 INFO spark.HttpServer: Starting HTTP Server
 14/07/02 16:55:32 INFO server.Server: jetty-8.y.z-SNAPSHOT
 14/07/02 16:55:32 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:14211

 *14/07/02 16:55:32 INFO ui.JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
 14/07/02 16:55:32 INFO server.Server: jetty-8.y.z-SNAPSHOT
 14/07/02 16:55:32 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:21867
 14/07/02 16:55:32 INFO ui.SparkUI: Started SparkUI at http://myNodeName:21867
 14/07/02 16:55:32 INFO cluster.YarnClusterScheduler: Created YarnClusterScheduler*

 When the red part (between asterisks) appears, the UI sometimes works. Any
 ideas? Thank you.

 Best,

 Fang, Yan
 yanfang...@gmail.com
 +1 (206) 849-4108



Re: Issues in opening UI when running Spark Streaming in YARN

2014-07-07 Thread Yan Fang
Hi Andrew,

Thanks for the quick reply. It works in yarn-client mode.

One question about yarn-cluster mode: I was actually checking the AM for the
log, and since the Spark driver runs in the AM, the UI should also work,
right? But that is not true in my case.

Best,

Fang, Yan
yanfang...@gmail.com
+1 (206) 849-4108


On Mon, Jul 7, 2014 at 11:42 AM, Andrew Or and...@databricks.com wrote:

 I will assume that you are running in yarn-cluster mode. Because the driver
 is launched in one of the containers, it doesn't make sense to expose port
 4040 on the node that hosts that container. (Imagine if multiple driver
 containers were launched on the same node; that would cause a port
 collision.) If you're launching Spark from a gateway node that is
 physically near your worker nodes, you can simply launch your application
 in yarn-client mode, in which case the SparkUI will always be started on
 port 4040 on the node where you ran spark-submit. You only sometimes see
 the red text because it appears only in the driver containers, not the
 executor containers: the SparkUI belongs to the SparkContext, which exists
 only on the driver.

 Andrew


 2014-07-07 11:20 GMT-07:00 Yan Fang yanfang...@gmail.com:

 Hi guys,

 Not sure if you have similar issues. I did not find any relevant tickets in
 JIRA. When I deploy Spark Streaming to YARN, I have the following two
 issues:

 1. The UI port is random; it is not the default 4040. I have to look at the
 container's log to find the UI port. Is it supposed to work this way?

 2. Most of the time, the UI does not work. The difference between the logs
 of two runs of the same program is shown below (the parts between asterisks
 were highlighted in red in the original):

 *14/07/03 11:38:50 INFO spark.HttpServer: Starting HTTP Server
 14/07/03 11:38:50 INFO server.Server: jetty-8.y.z-SNAPSHOT
 14/07/03 11:38:50 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:12026
 14/07/03 11:38:51 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 0
 14/07/03 11:38:51 INFO executor.Executor: Running task ID 0...*

 14/07/02 16:55:32 INFO spark.HttpServer: Starting HTTP Server
 14/07/02 16:55:32 INFO server.Server: jetty-8.y.z-SNAPSHOT
 14/07/02 16:55:32 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:14211

 *14/07/02 16:55:32 INFO ui.JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
 14/07/02 16:55:32 INFO server.Server: jetty-8.y.z-SNAPSHOT
 14/07/02 16:55:32 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:21867
 14/07/02 16:55:32 INFO ui.SparkUI: Started SparkUI at http://myNodeName:21867
 14/07/02 16:55:32 INFO cluster.YarnClusterScheduler: Created YarnClusterScheduler*

 When the red part (between asterisks) appears, the UI sometimes works. Any
 ideas? Thank you.

 Best,

 Fang, Yan
 yanfang...@gmail.com
 +1 (206) 849-4108





Re: Issues in opening UI when running Spark Streaming in YARN

2014-07-07 Thread Andrew Or
@Yan, the UI should still work. As long as you look at the logs of the
container that runs the driver, you will find the SparkUI address and port.
Note that in yarn-cluster mode the Spark driver doesn't actually run in the
Application Manager; just like the executors, it runs in a container that
is launched by the Resource Manager after the Application Master requests
the container resources. In contrast, in yarn-client mode your driver is
not launched in a container but in the client process that launched your
application (i.e. spark-submit), so the stdout of that process directly
contains the SparkUI messages.
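
For instance (a sketch; this assumes YARN log aggregation is enabled and the
application has already finished), you can dump the container logs with the
YARN CLI and search for the line that names the UI address:

  yarn logs -applicationId application_1404443455764_0010 | grep 'Started SparkUI'

For a still-running application, the same line shows up in the driver
container's stdout, which you can reach through the NodeManager web UI.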

@Chester, I'm not sure what has gone wrong, as there are many factors at
play here. When you go to the Resource Manager UI, does the application URL
link point you to the same SparkUI address as the one indicated in the
logs? If so, this is the correct behavior. However, I believe the redirect
error has little to do with Spark itself and more to do with how you set up
the cluster. I have actually run into this myself, but I haven't found a
workaround. Let me know if you find anything.
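
One thing worth checking (a sketch; the host and port below are placeholders):
a redirect to http://localhost/proxy/... suggests the AmIpFilter resolved the
web proxy host to localhost. If you run a standalone proxy server,
yarn.web-proxy.address in yarn-site.xml should name a host your browser can
actually resolve; if you don't, the proxy is embedded in the Resource Manager,
and the Resource Manager's web address needs to be reachable instead:

  <property>
    <name>yarn.web-proxy.address</name>
    <!-- placeholder; use a resolvable host:port, not localhost -->
    <value>rm-host.example.com:8089</value>
  </property>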




2014-07-07 12:07 GMT-07:00 Chester Chen ches...@alpinenow.com:

 As Andrew explained, the port is random rather than 4040, because the Spark
 driver is started in the Application Master and the port is randomly selected.

 But I have a similar UI issue. I am running yarn-cluster mode against my
 local CDH5 cluster.

 The log states:
 14/07/07 11:59:29 INFO ui.SparkUI: Started SparkUI at http://10.0.0.63:58750



 But when I click the Spark UI link (the ApplicationMaster link, or
 http://10.0.0.63:58750 directly), I get a 404 with the redirect URI

  http://localhost/proxy/application_1404443455764_0010/



 Looking at the Spark code, I notice that the proxy host is really a variable
 taken from the yarn-site.xml HTTP address. But even when I specify the value
 in yarn-site.xml, it still doesn't work for me.



 Oddly enough, it works for my co-worker on a Pivotal HD cluster, so I am
 still looking into what's different in the cluster setup or elsewhere.


 Chester





 On Mon, Jul 7, 2014 at 11:42 AM, Andrew Or and...@databricks.com wrote:

 I will assume that you are running in yarn-cluster mode. Because the driver
 is launched in one of the containers, it doesn't make sense to expose port
 4040 on the node that hosts that container. (Imagine if multiple driver
 containers were launched on the same node; that would cause a port
 collision.) If you're launching Spark from a gateway node that is
 physically near your worker nodes, you can simply launch your application
 in yarn-client mode, in which case the SparkUI will always be started on
 port 4040 on the node where you ran spark-submit. You only sometimes see
 the red text because it appears only in the driver containers, not the
 executor containers: the SparkUI belongs to the SparkContext, which exists
 only on the driver.

 Andrew


 2014-07-07 11:20 GMT-07:00 Yan Fang yanfang...@gmail.com:

 Hi guys,

 Not sure if you have similar issues. I did not find any relevant tickets in
 JIRA. When I deploy Spark Streaming to YARN, I have the following two
 issues:

 1. The UI port is random; it is not the default 4040. I have to look at the
 container's log to find the UI port. Is it supposed to work this way?

 2. Most of the time, the UI does not work. The difference between the logs
 of two runs of the same program is shown below (the parts between asterisks
 were highlighted in red in the original):

 *14/07/03 11:38:50 INFO spark.HttpServer: Starting HTTP Server
 14/07/03 11:38:50 INFO server.Server: jetty-8.y.z-SNAPSHOT
 14/07/03 11:38:50 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:12026
 14/07/03 11:38:51 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 0
 14/07/03 11:38:51 INFO executor.Executor: Running task ID 0...*

 14/07/02 16:55:32 INFO spark.HttpServer: Starting HTTP Server
 14/07/02 16:55:32 INFO server.Server: jetty-8.y.z-SNAPSHOT
 14/07/02 16:55:32 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:14211

 *14/07/02 16:55:32 INFO ui.JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
 14/07/02 16:55:32 INFO server.Server: jetty-8.y.z-SNAPSHOT
 14/07/02 16:55:32 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:21867
 14/07/02 16:55:32 INFO ui.SparkUI: Started SparkUI at http://myNodeName:21867
 14/07/02 16:55:32 INFO cluster.YarnClusterScheduler: Created YarnClusterScheduler*

 When the red part (between asterisks) appears, the UI sometimes works. Any
 ideas? Thank you.

 Best,

 Fang, Yan
 yanfang...@gmail.com
 +1 (206) 849-4108






Re: Issues in opening UI when running Spark Streaming in YARN

2014-07-07 Thread Yan Fang
Thank you, Andrew. That makes sense to me now. I was confused by "In
yarn-cluster mode, the Spark driver runs inside an application master
process which is managed by YARN on the cluster" in
http://spark.apache.org/docs/latest/running-on-yarn.html. After your
explanation, it's clear now. Thank you.

Best,

Fang, Yan
yanfang...@gmail.com
+1 (206) 849-4108


On Mon, Jul 7, 2014 at 1:07 PM, Andrew Or and...@databricks.com wrote:

 @Yan, the UI should still work. As long as you look at the logs of the
 container that runs the driver, you will find the SparkUI address and port.
 Note that in yarn-cluster mode the Spark driver doesn't actually run in the
 Application Manager; just like the executors, it runs in a container that
 is launched by the Resource Manager after the Application Master requests
 the container resources. In contrast, in yarn-client mode your driver is
 not launched in a container but in the client process that launched your
 application (i.e. spark-submit), so the stdout of that process directly
 contains the SparkUI messages.

 @Chester, I'm not sure what has gone wrong, as there are many factors at
 play here. When you go to the Resource Manager UI, does the application URL
 link point you to the same SparkUI address as the one indicated in the
 logs? If so, this is the correct behavior. However, I believe the redirect
 error has little to do with Spark itself and more to do with how you set up
 the cluster. I have actually run into this myself, but I haven't found a
 workaround. Let me know if you find anything.




 2014-07-07 12:07 GMT-07:00 Chester Chen ches...@alpinenow.com:

 As Andrew explained, the port is random rather than 4040, because the Spark
 driver is started in the Application Master and the port is randomly selected.

 But I have a similar UI issue. I am running yarn-cluster mode against my
 local CDH5 cluster.

 The log states:
 14/07/07 11:59:29 INFO ui.SparkUI: Started SparkUI at http://10.0.0.63:58750




 But when I click the Spark UI link (the ApplicationMaster link, or
 http://10.0.0.63:58750 directly), I get a 404 with the redirect URI

  http://localhost/proxy/application_1404443455764_0010/



 Looking at the Spark code, I notice that the proxy host is really a variable
 taken from the yarn-site.xml HTTP address. But even when I specify the value
 in yarn-site.xml, it still doesn't work for me.



 Oddly enough, it works for my co-worker on a Pivotal HD cluster, so I am
 still looking into what's different in the cluster setup or elsewhere.


 Chester





 On Mon, Jul 7, 2014 at 11:42 AM, Andrew Or and...@databricks.com wrote:

 I will assume that you are running in yarn-cluster mode. Because the driver
 is launched in one of the containers, it doesn't make sense to expose port
 4040 on the node that hosts that container. (Imagine if multiple driver
 containers were launched on the same node; that would cause a port
 collision.) If you're launching Spark from a gateway node that is
 physically near your worker nodes, you can simply launch your application
 in yarn-client mode, in which case the SparkUI will always be started on
 port 4040 on the node where you ran spark-submit. You only sometimes see
 the red text because it appears only in the driver containers, not the
 executor containers: the SparkUI belongs to the SparkContext, which exists
 only on the driver.

 Andrew


 2014-07-07 11:20 GMT-07:00 Yan Fang yanfang...@gmail.com:

 Hi guys,

 Not sure if you have similar issues. I did not find any relevant tickets in
 JIRA. When I deploy Spark Streaming to YARN, I have the following two
 issues:

 1. The UI port is random; it is not the default 4040. I have to look at the
 container's log to find the UI port. Is it supposed to work this way?

 2. Most of the time, the UI does not work. The difference between the logs
 of two runs of the same program is shown below (the parts between asterisks
 were highlighted in red in the original):

 *14/07/03 11:38:50 INFO spark.HttpServer: Starting HTTP Server
 14/07/03 11:38:50 INFO server.Server: jetty-8.y.z-SNAPSHOT
 14/07/03 11:38:50 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:12026
 14/07/03 11:38:51 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 0
 14/07/03 11:38:51 INFO executor.Executor: Running task ID 0...*

 14/07/02 16:55:32 INFO spark.HttpServer: Starting HTTP Server
 14/07/02 16:55:32 INFO server.Server: jetty-8.y.z-SNAPSHOT
 14/07/02 16:55:32 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:14211

 *14/07/02 16:55:32 INFO ui.JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
 14/07/02 16:55:32 INFO server.Server: jetty-8.y.z-SNAPSHOT
 14/07/02 16:55:32 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:21867
 14/07/02 16:55:32 INFO ui.SparkUI: Started SparkUI at http://myNodeName:21867
 14/07/02 16:55:32 INFO cluster.YarnClusterScheduler: Created YarnClusterScheduler*

Re: Issues in opening UI when running Spark Streaming in YARN

2014-07-07 Thread Chester Chen
@Andrew

  Yes, the link points to the same redirected URI:

 http://localhost/proxy/application_1404443455764_0010/

  I suspect it has something to do with the cluster setup. I will let you know
once I find something.

Chester


On Mon, Jul 7, 2014 at 1:07 PM, Andrew Or and...@databricks.com wrote:

 @Yan, the UI should still work. As long as you look at the logs of the
 container that runs the driver, you will find the SparkUI address and port.
 Note that in yarn-cluster mode the Spark driver doesn't actually run in the
 Application Manager; just like the executors, it runs in a container that
 is launched by the Resource Manager after the Application Master requests
 the container resources. In contrast, in yarn-client mode your driver is
 not launched in a container but in the client process that launched your
 application (i.e. spark-submit), so the stdout of that process directly
 contains the SparkUI messages.

 @Chester, I'm not sure what has gone wrong, as there are many factors at
 play here. When you go to the Resource Manager UI, does the application URL
 link point you to the same SparkUI address as the one indicated in the
 logs? If so, this is the correct behavior. However, I believe the redirect
 error has little to do with Spark itself and more to do with how you set up
 the cluster. I have actually run into this myself, but I haven't found a
 workaround. Let me know if you find anything.




 2014-07-07 12:07 GMT-07:00 Chester Chen ches...@alpinenow.com:

 As Andrew explained, the port is random rather than 4040, because the Spark
 driver is started in the Application Master and the port is randomly selected.

 But I have a similar UI issue. I am running yarn-cluster mode against my
 local CDH5 cluster.

 The log states:
 14/07/07 11:59:29 INFO ui.SparkUI: Started SparkUI at http://10.0.0.63:58750



 But when I click the Spark UI link (the ApplicationMaster link, or
 http://10.0.0.63:58750 directly), I get a 404 with the redirect URI

  http://localhost/proxy/application_1404443455764_0010/



 Looking at the Spark code, I notice that the proxy host is really a variable
 taken from the yarn-site.xml HTTP address. But even when I specify the value
 in yarn-site.xml, it still doesn't work for me.



 Oddly enough, it works for my co-worker on a Pivotal HD cluster, so I am
 still looking into what's different in the cluster setup or elsewhere.


 Chester





 On Mon, Jul 7, 2014 at 11:42 AM, Andrew Or and...@databricks.com wrote:

 I will assume that you are running in yarn-cluster mode. Because the driver
 is launched in one of the containers, it doesn't make sense to expose port
 4040 on the node that hosts that container. (Imagine if multiple driver
 containers were launched on the same node; that would cause a port
 collision.) If you're launching Spark from a gateway node that is
 physically near your worker nodes, you can simply launch your application
 in yarn-client mode, in which case the SparkUI will always be started on
 port 4040 on the node where you ran spark-submit. You only sometimes see
 the red text because it appears only in the driver containers, not the
 executor containers: the SparkUI belongs to the SparkContext, which exists
 only on the driver.

 Andrew


 2014-07-07 11:20 GMT-07:00 Yan Fang yanfang...@gmail.com:

 Hi guys,

 Not sure if you have similar issues. I did not find any relevant tickets in
 JIRA. When I deploy Spark Streaming to YARN, I have the following two
 issues:

 1. The UI port is random; it is not the default 4040. I have to look at the
 container's log to find the UI port. Is it supposed to work this way?

 2. Most of the time, the UI does not work. The difference between the logs
 of two runs of the same program is shown below (the parts between asterisks
 were highlighted in red in the original):

 *14/07/03 11:38:50 INFO spark.HttpServer: Starting HTTP Server
 14/07/03 11:38:50 INFO server.Server: jetty-8.y.z-SNAPSHOT
 14/07/03 11:38:50 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:12026
 14/07/03 11:38:51 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 0
 14/07/03 11:38:51 INFO executor.Executor: Running task ID 0...*

 14/07/02 16:55:32 INFO spark.HttpServer: Starting HTTP Server
 14/07/02 16:55:32 INFO server.Server: jetty-8.y.z-SNAPSHOT
 14/07/02 16:55:32 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:14211

 *14/07/02 16:55:32 INFO ui.JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
 14/07/02 16:55:32 INFO server.Server: jetty-8.y.z-SNAPSHOT
 14/07/02 16:55:32 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:21867
 14/07/02 16:55:32 INFO ui.SparkUI: Started SparkUI at http://myNodeName:21867
 14/07/02 16:55:32 INFO cluster.YarnClusterScheduler: Created YarnClusterScheduler*

 When the red part (between asterisks) appears, the UI sometimes works. Any
 ideas? Thank you.

 Best,

 Fang, Yan
 yanfang...@gmail.com
 +1 (206)