Re: Issues in opening UI when running Spark Streaming in YARN
I will assume that you are running in yarn-cluster mode. Because the driver is launched in one of the containers, it doesn't make sense to expose port 4040 on the node that hosts that container. (Imagine multiple driver containers launched on the same node: that would cause a port collision.) If you're launching Spark from a gateway node that is physically near your worker nodes, you can instead launch your application in yarn-client mode, in which case the SparkUI will always be started on port 4040 of the node you ran spark-submit on.

The reason you only sometimes see the lines marked in red is that they appear only in driver containers, not in executor containers. The SparkUI belongs to the SparkContext, which exists only on the driver.

Andrew

2014-07-07 11:20 GMT-07:00 Yan Fang yanfang...@gmail.com:

> Hi guys,
>
> Not sure if you have similar issues. I did not find relevant tickets in JIRA. When I deploy Spark Streaming to YARN, I have the following two issues:
>
> 1. The UI port is random, not the default 4040. I have to look at the container's log to find the UI port. Is it supposed to be this way?
>
> 2. Most of the time, the UI does not work. Here is the difference between the logs of two runs of the same program. One run stops at the executor lines:
>
> 14/07/03 11:38:50 INFO spark.HttpServer: Starting HTTP Server
> 14/07/03 11:38:50 INFO server.Server: jetty-8.y.z-SNAPSHOT
> 14/07/03 11:38:50 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:12026
> 14/07/03 11:38:51 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 0
> 14/07/03 11:38:51 INFO executor.Executor: Running task ID 0
> ...
>
> The other run also starts the SparkUI (the lines from ui.JettyUtils onward appeared in red in the original message):
>
> 14/07/02 16:55:32 INFO spark.HttpServer: Starting HTTP Server
> 14/07/02 16:55:32 INFO server.Server: jetty-8.y.z-SNAPSHOT
> 14/07/02 16:55:32 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:14211
> 14/07/02 16:55:32 INFO ui.JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
> 14/07/02 16:55:32 INFO server.Server: jetty-8.y.z-SNAPSHOT
> 14/07/02 16:55:32 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:21867
> 14/07/02 16:55:32 INFO ui.SparkUI: Started SparkUI at http://myNodeName:21867
> 14/07/02 16:55:32 INFO cluster.YarnClusterScheduler: Created YarnClusterScheduler
>
> When the red lines appear, the UI sometimes works. Any ideas? Thank you.
>
> Best,
> Fang, Yan
> yanfang...@gmail.com
> +1 (206) 849-4108
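For reference, launching the same application in the two submission modes Andrew describes looks roughly like this on the command line. This is a sketch only; the class name and jar path are placeholders:

    # yarn-client mode: the driver runs inside the spark-submit process,
    # so the SparkUI comes up on port 4040 of the submitting node.
    spark-submit --master yarn-client --class com.example.StreamingApp app.jar

    # yarn-cluster mode: the driver runs inside a YARN container,
    # so the SparkUI port is chosen on whichever node hosts that container.
    spark-submit --master yarn-cluster --class com.example.StreamingApp app.jar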
Re: Issues in opening UI when running Spark Streaming in YARN
Hi Andrew,

Thanks for the quick reply. It works with yarn-client mode. One question about yarn-cluster mode: I was actually checking the ApplicationMaster's log, and since the Spark driver runs in the AM, the UI should also work there, right? But that is not true in my case.

Best,
Fang, Yan
yanfang...@gmail.com
+1 (206) 849-4108

On Mon, Jul 7, 2014 at 11:42 AM, Andrew Or and...@databricks.com wrote:

> I will assume that you are running in yarn-cluster mode. [...]
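One way to find where the SparkUI actually came up in yarn-cluster mode is to search the driver container's log for the "Started SparkUI" line quoted above. A sketch, assuming log aggregation is enabled on the cluster and using a made-up application ID:

    # Fetch the aggregated container logs for a finished application and
    # pull out the line that records the SparkUI address and port.
    yarn logs -applicationId application_1404443455764_0001 | grep 'Started SparkUI'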
Re: Issues in opening UI when running Spark Streaming in YARN
@Yan, the UI should still work. As long as you look into the container that launches the driver, you will find the SparkUI address and port. Note that in yarn-cluster mode the Spark driver doesn't actually run in the Application Master process itself; just like the executors, it runs in a container that is launched by the Resource Manager after the Application Master requests the resources. In contrast, in yarn-client mode your driver is not launched in a container but in the client process that launched your application (i.e. spark-submit), so the stdout of that process directly contains the SparkUI messages.

@Chester, I'm not sure what has gone wrong, as there are many factors at play here. When you go to the Resource Manager UI, does the application URL link point you to the same SparkUI address as indicated in the logs? If so, that is the correct behavior. However, I believe the redirect error has little to do with Spark itself and more to do with how the cluster is set up. I have actually run into this myself, but I haven't found a workaround. Let me know if you find anything.

2014-07-07 12:07 GMT-07:00 Chester Chen ches...@alpinenow.com:

> As Andrew explained, the port is random rather than 4040, because the Spark driver is started in the Application Master and the port is randomly selected. But I have a similar UI issue. I am running yarn-cluster mode against my local CDH5 cluster. The log states:
>
> 14/07/07 11:59:29 INFO ui.SparkUI: Started SparkUI at http://10.0.0.63:58750
>
> but when I click the Spark UI link (ApplicationMaster or http://10.0.0.63:58750), I get a 404 with the redirect URI http://localhost/proxy/application_1404443455764_0010/
>
> Looking at the Spark code, I noticed that the proxy address is really a variable read from the http address in yarn-site.xml. But even when I specified the value in yarn-site.xml, it still didn't work for me. Oddly enough, it works for my co-worker on a Pivotal HD cluster, so I am still looking into what's different in the cluster setup.
>
> Chester
>
> On Mon, Jul 7, 2014 at 11:42 AM, Andrew Or and...@databricks.com wrote:
>
>> I will assume that you are running in yarn-cluster mode. [...]
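A plausible place to look for the localhost redirect Chester describes: the AmIpFilter that appears in the logs redirects UI requests through the YARN web application proxy, whose address is derived from yarn-site.xml. If the relevant properties are unset or resolve to localhost, the redirect target becomes http://localhost/proxy/... The property names below are from stock YARN; that they explain this particular cluster's behavior is an assumption:

    # The web proxy runs inside the ResourceManager unless this is set:
    grep -A1 'yarn.web-proxy.address' $HADOOP_CONF_DIR/yarn-site.xml
    # Otherwise the proxy base falls back to the RM web app address,
    # which in turn is derived from the RM hostname:
    grep -A1 'yarn.resourcemanager.webapp.address' $HADOOP_CONF_DIR/yarn-site.xml
    grep -A1 'yarn.resourcemanager.hostname' $HADOOP_CONF_DIR/yarn-site.xml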
Re: Issues in opening UI when running Spark Streaming in YARN
Thank you, Andrew. That makes sense to me now. I was confused by "In yarn-cluster mode, the Spark driver runs inside an application master process which is managed by YARN on the cluster" in http://spark.apache.org/docs/latest/running-on-yarn.html . After your explanation, it's clear now. Thank you.

Best,
Fang, Yan
yanfang...@gmail.com
+1 (206) 849-4108

On Mon, Jul 7, 2014 at 1:07 PM, Andrew Or and...@databricks.com wrote:

> @Yan, the UI should still work. As long as you look into the container that launches the driver, you will find the SparkUI address and port. [...]
Re: Issues in opening UI when running Spark Streaming in YARN
@Andrew Yes, the link points to the same redirected URL, http://localhost/proxy/application_1404443455764_0010/ . I suspect it's something to do with the cluster setup. I will let you know once I find something.

Chester

On Mon, Jul 7, 2014 at 1:07 PM, Andrew Or and...@databricks.com wrote:

> @Yan, the UI should still work. As long as you look into the container that launches the driver, you will find the SparkUI address and port. [...]