[ https://issues.apache.org/jira/browse/SPARK-20853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16114436#comment-16114436 ]
Trevor McKay commented on SPARK-20853:
--------------------------------------

@Josh Bacon,

Hi Josh, we tracked this down independently, and someone else found the same thing and put up a fix, referenced in the "cloned by" issue SPARK-21176 above.

The issue is actually how the Jetty thread pool associated with the reverse proxy handlers is being built. In a nutshell, the more processors your system has, the quicker your thread pool will be exhausted. I ran on a 40-core system, so 40 / 2 * (number of nodes) exceeded the 200-thread pool at 10 nodes. If you run the master on a single core, I believe you can have 200 workers before you blow up (that is Spark standalone).

Cheers!

> spark.ui.reverseProxy=true leads to hanging communication to master
> -------------------------------------------------------------------
>
>                 Key: SPARK-20853
>                 URL: https://issues.apache.org/jira/browse/SPARK-20853
>             Project: Spark
>          Issue Type: Bug
>          Components: Web UI
>    Affects Versions: 2.1.0
>         Environment: ppc64le GNU/Linux, POWER8, only master node is reachable externally other nodes are in an internal network
>            Reporter: Benno Staebler
>              Labels: network, web-ui
>
> When *reverse proxy is enabled*
> {quote}
> spark.ui.reverseProxy=true
> spark.ui.reverseProxyUrl=/
> {quote}
> first of all any invocation of the spark master Web UI hangs forever locally (e.g. http://192.168.10.16:25001) and via external URL without any data received.
> One, sometimes two spark applications succeed without error and then workers start throwing exceptions:
> {quote}
> Caused by: java.io.IOException: Failed to connect to /192.168.10.16:25050
> {quote}
> The application dies during creation of SparkContext:
> {quote}
> 2017-05-22 16:11:23 INFO StandaloneAppClient$ClientEndpoint:54 - Connecting to master spark://node0101:25000...
> 2017-05-22 16:11:23 INFO TransportClientFactory:254 - Successfully created connection to node0101/192.168.10.16:25000 after 169 ms (132 ms spent in bootstraps)
> 2017-05-22 16:11:43 INFO StandaloneAppClient$ClientEndpoint:54 - Connecting to master spark://node0101:25000...
> 2017-05-22 16:12:03 INFO StandaloneAppClient$ClientEndpoint:54 - Connecting to master spark://node0101:25000...
> 2017-05-22 16:12:23 ERROR StandaloneSchedulerBackend:70 - Application has been killed. Reason: All masters are unresponsive! Giving up.
> 2017-05-22 16:12:23 WARN StandaloneSchedulerBackend:66 - Application ID is not initialized yet.
> 2017-05-22 16:12:23 INFO Utils:54 - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 25056.
> .....
> Caused by: java.lang.IllegalArgumentException: requirement failed: Can only call getServletHandlers on a running MetricsSystem
> {quote}
> *This definitively does not happen without reverse proxy enabled!*

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
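
For reference, the thread-pool arithmetic in the comment at the top of this message can be sketched roughly as follows. This is only an illustration of the numbers quoted there, not Spark or Jetty code. The function names are hypothetical, the floor of one thread per proxy handler is an assumption inferred from the single-core remark, and the pool size of 200 is the figure given in the comment.

```python
def threads_per_proxy_handler(cores: int) -> int:
    """Threads claimed by one reverse proxy handler.

    Assumption from the comment: roughly cores / 2 per handler,
    and at least one thread even on a single-core master.
    """
    return max(1, cores // 2)


def nodes_to_fill_pool(cores: int, pool_size: int = 200) -> int:
    """Number of proxied nodes at which the shared pool is used up."""
    return pool_size // threads_per_proxy_handler(cores)


if __name__ == "__main__":
    # 40 cores reproduces the "10 nodes" figure; 1 core gives ~200 workers.
    for cores in (1, 8, 40):
        print(f"{cores:>2} cores -> pool filled at {nodes_to_fill_pool(cores)} nodes")
```

Under these assumptions a 40-core master fills the pool at 10 nodes and a single-core master at 200, which matches the two data points in the comment.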