[ https://issues.apache.org/jira/browse/SPARK-13960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ilya Ostrovskiy updated SPARK-13960:
------------------------------------

Description:

There is no option to specify which hostname/IP address the jar/file server listens on. Rather than using "spark.driver.host" when it is set, the jar/file server listens on the system's primary IP address. This is a problem when submitting an application in client mode from a machine with two NICs connected to two different networks.

Steps to reproduce:
1) Have a cluster in a remote network, whose master is on 192.168.255.10.
2) Have a machine at another location with a "primary" IP address of 192.168.1.2, also connected to the remote network with the IP address 192.168.255.250. Call this the "client machine".
3) Ensure every machine in the Spark cluster at the remote location can ping 192.168.255.250 and reach the client machine via that address.
4) On the client:
{noformat}
spark-submit --deploy-mode client --conf "spark.driver.host=192.168.255.250" --master spark://192.168.255.10:7077 --class <any valid spark application> <local jar with spark application> <whatever args you want>
{noformat}
5) Navigate to http://192.168.255.250:4040/ and confirm that executors from the remote cluster have found the driver on the client machine.
6) Navigate to http://192.168.255.250:4040/environment/ and scroll to the bottom.
7) Observe that the JAR specified in step 4 is listed under http://192.168.1.2:<random port>/jars/<your jar here>.jar.
8) Enjoy this stack trace periodically appearing on the client machine when the nodes in the remote cluster can't connect to 192.168.1.2:
{noformat}
16/03/17 03:25:55 WARN TaskSetManager: Lost task 1.2 in stage 0.0 (TID 5, 172.17.74.1): java.net.SocketTimeoutException: connect timed out
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:589)
        at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
        at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
        at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
        at sun.net.www.http.HttpClient.<init>(HttpClient.java:211)
        at sun.net.www.http.HttpClient.New(HttpClient.java:308)
        at sun.net.www.http.HttpClient.New(HttpClient.java:326)
        at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1169)
        at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1105)
        at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:999)
        at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:933)
        at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:588)
        at org.apache.spark.util.Utils$.fetchFile(Utils.scala:381)
        at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$5.apply(Executor.scala:405)
        at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$5.apply(Executor.scala:397)
        at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
        at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
        at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
        at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
        at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
        at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:397)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:193)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
{noformat}

was:

There is no option to specify which hostname/IP address the jar/file server listens on. Rather than using "spark.driver.host" when it is set, the jar/file server listens on the system's primary IP address. This is a problem when submitting an application in client mode from a machine with two NICs connected to two different networks.

Steps to reproduce:
1) Have a cluster in a remote network, whose master is on 192.168.255.10.
2) Have a machine at another location with a "primary" IP address of 192.168.1.2, also connected to the remote network with the IP address 192.168.255.250. Call this the "client machine".
3) Ensure every machine in the Spark cluster at the remote location can ping 192.168.255.250 and reach the client machine via that address.
4) On the client:
{noformat}
spark-submit --deploy-mode client --conf "spark.driver.host=192.168.255.250" --master spark://192.168.255.10:7077 --class <any valid spark application> <local jar with spark application> <whatever args you want>
{noformat}
5) Navigate to http://192.168.255.250:4040/ and confirm that executors from the remote cluster have found the driver on the client machine.
6) Navigate to http://192.168.255.250:4040/environment/ and scroll to the bottom.
7) Observe that the JAR specified in step 4 is listed under http://192.168.1.2:<random port>/jars/<your jar here>.jar.

> HTTP-based JAR Server doesn't respect spark.driver.host and there is no
> "spark.fileserver.host" option
> ------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-13960
>                 URL: https://issues.apache.org/jira/browse/SPARK-13960
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, Spark Submit
>    Affects Versions: 1.6.1
>         Environment: Any system with more than one IP address
>            Reporter: Ilya Ostrovskiy
>
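The failure mode reported above comes down to which local address a server process is bound to and advertises: a server started without an explicit host ends up associated with whichever interface the OS considers primary, which is exactly what a "spark.fileserver.host"-style option would override. A minimal sketch in plain Python sockets (illustration only, not Spark's actual code):

```python
import socket

def bind_server(host):
    # Bind a TCP server socket to the given local address;
    # port 0 lets the OS pick an ephemeral port, much like the jar server's
    # "<random port>" seen on the environment page.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind((host, 0))
    return s

# Pinned to one interface: reachable only via that address -- the behavior
# a hypothetical "spark.fileserver.host" option would allow.
pinned = bind_server("127.0.0.1")
print(pinned.getsockname()[0])    # 127.0.0.1

# No explicit host: the wildcard address. The server accepts on every
# interface, but any URL derived from the machine's "primary" address
# (here, 192.168.1.2 in the report) is the only one executors are told about.
wildcard = bind_server("")
print(wildcard.getsockname()[0])  # 0.0.0.0

pinned.close()
wildcard.close()
```

The sketch only demonstrates the bind-address distinction; in the reported bug the server is reachable on 192.168.255.250, but the advertised URL uses 192.168.1.2, which the remote cluster cannot route to.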
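Until something like "spark.fileserver.host" exists, one candidate workaround is SPARK_LOCAL_IP, the documented spark-env.sh variable for the address Spark binds local services to. Whether it actually reaches the HTTP jar/file server in 1.6.x is untested here and is essentially the question this issue raises; the sketch below reuses the reproduction command from step 4 with its placeholders:

```shell
# Assumption: SPARK_LOCAL_IP also governs the jar/file server's address,
# not just the driver's RPC endpoints -- unverified for 1.6.x.
export SPARK_LOCAL_IP=192.168.255.250
spark-submit --deploy-mode client \
  --conf "spark.driver.host=192.168.255.250" \
  --master spark://192.168.255.10:7077 \
  --class <any valid spark application> <local jar with spark application> <whatever args you want>
```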
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org