Hi, Spark users,

When running a Spark application with a large number of executors (300+), I see
the following failure:

java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:152)
    at java.net.SocketInputStream.read(SocketInputStream.java:122)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
    at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:690)
    at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1324)
    at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:583)
    at org.apache.spark.util.Utils$.fetchFile(Utils.scala:421)
    at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$6.apply(Executor.scala:356)
    at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$6.apply(Executor.scala:353)
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
    at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
    at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
    at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
    at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:353)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

When I reduce the number of executors, the Spark app runs fine. From the stack
trace, it looks like many executors requesting their dependency downloads from
the driver at the same time are overloading it and causing the reads to time
out?
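In case it helps, here is a sketch of the workaround I am planning to try:
bumping spark.files.fetchTimeout, which, if I am reading the configuration docs
right, governs the timeout executors use when fetching files added through
SparkContext.addFile()/addJar() from the driver (the app name below is just a
placeholder):

  import org.apache.spark.{SparkConf, SparkContext}

  // Sketch only -- assumes spark.files.fetchTimeout is the knob that
  // governs the read timeout seen in the stack trace above.
  val conf = new SparkConf()
    .setAppName("my-app") // placeholder app name
    .set("spark.files.fetchTimeout", "300") // seconds; default is 60
  val sc = new SparkContext(conf)

If that is the wrong knob, or if the better fix is to serve the jars from
somewhere other than the driver's HTTP server, I would be glad to hear it.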

Has anyone experienced similar issues, or does anyone have any suggestions?

Thanks