SocketTimeout only when launching lots of executors

2015-03-22 Thread Tianshuo Deng
Hi, spark users.

When running a Spark application with lots of executors (300+), I see the following
failures:

java.net.SocketTimeoutException: Read timed out
  at java.net.SocketInputStream.socketRead0(Native Method)
  at java.net.SocketInputStream.read(SocketInputStream.java:152)
  at java.net.SocketInputStream.read(SocketInputStream.java:122)
  at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
  at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
  at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
  at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:690)
  at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
  at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1324)
  at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:583)
  at org.apache.spark.util.Utils$.fetchFile(Utils.scala:421)
  at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$6.apply(Executor.scala:356)
  at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$6.apply(Executor.scala:353)
  at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
  at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
  at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
  at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
  at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
  at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
  at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
  at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:353)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)

When I reduce the number of executors, the Spark app runs fine. From the stack
trace, it looks like many executors requesting their dependencies from the driver
at the same time is causing the fetch to time out?
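
One thing I'm going to try, assuming the timeout really is coming from the
file-fetch path in Utils.doFetchFile, is raising spark.files.fetchTimeout so
executors stuck behind the rush don't give up early (values are illustrative
only; older releases take plain seconds, newer ones accept a time string like
"300s"):

  import org.apache.spark.SparkConf

  // spark.files.fetchTimeout defaults to 60 seconds; illustrative value below.
  val conf = new SparkConf()
    .setAppName("my-app")                     // placeholder app name
    .set("spark.files.fetchTimeout", "300")   // give slow dependency fetches more room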

Has anyone experienced similar issues, or does anyone have suggestions?

Thanks



Re: Lost task - connection closed

2015-02-17 Thread Tianshuo Deng
Hi, thanks for the response.
I discovered that my problem was that some of the executors hit OOM; tracing
through the executor logs helped uncover it. The driver log usually does not
reflect the OOM error, which causes confusion for users.

These are just my findings; not sure if the OP was having the same problem,
though.
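
For what it's worth, a couple of knobs worth checking once an executor-side OOM
is visible (values are illustrative, not a recommendation):

  import org.apache.spark.SparkConf

  val conf = new SparkConf()
    .set("spark.executor.memory", "6g")  // more heap per executor; tune to your cluster
    .set("spark.executor.extraJavaOptions", "-XX:+HeapDumpOnOutOfMemoryError")  // dump heap on OOM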

On Wed, Feb 11, 2015 at 12:03 AM, Arush Kharbanda 
ar...@sigmoidanalytics.com wrote:

 Hi

 Can you share the code you are trying to run.

 Thanks
 Arush

 On Wed, Feb 11, 2015 at 9:12 AM, Tianshuo Deng td...@twitter.com.invalid
 wrote:

 I have seen the same problem. It causes some tasks to fail, but not the
 whole job. Hope someone can shed some light on the cause of this.

 On Mon, Jan 26, 2015 at 9:49 AM, Aaron Davidson ilike...@gmail.com
 wrote:

 It looks like something weird is going on with your object
 serialization, perhaps a funny form of self-reference that is not detected
 by ObjectOutputStream's typical loop avoidance. That, or you have a data
 structure like a linked list with a parent pointer and many thousands of
 elements.

 Assuming the stack trace is coming from an executor, it is probably a
 problem with the objects you're sending back as results, so I would
 carefully examine these and maybe try serializing some using
 ObjectOutputStream manually.
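
 For example, something along these lines (a rough sketch; suspectResult is a
 placeholder for one of the result objects you send back):

   import java.io.{ByteArrayOutputStream, ObjectOutputStream}

   val bos = new ByteArrayOutputStream()
   val oos = new ObjectOutputStream(bos)
   oos.writeObject(suspectResult)  // blows up here if the object graph is pathological
   oos.close()
   println(s"serialized size: ${bos.size()} bytes")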

 If your program looks like

   foo.map { row => doComplexOperation(row) }.take(10)

 you can also try changing it to

   foo.map { row => doComplexOperation(row); 1 }.take(10)

 to avoid serializing the result of that complex operation, which should
 help narrow down where exactly the problematic objects are coming from.

 On Mon, Jan 26, 2015 at 8:31 AM, octavian.ganea 
 octavian.ga...@inf.ethz.ch wrote:

 Here is the first error I get at the executors:

 15/01/26 17:27:04 ERROR ExecutorUncaughtExceptionHandler: Uncaught exception
 in thread Thread[handle-message-executor-16,5,main]
 java.lang.StackOverflowError
   at java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1876)
   at java.io.ObjectOutputStream$BlockDataOutputStream.write(ObjectOutputStream.java:1840)
   at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1533)
   at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
   at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
   at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
   at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
   at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
   at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
   at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
   at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
   at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
   at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
   at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177


[mllib] GradientDescent requires huge memory for storing weight vector

2015-01-12 Thread Tianshuo Deng
Hi,
Currently in GradientDescent.scala, the weight vector is constructed as a dense
vector:

initialWeights = Vectors.dense(new Array[Double](numFeatures))

And numFeatures is determined in loadLibSVMFile as the maximum feature index.

But when a hash function is used to compute feature indices, this results in a
huge dense vector being allocated, which takes a lot of memory.
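
For example (a rough illustration; the hash range is hypothetical):

  import org.apache.spark.mllib.linalg.Vectors

  // If hashed feature indices can range up to 2^26, the dense weight vector
  // alone costs about 2^26 * 8 bytes ≈ 512 MB, even if only a small fraction
  // of those features ever occur in the data.
  val numFeatures = 1 << 26
  val initialWeights = Vectors.dense(new Array[Double](numFeatures))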

Any suggestions?
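
The only workaround I can think of so far is to cap the hash space up front when
building the LabeledPoints, so that numFeatures stays bounded (a sketch, untested;
names are placeholders):

  import scala.util.hashing.MurmurHash3
  import org.apache.spark.mllib.linalg.Vectors
  import org.apache.spark.mllib.regression.LabeledPoint

  val dim = 1 << 20  // cap the feature space at ~1M instead of the full hash range

  def hashedPoint(label: Double, tokens: Seq[String]): LabeledPoint = {
    val indexed = tokens
      .map(t => (math.abs(MurmurHash3.stringHash(t)) % dim, 1.0))
      .groupBy(_._1)
      .mapValues(_.map(_._2).sum)  // merge hash collisions / repeated tokens
      .toSeq
      .sortBy(_._1)
    LabeledPoint(label, Vectors.sparse(dim, indexed))
  }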