Re: Shuffle issues in the current master

2014-10-25 Thread DB Tsai
Hi Andrew,

We were running a master build from just after SPARK-3613. We'll give it
another shot against the current master now that Josh has fixed a couple
of issues in shuffle.
Thanks.

Sincerely,

DB Tsai
---
My Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai



Re: Shuffle issues in the current master

2014-10-23 Thread Andrew Or
To add to Aaron's response, `spark.shuffle.consolidateFiles` only applies
to hash-based shuffle, so you shouldn't have to set it for sort-based
shuffle. And yes, since you changed neither `spark.shuffle.compress` nor
`spark.shuffle.spill.compress` you can't possibly have run into what #2890
fixes.
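
A quick way to verify which of these values are actually in effect is to
read them back from the live SparkConf. Here is a minimal Scala sketch,
assuming an existing SparkContext named `sc`; the fallback values passed
to the getters below are the documented defaults, not anything taken from
this thread:

// Illustrative check of the effective shuffle settings. The second
// argument to each getter is the fallback returned when the key is
// unset, matching the documented Spark 1.x defaults.
val conf = sc.getConf
val manager     = conf.get("spark.shuffle.manager", "sort")
val compress    = conf.getBoolean("spark.shuffle.compress", true)
val spillComp   = conf.getBoolean("spark.shuffle.spill.compress", true)
val consolidate = conf.getBoolean("spark.shuffle.consolidateFiles", false)
println(s"manager=$manager compress=$compress " +
  s"spill.compress=$spillComp consolidateFiles=$consolidate")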

I'm assuming you're running master? Was it before or after this commit:
https://github.com/apache/spark/commit/6b79bfb42580b6bd4c4cd99fb521534a94150693?

-Andrew


Shuffle issues in the current master

2014-10-22 Thread DB Tsai
Hi all,

With SPARK-3948, the Snappy PARSING_ERROR exception is gone, but I'm
hitting another exception now. I have no clue what's going on; has
anyone run into a similar issue? Thanks.

This is the configuration I use.
spark.rdd.compress true
spark.shuffle.consolidateFiles true
spark.shuffle.manager SORT
spark.akka.frameSize 128
spark.akka.timeout  600
spark.core.connection.ack.wait.timeout  600
spark.core.connection.auth.wait.timeout 300
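
These lines are in spark-defaults.conf property format. Purely for
illustration, the same configuration expressed programmatically on a
SparkConf; the app name and master URL below are placeholders, not part
of the original job:

import org.apache.spark.{SparkConf, SparkContext}

// Illustrative sketch: the posted spark-defaults.conf entries set in
// code. "local[4]" and the app name are placeholders.
val conf = new SparkConf()
  .setAppName("shuffle-repro")
  .setMaster("local[4]")
  .set("spark.rdd.compress", "true")
  .set("spark.shuffle.consolidateFiles", "true")
  .set("spark.shuffle.manager", "SORT")
  .set("spark.akka.frameSize", "128")
  .set("spark.akka.timeout", "600")
  .set("spark.core.connection.ack.wait.timeout", "600")
  .set("spark.core.connection.auth.wait.timeout", "300")
val sc = new SparkContext(conf)

The stack trace of the new exception: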

java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2325)
java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2794)
java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:801)
java.io.ObjectInputStream.<init>(ObjectInputStream.java:299)
org.apache.spark.serializer.JavaDeserializationStream$$anon$1.<init>(JavaSerializer.scala:57)
org.apache.spark.serializer.JavaDeserializationStream.<init>(JavaSerializer.scala:57)
org.apache.spark.serializer.JavaSerializerInstance.deserializeStream(JavaSerializer.scala:95)
org.apache.spark.storage.BlockManager.getLocalShuffleFromDisk(BlockManager.scala:351)
org.apache.spark.storage.ShuffleBlockFetcherIterator$$anonfun$fetchLocalBlocks$1$$anonfun$apply$4.apply(ShuffleBlockFetcherIterator.scala:196)
org.apache.spark.storage.ShuffleBlockFetcherIterator$$anonfun$fetchLocalBlocks$1$$anonfun$apply$4.apply(ShuffleBlockFetcherIterator.scala:196)
org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:243)
org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:52)
scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30)
org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
org.apache.spark.Aggregator.combineCombinersByKey(Aggregator.scala:89)
org.apache.spark.shuffle.hash.HashShuffleReader.read(HashShuffleReader.scala:44)
org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:92)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
org.apache.spark.scheduler.Task.run(Task.scala:56)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:744)


Sincerely,

DB Tsai
---
My Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai




Re: Shuffle issues in the current master

2014-10-22 Thread DB Tsai
It seems this issue should be addressed by
https://github.com/apache/spark/pull/2890. Am I right?

Sincerely,

DB Tsai
---
My Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai





Re: Shuffle issues in the current master

2014-10-22 Thread DB Tsai
Or can it be solved by setting both of the following settings to true for now?

spark.shuffle.spill.compress true
spark.shuffle.compress true

Sincerely,

DB Tsai
---
My Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai





Re: Shuffle issues in the current master

2014-10-22 Thread DB Tsai
PS, sorry for spamming the mailing list. Based on my knowledge, both
spark.shuffle.spill.compress and spark.shuffle.compress default to
true, so in theory we should not run into this issue if we don't
change any settings. Is there any other bug we could be running into?

Thanks.

Sincerely,

DB Tsai
---
My Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai





Re: Shuffle issues in the current master

2014-10-22 Thread Aaron Davidson
You may be running into this issue:
https://issues.apache.org/jira/browse/SPARK-4019

You could check by rerunning with 2000 or fewer reduce partitions.
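
If the job builds its reduce side with something like reduceByKey, the
check is just to pass an explicit partition count. A minimal Scala
sketch follows; the data, key count, and sum are made up for the
example, `sc` is an existing SparkContext, and 2000 is the threshold
from the suggestion above:

// Illustrative sketch only: cap the reduce side at 2000 partitions to
// test whether SPARK-4019 is in play. `pairs` stands in for your own
// key-value RDD.
import org.apache.spark.HashPartitioner
import org.apache.spark.SparkContext._   // pair-RDD implicits (pre-1.3)

val pairs = sc.parallelize(1 to 1000000).map(i => (i % 5000, 1))
val reduced = pairs.reduceByKey(_ + _, numPartitions = 2000)
reduced.count()

// The same cap via an explicit partitioner:
val capped = pairs.partitionBy(new HashPartitioner(2000))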
