PS, sorry for spamming the mailing list. Based on my knowledge, both spark.shuffle.spill.compress and spark.shuffle.compress default to true, so in theory we should not run into this issue if we don't change any settings. Is there another bug we're running into?
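For anyone wanting to rule out configuration drift, the relevant properties can be pinned explicitly in spark-defaults.conf. This is just a sketch: since both compression flags already default to true (and the codec defaults to snappy in this Spark version), it should be a no-op unless something else overrides them.

```
# spark-defaults.conf -- pin shuffle compression explicitly so shuffle
# output and spilled data are written and read with the same codec
spark.shuffle.compress        true
spark.shuffle.spill.compress  true
# assumed default codec for this Spark version; adjust if you use lz4/lzf
spark.io.compression.codec    snappy
```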
Thanks.

Sincerely,

DB Tsai
-------------------------------------------------------
My Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai


On Wed, Oct 22, 2014 at 1:37 PM, DB Tsai <dbt...@dbtsai.com> wrote:
> Or can it be solved by setting both of the following settings to true for
> now?
>
> spark.shuffle.spill.compress true
> spark.shuffle.compress true
>
> Sincerely,
>
> DB Tsai
> -------------------------------------------------------
> My Blog: https://www.dbtsai.com
> LinkedIn: https://www.linkedin.com/in/dbtsai
>
>
> On Wed, Oct 22, 2014 at 1:34 PM, DB Tsai <dbt...@dbtsai.com> wrote:
>> It seems that this issue should be addressed by
>> https://github.com/apache/spark/pull/2890 ? Am I right?
>>
>> Sincerely,
>>
>> DB Tsai
>> -------------------------------------------------------
>> My Blog: https://www.dbtsai.com
>> LinkedIn: https://www.linkedin.com/in/dbtsai
>>
>>
>> On Wed, Oct 22, 2014 at 11:54 AM, DB Tsai <dbt...@dbtsai.com> wrote:
>>> Hi all,
>>>
>>> With SPARK-3948, the Snappy PARSING_ERROR exception is gone, but
>>> I'm now hitting another exception. I have no clue what's going on; has
>>> anyone run into a similar issue? Thanks.
>>>
>>> This is the configuration I use:
>>> spark.rdd.compress true
>>> spark.shuffle.consolidateFiles true
>>> spark.shuffle.manager SORT
>>> spark.akka.frameSize 128
>>> spark.akka.timeout 600
>>> spark.core.connection.ack.wait.timeout 600
>>> spark.core.connection.auth.wait.timeout 300
>>>
>>> java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2325)
>>> java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2794)
>>> java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:801)
>>> java.io.ObjectInputStream.<init>(ObjectInputStream.java:299)
>>> org.apache.spark.serializer.JavaDeserializationStream$$anon$1.<init>(JavaSerializer.scala:57)
>>> org.apache.spark.serializer.JavaDeserializationStream.<init>(JavaSerializer.scala:57)
>>> org.apache.spark.serializer.JavaSerializerInstance.deserializeStream(JavaSerializer.scala:95)
>>> org.apache.spark.storage.BlockManager.getLocalShuffleFromDisk(BlockManager.scala:351)
>>> org.apache.spark.storage.ShuffleBlockFetcherIterator$$anonfun$fetchLocalBlocks$1$$anonfun$apply$4.apply(ShuffleBlockFetcherIterator.scala:196)
>>> org.apache.spark.storage.ShuffleBlockFetcherIterator$$anonfun$fetchLocalBlocks$1$$anonfun$apply$4.apply(ShuffleBlockFetcherIterator.scala:196)
>>> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:243)
>>> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:52)
>>> scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>>> org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30)
>>> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
>>> org.apache.spark.Aggregator.combineCombinersByKey(Aggregator.scala:89)
>>> org.apache.spark.shuffle.hash.HashShuffleReader.read(HashShuffleReader.scala:44)
>>> org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:92)
>>> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>>> org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>>> org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>>> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>>> org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>>> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>>> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>>> org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>>> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>>> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>> org.apache.spark.scheduler.Task.run(Task.scala:56)
>>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> java.lang.Thread.run(Thread.java:744)
>>>
>>>
>>> Sincerely,
>>>
>>> DB Tsai
>>> -------------------------------------------------------
>>> My Blog: https://www.dbtsai.com
>>> LinkedIn: https://www.linkedin.com/in/dbtsai

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org