Re: Shuffle issues in the current master

2014-10-25 Thread DB Tsai
Hi Andrew, We were running a build of master from after SPARK-3613. We will give it another shot against the current master now that Josh has fixed a couple of issues in shuffle. Thanks. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai

Re: Shuffle issues in the current master

2014-10-23 Thread Andrew Or
To add to Aaron's response, `spark.shuffle.consolidateFiles` only applies to hash-based shuffle, so you shouldn't have to set it for sort-based shuffle. And yes, since you changed neither `spark.shuffle.compress` nor `spark.shuffle.spill.compress` you can't possibly have run into what #2890 fixes.
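For reference, here is a minimal Scala sketch (mine, not code from this thread) of the settings being discussed, assuming a Spark 1.1/1.2-era SparkConf; the application name and local master are placeholders for whatever the real job uses.

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("shuffle-config-sketch")
  .setMaster("local[*]")  // in a real deployment the master comes from spark-submit
  // With sort-based shuffle, spark.shuffle.consolidateFiles has no effect;
  // it is only consulted when spark.shuffle.manager is "hash".
  .set("spark.shuffle.manager", "sort")
  .set("spark.shuffle.consolidateFiles", "true")
  // Both compression settings already default to true.
  .set("spark.shuffle.compress", "true")
  .set("spark.shuffle.spill.compress", "true")

val sc = new SparkContext(conf)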

Shuffle issues in the current master

2014-10-22 Thread DB Tsai
Hi all, With SPARK-3948, the Snappy PARSING_ERROR exception is gone, but I'm hitting another exception now. I have no clue what's going on; has anyone run into a similar issue? Thanks. This is the configuration I use:
spark.rdd.compress true
spark.shuffle.consolidateFiles true

Re: Shuffle issues in the current master

2014-10-22 Thread DB Tsai
It seems that this issue should be addressed by https://github.com/apache/spark/pull/2890? Am I right? Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai

Re: Shuffle issues in the current master

2014-10-22 Thread DB Tsai
Or can it be solved by setting both of the following settings to true for now?
spark.shuffle.spill.compress true
spark.shuffle.compress true
Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai

Re: Shuffle issues in the current master

2014-10-22 Thread DB Tsai
PS, sorry for spamming the mailing list. To my knowledge, both spark.shuffle.spill.compress and spark.shuffle.compress default to true, so in theory we should not run into this issue if we don't change any settings. Is there some other bug we are running into? Thanks. Sincerely, DB Tsai
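One way to sanity-check which values are actually in effect is to read them back from the running context; this is a small sketch of my own, not something posted in the thread, and the app name and master are placeholders.

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("conf-check").setMaster("local[*]"))

// Both keys default to true; if either reads back false, something in the
// deployment (spark-defaults.conf, --conf flags, etc.) has overridden it.
val shuffleCompress = sc.getConf.getBoolean("spark.shuffle.compress", true)
val spillCompress = sc.getConf.getBoolean("spark.shuffle.spill.compress", true)
println(s"spark.shuffle.compress=$shuffleCompress, spark.shuffle.spill.compress=$spillCompress")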

Re: Shuffle issues in the current master

2014-10-22 Thread Aaron Davidson
You may be running into this issue: https://issues.apache.org/jira/browse/SPARK-4019. You could check by running with 2000 or fewer reduce partitions.
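A small sketch (my own, with a placeholder pair RDD and aggregation) of the check Aaron suggests: cap the reduce side at 2000 partitions and see whether the exception goes away, since per his note the problem should not appear with 2000 or fewer reduce partitions.

import org.apache.spark.SparkContext._  // pair-RDD operations (needed on pre-1.3 Spark)
import org.apache.spark.rdd.RDD

// Placeholder pair RDD; substitute the real shuffle stage from the failing job.
def checkWithFewerReducers(pairs: RDD[(String, Int)]): RDD[(String, Int)] = {
  // Explicitly cap the reduce side at 2000 partitions; if the exception
  // disappears, SPARK-4019 is the likely culprit.
  pairs.reduceByKey(_ + _, 2000)
}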