Re: Shuffle write increases in spark 1.2

2015-02-15 Thread Aaron Davidson
I think Xuefeng Wu's suggestion is likely correct. This difference is more likely explained by the compression library changing versions than by sort vs. hash shuffle (which should not affect output size significantly). Others have reported that switching to lz4 fixed their issue. We should document
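To illustrate the point that the codec can matter more than the shuffle manager, here is a minimal Python sketch built on stated assumptions. The Python standard library has no snappy, lz4 or lzf, so zlib, bz2 and lzma stand in for "different codecs", and the pickled records are hypothetical. The sketch only shows that identical bytes compress to different sizes depending on the codec and its settings; it does not reproduce Spark's behavior. In Spark itself, the codec for shuffle output is selected with the spark.io.compression.codec property, which accepts lz4.

```python
import bz2
import lzma
import pickle
import zlib

# Hypothetical stand-in for serialized shuffle records: (key, value) pairs.
records = [(i % 1000, f"value-{i % 97}") for i in range(20_000)]
payload = pickle.dumps(records)

# Stdlib codecs standing in for Spark's codecs (assumption: not snappy/lz4/lzf).
codecs = {
    "zlib-1": (lambda b: zlib.compress(b, 1), zlib.decompress),
    "zlib-9": (lambda b: zlib.compress(b, 9), zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}

sizes = {}
for name, (compress, decompress) in codecs.items():
    blob = compress(payload)
    assert decompress(blob) == payload  # every codec round-trips losslessly
    sizes[name] = len(blob)

print(f"raw: {len(payload)} bytes")
for name, size in sizes.items():
    print(f"{name}: {size} bytes ({size / len(payload):.1%} of raw)")
```

The same payload yields a different number of bytes under each codec, which is the kind of difference a codec or codec-version change could introduce into shuffle write sizes.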

Re: Shuffle write increases in spark 1.2

2015-02-15 Thread Ami Khandeshi
:14 (GMT+09:00) Title: Re: Shuffle write increases in spark 1.2 If you have a small reproduction for this issue, can you open a ticket at https://issues.apache.org/jira/browse/SPARK ? On December 29, 2014 at 7:10:02 PM, Kevin Jung (itsjb.j...@samsung.com) wrote: Hi all, The size

Re: Shuffle write increases in spark 1.2

2015-02-14 Thread Peng Cheng
I double-checked the 1.2 feature list and found that the new sort-based shuffle manager has nothing to do with HashPartitioner :- Sorry for the misinformation. On the other hand, this may explain the increase in shuffle spill as a side effect of the new shuffle manager. Let me revert
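Peng's correction matches the usual division of labor: the partitioner decides which reduce partition each record goes to, while the shuffle manager decides how map output is laid out on disk. The toy Python model below is a simplified sketch, not Spark's code. hash_partition, hash_style_write and sort_style_write are hypothetical names; the "hash" layout keeps one bucket per (map task, reduce partition), and the "sort" layout keeps, per map task, one partition-ordered data list plus an offset index.

```python
def hash_partition(key, num_partitions):
    """Roughly what a hash partitioner does: non-negative hash mod N."""
    return hash(key) % num_partitions  # Python's % is non-negative for N > 0


def hash_style_write(map_outputs, num_partitions):
    """Toy 'hash shuffle': one bucket per (map task, reduce partition)."""
    buckets = {}
    for map_id, records in enumerate(map_outputs):
        for key, value in records:
            pid = hash_partition(key, num_partitions)
            buckets.setdefault((map_id, pid), []).append((key, value))
    return buckets


def sort_style_write(map_outputs, num_partitions):
    """Toy 'sort shuffle': per map task, one data list ordered by partition id
    plus an offset index marking where each partition's records begin."""
    outputs = {}
    for map_id, records in enumerate(map_outputs):
        tagged = sorted(
            ((hash_partition(k, num_partitions), k, v) for k, v in records),
            key=lambda t: t[0],
        )
        data = [(k, v) for _, k, v in tagged]
        offsets = [0] * (num_partitions + 1)
        for pid, _, _ in tagged:
            offsets[pid + 1] += 1  # count records per partition
        for pid in range(num_partitions):
            offsets[pid + 1] += offsets[pid]  # prefix sums -> start offsets
        outputs[map_id] = (data, offsets)
    return outputs


# Three hypothetical map tasks, four reduce partitions, integer keys.
map_outputs = [[(k, f"m{m}-v{k}") for k in range(m, 40, 3)] for m in range(3)]
R = 4
hashed = hash_style_write(map_outputs, R)
sorted_out = sort_style_write(map_outputs, R)

# Same records either way; only the on-disk layout differs.
flat_hash = sorted(r for recs in hashed.values() for r in recs)
flat_sort = sorted(r for data, _ in sorted_out.values() for r in data)
print(flat_hash == flat_sort)  # True
print(len(hashed), "bucket files vs", 2 * len(sorted_out), "data+index files")
```

Both layouts hold exactly the same records, so in this model the choice of layout changes the number of files (up to M×R buckets versus 2M files for M map tasks), not the payload. That is consistent with the view that sort vs. hash shuffle alone should not change output size significantly.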

Re: Shuffle write increases in spark 1.2

2015-02-14 Thread Peng Cheng
Same problem here: shuffle write increased from 10G to over 64G. Since I'm running on Amazon EC2, this always causes the temporary folder to consume all the disk space. Still looking for a solution. BTW, the 64G shuffle write occurred when shuffling a pairRDD with HashPartitioner, so it's not

Re: Shuffle write increases in spark 1.2

2015-02-10 Thread chris
Hello, as the original message never got accepted to the mailing list, I quote it here completely: Kevin Jung wrote: Hi all, The size of shuffle write shown in the Spark web UI is very different when I execute the same Spark job on the same input data (100GB) in both Spark 1.1 and Spark 1.2. At the

Re: Shuffle write increases in spark 1.2

2015-02-10 Thread chris
Hello, as the original message from Kevin Jung never got accepted to the mailing list, I quote it here completely: Kevin Jung wrote: Hi all, The size of shuffle write shown in the Spark web UI is very different when I execute the same Spark job on the same input data (100GB) in both Spark 1.1 and Spark

Re: Shuffle write increases in spark 1.2

2015-02-05 Thread Anubhav Srivastav
/browse/SPARK-5081 --- Original Message --- Sender: Josh Rosen (rosenvi...@gmail.com) Date: 2015-01-05 06:14 (GMT+09:00) Title: Re: Shuffle write increases in spark 1.2 If you have a small reproduction for this issue, can you open a ticket at https://issues.apache.org/jira

Re: Shuffle write increases in spark 1.2

2015-01-04 Thread 정재부
Sure, here is a ticket. https://issues.apache.org/jira/browse/SPARK-5081 --- Original Message --- Sender: Josh Rosen (rosenvi...@gmail.com) Date: 2015-01-05 06:14 (GMT+09:00) Title: Re: Shuffle write increases in spark 1.2 If you have a small reproduction for this issue

Shuffle write increases in spark 1.2

2014-12-29 Thread Kevin Jung
Hi all, The size of shuffle write shown in the Spark web UI is very different when I execute the same Spark job on the same input data (100GB) in both Spark 1.1 and Spark 1.2. At the same sortBy stage, the size of shuffle write is 39.7GB in Spark 1.1 but 91.0GB in Spark 1.2. I set spark.shuffle.manager
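For a sense of scale, the growth factors implied by the figures reported in this thread can be computed directly. This is plain arithmetic on the quoted numbers; growth_factor is a hypothetical helper, and Peng Cheng's "over 64G" makes his ratio a lower bound.

```python
def growth_factor(before_gb, after_gb):
    """Ratio of shuffle-write size after vs. before the upgrade."""
    return after_gb / before_gb


# Figures quoted in this thread, in GB: (before, after).
reports = {
    "Kevin Jung, sortBy stage, Spark 1.1 vs 1.2": (39.7, 91.0),
    # Peng Cheng wrote "over 64G", so this ratio is only a lower bound.
    "Peng Cheng, pairRDD with HashPartitioner": (10.0, 64.0),
}

for label, (before, after) in reports.items():
    print(f"{label}: {before} GB -> {after} GB ({growth_factor(before, after):.2f}x)")
```

This gives roughly 2.29x for Kevin's sortBy stage and at least 6.4x for Peng's job.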