RE: Union and reduceByKey will trigger shuffle even same partition?

2015-02-24 Thread Shuai Zheng
, 2015 6:00 PM To: Shuai Zheng Cc: Shao, Saisai; user@spark.apache.org Subject: Re: Union and reduceByKey will trigger shuffle even same partition? I think you're getting tripped up lazy evaluation and the way stage boundaries work (admittedly its pretty confusing in this case). It is true

RE: Union and reduceByKey will trigger shuffle even same partition?

2015-02-23 Thread Shuai Zheng
will trigger shuffle even same partition? If you call reduceByKey(), internally Spark will introduce a shuffle operations, not matter the data is already partitioned locally, Spark itself do not know the data is already well partitioned. So if you want to avoid Shuffle, you have to write

RE: Union and reduceByKey will trigger shuffle even same partition?

2015-02-23 Thread Shao, Saisai
If you call reduceByKey(), internally Spark will introduce a shuffle operations, not matter the data is already partitioned locally, Spark itself do not know the data is already well partitioned. So if you want to avoid Shuffle, you have to write the code explicitly to avoid this, from my

RE: Union and reduceByKey will trigger shuffle even same partition?

2015-02-23 Thread Shuai Zheng
: Monday, February 23, 2015 3:13 PM To: Shuai Zheng Cc: user@spark.apache.org Subject: RE: Union and reduceByKey will trigger shuffle even same partition? If you call reduceByKey(), internally Spark will introduce a shuffle operations, not matter the data is already partitioned locally, Spark

RE: Union and reduceByKey will trigger shuffle even same partition?

2015-02-23 Thread Shao, Saisai
@spark.apache.orgmailto:user@spark.apache.org Subject: RE: Union and reduceByKey will trigger shuffle even same partition? If you call reduceByKey(), internally Spark will introduce a shuffle operations, not matter the data is already partitioned locally, Spark itself do not know the data is already well partitioned. So

RE: Union and reduceByKey will trigger shuffle even same partition?

2015-02-23 Thread Shao, Saisai
: user@spark.apache.org Subject: RE: Union and reduceByKey will trigger shuffle even same partition? In the book of learning spark: [cid:image002.jpg@01D04F74.28C9F870] So here it means only no shuffle happen crossing network but still will do shuffle locally? Even it is the case, why union

Re: Union and reduceByKey will trigger shuffle even same partition?

2015-02-23 Thread Imran Rashid
will trigger shuffle even same partition? If you call reduceByKey(), internally Spark will introduce a shuffle operations, not matter the data is already partitioned locally, Spark itself do not know the data is already well partitioned. So if you want to avoid Shuffle, you have to write