Re: Re: repartitionAndSortWithinPartitions task shuffle phase is very slow

2015-10-30 Thread Luke Han
count of partitions? > 250635...@qq.com > From: Li Yang > Date: 2015-10-23 16:17 > To: dev > CC: Reynold Xin; dev@spark.apache.org > Subject:

Re: Re: repartitionAndSortWithinPartitions task shuffle phase is very slow

2015-10-26 Thread 周千昊
> 250635...@qq.com > From: Li Yang > Date: 2015-10-23 16:17 > To: dev > CC: Reynold Xin; dev@spark.apache.org > Subject: Re: repartitionAndSortWithinPartitions task shuffle phase is very slow > Any advice on how to tune th

Re: Re: repartitionAndSortWithinPartitions task shuffle phase is very slow

2015-10-23 Thread 周千昊
> To: dev > CC: Reynold Xin; dev@spark.apache.org > Subject: Re: repartitionAndSortWithinPartitions task shuffle phase is very slow > Any advice on how to tune the repartitionAndSortWithinPartitions stage? Any particular metrics or parameters to look into? Basically, Spark and MR shuffle the same amount of da

Re: repartitionAndSortWithinPartitions task shuffle phase is very slow

2015-10-23 Thread Li Yang
Any advice on how to tune the repartitionAndSortWithinPartitions stage? Any particular metrics or parameters to look into? Basically, Spark and MR shuffle the same amount of data, because we kind of copied the MR implementation into Spark. Let us know if more info is needed. On Fri, Oct 23, 2015 at 10:24

Re: repartitionAndSortWithinPartitions task shuffle phase is very slow

2015-10-22 Thread Reynold Xin
Why do you do a glom? It seems unnecessarily expensive to materialize each partition in memory. On Thu, Oct 22, 2015 at 2:02 AM, 周千昊 wrote: > Hi, Spark community > I have an application which I am trying to migrate from MR to Spark. > It will do some calculations from

Re: repartitionAndSortWithinPartitions task shuffle phase is very slow

2015-10-22 Thread 周千昊
+kylin dev list. On Fri, Oct 23, 2015 at 10:20 AM, 周千昊 wrote: > Hi, Reynold > We use glom() because it makes it easy to adapt the calculation logic already implemented in MR. And to be clear, we are still in POC. > Since the results show there is almost no difference between this

Re: repartitionAndSortWithinPartitions task shuffle phase is very slow

2015-10-22 Thread 周千昊
Hi, Reynold. We use glom() because it makes it easy to adapt the calculation logic already implemented in MR. And to be clear, we are still in POC. Since the results show there is almost no difference between this glom stage and the MR mapper, using glom here might not be the issue. I
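
Reynold Xin's remark in this thread is that glom() materializes each partition in memory. The sketch below is a plain-Python analogy of that difference and does not use Spark at all: `records`, `sum_glom_style`, and `sum_streaming_style` are hypothetical names made up for illustration. In PySpark the corresponding calls would be `rdd.glom()` and `rdd.mapPartitions(f)`. Whether this matters for the job discussed here is not something the thread establishes.

```python
from typing import Iterable, Iterator, List


def records(n: int) -> Iterator[int]:
    """Hypothetical stand-in for one partition's records, produced lazily."""
    for i in range(n):
        yield i


def sum_glom_style(partition: Iterable[int]) -> int:
    # Analogous to rdd.glom(): the whole partition is turned into one
    # in-memory list before any record is processed.
    materialized: List[int] = list(partition)
    return sum(materialized)


def sum_streaming_style(partition: Iterable[int]) -> int:
    # Analogous to rdd.mapPartitions(f): f receives an iterator and can
    # consume records one at a time, keeping only the running total.
    total = 0
    for value in partition:
        total += value
    return total
```

Both functions return the same result; the difference is only in how much of the partition is held in memory at once, which is the cost the glom() question points at.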