count of partitions?
250635...@qq.com

From: Li Yang
Date: 2015-10-23 16:17
To: dev
CC: Reynold Xin; dev@spark.apache.org
Subject: Re: repartitionAndSortWithinPartitions task shuffle phase is very slow
Any advice on how to tune the repartitionAndSortWithinPartitions stage?
Any particular metrics or parameters to look into? Basically Spark and MR
shuffle the same amount of data, since we essentially copied the MR
implementation into Spark.
Let us know if more info is needed.
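[Editor's note, not part of the original thread: for readers unfamiliar with the operation being tuned, here is a minimal plain-Python sketch of the semantics of repartitionAndSortWithinPartitions, assuming a hash partitioner. `repartition_and_sort` is a hypothetical stand-in, not the Spark API; it only illustrates the two phases being discussed (shuffle write by key, then sort within each partition).]

```python
import zlib

# Plain-Python sketch (hypothetical stand-in, not the Spark API):
# each record is routed to a target partition by key (the shuffle write),
# then every partition is sorted by key (the sort phase).
def repartition_and_sort(records, num_partitions):
    buckets = [[] for _ in range(num_partitions)]
    for key, value in records:
        # deterministic hash partitioning: the same key always lands
        # in the same partition
        buckets[zlib.crc32(key.encode()) % num_partitions].append((key, value))
    for bucket in buckets:
        bucket.sort(key=lambda kv: kv[0])  # sort within each partition only
    return buckets

parts = repartition_and_sort([("b", 2), ("a", 1), ("c", 3), ("a", 4)], 2)
# Each partition is key-sorted; there is no global order across partitions.
```

Note that the sorting is per-partition only, which is exactly the MR shuffle contract the thread says was copied into Spark.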
On Fri, Oct 23, 2015 at 10:24
Why do you do a glom? It seems unnecessarily expensive to materialize each
partition in memory.
On Thu, Oct 22, 2015 at 2:02 AM, 周千昊 wrote:
> Hi, spark community
> I have an application which I am trying to migrate from MR to Spark.
> It will do some calculations from
+kylin dev list
On Fri, Oct 23, 2015 at 10:20 AM, 周千昊 wrote:
> Hi, Reynold
> Using glom() is because it is easy to adapt to the calculation logic
> already implemented in MR. And to be clear, we are still in POC.
> Since the results show there is almost no difference between this
> glom stage and the MR mapper, using glom here might not be the issue.
> I
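[Editor's note, not part of the original thread: a plain-Python sketch of the trade-off behind Reynold's glom() question. The helper names are hypothetical, not the Spark API. A glom()-style step materializes an entire partition as one in-memory list, which makes porting MR-mapper code easy, while a mapPartitions()-style step streams records through an iterator with constant memory.]

```python
# Hypothetical sketch, not the Spark API: contrast the memory profile of
# a glom()-style step with a mapPartitions()-style step.

def sum_partition(records):
    # stand-in for MR-mapper logic that expects the full partition as a list
    return sum(records)

def glom_style(partition_iter):
    records = list(partition_iter)   # entire partition materialized in memory
    return [sum_partition(records)]  # easy to port MR code, costly for big partitions

def map_partitions_style(partition_iter):
    total = 0
    for record in partition_iter:    # one record resident at a time
        total += record
    yield total                      # same result, constant memory

print(glom_style(iter(range(5))))                  # [10]
print(list(map_partitions_style(iter(range(5)))))  # [10]
```

Both produce the same aggregate; the difference only shows up when a partition is too large to hold as a single list, which is the expense Reynold's question points at.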