On Wed, Jan 31, 2018 at 1:15 AM, Ruifeng Zheng wrote:
> Hi all,
>
> 1. The Dataset API supports the “sortWithinPartitions” operation, but the RDD
> API has no counterpart (I know there is
> “repartitionAndSortWithinPartitions”, but I don’t want to repartition the
> RDD), I have to convert R
Do you mean in-memory processing? It works fine if all partitions are small.
But if some partitions don’t fit in memory, it will cause an OOM error.
From: Reynold Xin
Date: Thursday, February 1, 2018, 3:14 PM
To: Ruifeng Zheng
Cc:
Subject: Re: [Core][Suggestion] sortWithinPartitions and aggregateWithinPartitions
You can just do that with mapPartitions pretty easily, can’t you?
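
A minimal sketch of what the mapPartitions approach could look like, assuming PySpark. The helper name `sort_partition` is hypothetical and not part of Spark. It loads each partition into memory before sorting, so it is only suitable when every partition fits in executor memory.

```python
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def sort_partition(key: Callable[[T], object] = lambda x: x,
                   reverse: bool = False) -> Callable[[Iterable[T]], Iterator[T]]:
    """Build a function that sorts the records of a single partition.

    Hypothetical helper (not a Spark API): the returned function has the
    iterator-in, iterator-out shape that RDD.mapPartitions expects.
    """
    def _sort(records: Iterable[T]) -> Iterator[T]:
        # sorted() materializes the whole partition as a list;
        # nothing is spilled to disk, so memory use grows with partition size.
        return iter(sorted(records, key=key, reverse=reverse))
    return _sort


# Assumed usage with an existing PySpark RDD named `rdd`:
#   sorted_rdd = rdd.mapPartitions(sort_partition(), preservesPartitioning=True)
```

Because the function only reorders records inside a partition, no shuffle is involved and the existing partitioning is left as it is.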
On Wed, Jan 31, 2018 at 11:08 PM Ruifeng Zheng wrote:
> Hi all,
>
> 1. The Dataset API supports the “sortWithinPartitions” operation, but the
> RDD API has no counterpart (I know there is
> “repartitionAndSortWithinPartition
Hi all,
1. The Dataset API supports the “sortWithinPartitions” operation, but the RDD
API has no counterpart (I know there is “repartitionAndSortWithinPartitions”,
but I don’t want to repartition the RDD), so I have to convert the RDD to a
Dataset for
this function. Would it make sense to add a “s