Re: [Core][Suggestion] sortWithinPartitions and aggregateWithinPartitions for RDD

2018-02-01 Thread Mridul Muralidharan
On Wed, Jan 31, 2018 at 1:15 AM, Ruifeng Zheng wrote: > Hi all: > 1. The Dataset API supports the operation “sortWithinPartitions”, but the RDD API has no counterpart (I know there is “repartitionAndSortWithinPartitions”, but I don’t want to repartition the RDD), so I have to convert R

Re: [Core][Suggestion] sortWithinPartitions and aggregateWithinPartitions for RDD

2018-01-31 Thread Ruifeng Zheng
Do you mean in-memory processing? It works fine if all partitions are small, but when a partition doesn’t fit in memory, it will cause an OOM. From: Reynold Xin Date: Thursday, February 1, 2018, 3:14 PM To: Ruifeng Zheng Cc: Subject: Re: [Core][Suggestion] sortWithinPartitions and aggregateWithinPartitions
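The concern here is that a mapPartitions-based sort has to hold a whole partition in memory. One generic way around that is an external merge sort: sort bounded chunks, spill each sorted run to a temporary file, then stream a lazy k-way merge of the runs. The Python function below is a minimal sketch of that technique, written as the kind of function you could pass to PySpark’s `RDD.mapPartitions`. It is swapped in for illustration and is not how Spark sorts internally; `external_sort_partition` and `max_in_memory` are hypothetical names, and spilling runs as pickled records in temp files is an assumption.

```python
import heapq
import itertools
import pickle
import tempfile


def _spill_run(sorted_chunk):
    """Write one sorted run to an anonymous temp file and rewind it."""
    f = tempfile.TemporaryFile()
    for record in sorted_chunk:
        pickle.dump(record, f)
    f.seek(0)
    return f


def _read_run(f):
    """Stream records back from a spilled run, closing the file at EOF."""
    try:
        while True:
            yield pickle.load(f)
    except EOFError:
        f.close()


def external_sort_partition(records, max_in_memory=100_000, key=None):
    """Hypothetical helper: sort one partition while holding at most
    max_in_memory records in memory at a time while building runs."""
    records = iter(records)  # islice must keep consuming one shared iterator
    runs = []
    while True:
        chunk = list(itertools.islice(records, max_in_memory))
        if not chunk:
            break
        chunk.sort(key=key)
        runs.append(_spill_run(chunk))
    # heapq.merge is lazy: its heap holds only one pending record per run.
    return heapq.merge(*(_read_run(f) for f in runs), key=key)


if __name__ == "__main__":
    print(list(external_sort_partition([5, 3, 9, 1, 7, 2], max_in_memory=2)))
    # → [1, 2, 3, 5, 7, 9]
```

In PySpark this would be applied roughly as `rdd.mapPartitions(lambda it: external_sort_partition(it, key=lambda kv: kv[0]), preservesPartitioning=True)`. The trade-off is disk I/O and serialization cost in exchange for bounded memory, and the number of open temp files grows with partition size divided by `max_in_memory`.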

Re: [Core][Suggestion] sortWithinPartitions and aggregateWithinPartitions for RDD

2018-01-31 Thread Reynold Xin
You can just do that with mapPartitions pretty easily, can’t you? On Wed, Jan 31, 2018 at 11:08 PM Ruifeng Zheng wrote: > Hi all: > 1. The Dataset API supports the operation “sortWithinPartitions”, but the RDD API has no counterpart (I know there is “repartitionAndSortWithinPartition
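The mapPartitions suggestion amounts to passing a per-partition sort function to `mapPartitions`. A minimal Python sketch of such a function follows, assuming PySpark’s `RDD.mapPartitions(f, preservesPartitioning=False)`; `sort_within_partition` is a hypothetical name. Because `sorted` copies the entire partition into a list, peak memory grows with partition size, which is the cost raised in the follow-up about partitions that don’t fit in memory.

```python
def sort_within_partition(records, key=None, reverse=False):
    """Hypothetical helper: sort one partition entirely in memory.

    sorted() materializes the whole partition as a list before sorting,
    so peak memory grows with the size of the partition.
    """
    return iter(sorted(records, key=key, reverse=reverse))


if __name__ == "__main__":
    # Plain lists stand in for RDD partitions; in PySpark this would be
    # rdd.mapPartitions(sort_within_partition, preservesPartitioning=True)
    partitions = [[3, 1, 2], [9, 7, 8]]
    print([list(sort_within_partition(p)) for p in partitions])
    # → [[1, 2, 3], [7, 8, 9]]
```

Passing `preservesPartitioning=True` is reasonable for a pair RDD because sorting does not change any keys, so an existing partitioner stays valid.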

[Core][Suggestion] sortWithinPartitions and aggregateWithinPartitions for RDD

2018-01-31 Thread Ruifeng Zheng
Hi all: 1. The Dataset API supports the operation “sortWithinPartitions”, but the RDD API has no counterpart (I know there is “repartitionAndSortWithinPartitions”, but I don’t want to repartition the RDD), so I have to convert the RDD to a Dataset for this function. Would it make sense to add a “s
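The subject line also proposes an “aggregateWithinPartitions”, but the message is cut off before describing it. Assuming it means aggregating values per key inside each partition without a shuffle (roughly `aggregateByKey` without the cross-partition combine step), it could likewise be written as a `mapPartitions` function. The Python sketch below is hypothetical: `aggregate_within_partition`, `zero`, and `seq_op` are illustrative names, not an existing Spark API. It keeps one accumulator per distinct key, so memory grows with the number of distinct keys in a partition rather than the number of records.

```python
import copy


def aggregate_within_partition(records, zero, seq_op):
    """Hypothetical helper: fold (key, value) pairs per key within one
    partition only; no data moves between partitions."""
    acc = {}
    for k, v in records:
        if k not in acc:
            # Deep-copy so a mutable zero value is never shared between keys.
            acc[k] = copy.deepcopy(zero)
        acc[k] = seq_op(acc[k], v)
    return iter(acc.items())


if __name__ == "__main__":
    partition = [("a", 1), ("b", 2), ("a", 3)]
    print(dict(aggregate_within_partition(partition, 0, lambda s, v: s + v)))
    # → {'a': 4, 'b': 2}
```

In PySpark it might be applied as `rdd.mapPartitions(lambda it: aggregate_within_partition(it, 0, operator.add))`. Unlike `aggregateByKey`, the same key can then appear in the output of several partitions, since nothing merges results across partitions.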