Hi all,

1. The Dataset API supports “sortWithinPartitions”, but the RDD API has no counterpart. (I know there is “repartitionAndSortWithinPartitions”, but I don’t want to repartition the RDD.) At the moment I have to convert the RDD to a Dataset just to use this function. Would it make sense to add “sortWithinPartitions” to RDD?

2. In “aggregateByKey”/“reduceByKey”, I want to apply a special operation (such as compressing the aggregator) after the local aggregation on each partition. A similar case: when computing “ApproximatePercentile” for different keys with “reduceByKey”, it would help if “QuantileSummaries#compress” were called before the data is sent over the network. So I wonder whether it would be useful to add an “aggregateWithinPartitions” to RDD?

Regards,
Ruifeng
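
For reference, here is a minimal sketch of how both operations might be approximated today on top of “mapPartitions”, without a shuffle. The function names (sort_partition, aggregate_partition) and the finalize hook are hypothetical and not part of Spark; finalize merely stands in for a step such as “QuantileSummaries#compress”. The sort happens in memory, so it assumes each partition fits in executor memory.

```python
from typing import Any, Callable, Hashable, Iterable, Iterator, Optional, Tuple


def sort_partition(records: Iterable[Any],
                   key: Optional[Callable[[Any], Any]] = None) -> Iterator[Any]:
    """Sort the records of a single partition in memory (no repartitioning)."""
    return iter(sorted(records, key=key))


def aggregate_partition(records: Iterable[Tuple[Hashable, Any]],
                        zero: Callable[[], Any],
                        seq_op: Callable[[Any, Any], Any],
                        finalize: Callable[[Any], Any] = lambda acc: acc,
                        ) -> Iterator[Tuple[Hashable, Any]]:
    """Aggregate (key, value) pairs locally within one partition.

    `zero` is a factory so that each key gets its own fresh accumulator.
    `finalize` (hypothetical hook) runs once per key after local aggregation,
    i.e. before the partial results would be sent over the network.
    """
    accumulators = {}
    for k, v in records:
        current = accumulators[k] if k in accumulators else zero()
        accumulators[k] = seq_op(current, v)
    for k, acc in accumulators.items():
        yield k, finalize(acc)


# Intended use with PySpark (assumed wiring, not executed here):
#   sorted_rdd = rdd.mapPartitions(sort_partition, preservesPartitioning=True)
#   partials = pairs.mapPartitions(
#       lambda it: aggregate_partition(it, zero, seq_op, compress))
#   result = partials.reduceByKey(merge)
```

A dedicated RDD method could do better than this workaround, for example by spilling to disk during the sort or by hooking into the existing map-side combine, which is the motivation for the two questions above.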