Dataset/DataFrame has repartition (which can be used to partition by key) and sortWithinPartitions.
See example usage here:
https://github.com/tresata/spark-sorted/blob/master/src/main/scala/com/tresata/spark/sorted/sql/GroupSortedDataset.scala#L18

On Fri, Jun 23, 2017 at 5:43 PM, Keith Chapman <keithgchap...@gmail.com> wrote:
> Hi,
>
> I have code that does the following using RDDs:
>
>     val outputPartitionCount = 300
>     val part = new MyOwnPartitioner(outputPartitionCount)
>     val finalRdd = myRdd.repartitionAndSortWithinPartitions(part)
>
> where myRdd is correctly formed as key, value pairs. I am looking to convert
> this to use Dataset/DataFrame instead of RDDs, so my question is:
>
> Is there custom partitioning of Dataset/DataFrame implemented in Spark?
> Can I accomplish the partial sort using mapPartitions on the resulting
> partitioned Dataset/DataFrame?
>
> Any thoughts?
>
> Regards,
> Keith.
>
> http://keith-chapman.com
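To illustrate the suggestion above, here is a minimal sketch of what the Dataset equivalent of repartitionAndSortWithinPartitions could look like. The column name "key" and the case class are hypothetical stand-ins for your actual schema; repartition with a column expression hash-partitions by that column (the number of partitions plays the role of outputPartitionCount), and sortWithinPartitions then sorts each partition locally without a global shuffle:

```scala
import org.apache.spark.sql.SparkSession

object RepartitionSortExample {
  // Hypothetical record type standing in for your (key, value) pairs.
  case class Record(key: Int, value: String)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("repartition-sort-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val ds = Seq(Record(2, "b"), Record(1, "a"), Record(2, "a")).toDS()

    val outputPartitionCount = 300

    // Hash-partition by "key" into outputPartitionCount partitions,
    // then sort rows within each partition by "key" (no global sort).
    val partitionedSorted = ds
      .repartition(outputPartitionCount, $"key")
      .sortWithinPartitions($"key")

    partitionedSorted.show()
    spark.stop()
  }
}
```

Note this uses Spark's built-in hash partitioning on the column expression rather than a fully custom Partitioner; as of the Dataset API discussed in this thread, plugging in an arbitrary user-defined Partitioner (like MyOwnPartitioner) is not directly supported. If the per-partition layout matters, mapPartitions can then be applied to the partitioned, sorted result.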