Dataset/DataFrame has repartition (which can be used to partition by key)
and sortWithinPartitions.

See example usage here:
https://github.com/tresata/spark-sorted/blob/master/src/main/scala/com/tresata/spark/sorted/sql/GroupSortedDataset.scala#L18
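A minimal sketch of the same idea (not from the linked repo; the "key"/"value" column names and sample data are just assumptions for illustration):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("repartition-sort-example").getOrCreate()
import spark.implicits._

// hypothetical key/value DataFrame
val df = Seq(("a", 3), ("b", 1), ("a", 1), ("b", 2)).toDF("key", "value")

val outputPartitionCount = 300

// hash-partition by the key column into a fixed number of partitions,
// then sort the rows within each partition
val partitionedSorted = df
  .repartition(outputPartitionCount, $"key")
  .sortWithinPartitions("key", "value")

// each partition now has its keys grouped and its rows sorted, so a
// mapPartitions over it can process them in order
partitionedSorted.rdd.getNumPartitions  // 300

Note this uses Spark's built-in hash partitioning on the column expression rather than a custom Partitioner; fully custom partitioners are an RDD-level concept.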

On Fri, Jun 23, 2017 at 5:43 PM, Keith Chapman <keithgchap...@gmail.com>
wrote:

> Hi,
>
> I have code that does the following using RDDs,
>
> val outputPartitionCount = 300
> val part = new MyOwnPartitioner(outputPartitionCount)
> val finalRdd = myRdd.repartitionAndSortWithinPartitions(part)
>
> where myRdd is correctly formed as key, value pairs. I am looking to convert
> this to use Dataset/DataFrame instead of RDDs, so my question is:
>
> Is custom partitioning of Dataset/DataFrame implemented in Spark?
> Can I accomplish the partial sort using mapPartitions on the resulting
> partitioned Dataset/DataFrame?
>
> Any thoughts?
>
> Regards,
> Keith.
>
> http://keith-chapman.com
>
