Re: [pyspark 2.4+] BucketBy SortBy doesn't retain sort order

Rishi Shah Tue, 03 Mar 2020 18:23:24 -0800

Hi All,

Just checking in to see if anyone has any advice on this.


Thanks,
Rishi

On Mon, Mar 2, 2020 at 9:21 PM Rishi Shah <rishishah.s...@gmail.com> wrote:

> Hi All,
>
> I have two large tables (~1TB) and used the following to save both of
> them. When I then join the two tables on join_column, Spark still does a
> shuffle and a sort before the join. Could someone please help?
>
> (df.repartition(2000)
>    .write.bucketBy(1, join_column)
>    .sortBy(join_column)
>    .saveAsTable(tablename))
>
> --
> Regards,
>
> Rishi Shah
>
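To make the file-layout side of the question concrete, here is a minimal plain-Python model of what a bucketed, sorted write produces. It makes two simplifying assumptions. First, `repartition(n)` spreads rows round-robin across n write tasks. Second, each task writes one file per bucket it holds rows for, sorted by the `sortBy` column. `bucket_id` uses Python's `hash` as a stand-in for Spark's Murmur3-based bucket hash. The function names are hypothetical, and this is a sketch, not Spark code.

```python
from collections import defaultdict


def bucket_id(key, num_buckets):
    # Stand-in for Spark's bucket hash; Python's hash() is NOT what Spark uses.
    return hash(key) % num_buckets


def simulate_bucketed_write(rows, num_tasks, num_buckets):
    """Return {bucket: [file, ...]} where each file is a sorted list of keys.

    Assumes rows are dealt round-robin to write tasks and that every task
    writes its own file for each bucket it has rows for.
    """
    tasks = [[] for _ in range(num_tasks)]
    for i, key in enumerate(rows):
        tasks[i % num_tasks].append(key)

    files = defaultdict(list)
    for task_rows in tasks:
        per_bucket = defaultdict(list)
        for key in task_rows:
            per_bucket[bucket_id(key, num_buckets)].append(key)
        for b, keys in per_bucket.items():
            files[b].append(sorted(keys))  # each file is sorted locally
    return dict(files)


def bucket_is_sorted_on_read(files_in_bucket):
    # Models a reader that reads a bucket's files one after another.
    combined = [k for f in files_in_bucket for k in f]
    return combined == sorted(combined)


if __name__ == "__main__":
    rows = [7, 3, 9, 1, 4, 8, 2, 6, 5, 0]
    layout = simulate_bucketed_write(rows, num_tasks=4, num_buckets=1)
    print(len(layout[0]))                           # 4 files in the single bucket
    print(all(f == sorted(f) for f in layout[0]))   # True: each file is sorted
    print(bucket_is_sorted_on_read(layout[0]))      # False: not sorted together
```

In this model, `repartition(2000)` combined with `bucketBy(1, ...)` gives one bucket made of up to 2000 separately sorted files. A bucket is only sorted as a whole when it consists of a single file.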


-- 
Regards,

Rishi Shah