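I had that issue too. From what I gathered, it is an expected
optimization: coalesce avoids a shuffle, so the reduced partition count
carries back into the upstream transformations. Try using repartition
instead.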
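
Here is a minimal sketch of what I mean, assuming the PySpark RDD API. The helper name write_single_file and its arguments are made up for illustration; rdd stands for your final RDD after the GROUP BYs and FLATMAPs.

```python
def write_single_file(rdd, path):
    """Write `rdd` to `path` as a single part file (hypothetical helper).

    repartition(1) inserts a shuffle boundary, so the upstream stages
    keep their own partition count and only the final write runs as
    one task. A plain rdd.coalesce(1) has no shuffle, so the whole
    upstream stage collapses to a single task.
    """
    # Same effect as rdd.coalesce(1, shuffle=True).
    rdd.repartition(1).saveAsTextFile(path)
```

The trade-off is one extra shuffle of the final data, which should be cheap if the output is small enough to fit in one file.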


On Feb 3, 2021, at 11:55, James Yu <ja...@ispot.tv> wrote:
>Hi Team,
>
>We are running into a poor performance issue and are seeking your
>suggestions on how to improve it:
>
>We have a particular dataset which we aggregate from other datasets and
>would like to write out to a single file (because it is small enough).
>After a series of transformations (GROUP BYs, FLATMAPs), we coalesce
>the final RDD to 1 partition before writing it out. We found that this
>coalesce degrades performance: not because the coalesce operation
>itself adds runtime, but because it somehow dictates the number of
>partitions used in the upstream transformations.
>
>We hope there is a simple and useful way to solve this kind of issue
>which we believe is quite common for many people.
>
>
>Thanks
>
>James
