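I had that issue too, and from what I gathered it is an expected optimization: coalesce does not add a shuffle, so the reduced partition count is pushed back into the stage that feeds it. Try using repartition instead.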
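Here is a rough sketch of the difference as I understand it, not your actual job. The input data, the local master setting, and the output paths under /tmp are all made up for illustration (the paths must not already exist). The one thing I am sure of is that repartition(n) is the same as coalesce(n, shuffle = true).

```scala
import org.apache.spark.sql.SparkSession

object CoalesceVsRepartition {
  def main(args: Array[String]): Unit = {
    // Hypothetical local session, for illustration only.
    val spark = SparkSession.builder()
      .appName("coalesce-vs-repartition")
      .master("local[4]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Stand-in for the real pipeline: an aggregation followed by a flatMap.
    val aggregated = sc.parallelize(1 to 1000000, numSlices = 8)
      .map(i => (i % 100, 1L))
      .reduceByKey(_ + _)                        // shuffle boundary
      .flatMap { case (k, n) => Seq(s"$k,$n") }  // runs in the post-shuffle stage

    // coalesce(1) adds no shuffle, so the whole post-shuffle stage
    // (reduce side + flatMap + write) collapses into a single task.
    aggregated.coalesce(1).saveAsTextFile("/tmp/out-coalesce")

    // repartition(1) inserts a shuffle: the flatMap stage keeps its
    // 8 tasks, and only the final write runs as one task.
    aggregated.repartition(1).saveAsTextFile("/tmp/out-repartition")

    spark.stop()
  }
}
```

Both variants should produce a single part file. The extra shuffle in the second one costs a little, but for a dataset small enough to fit in one file that is usually cheaper than running the upstream work in one task. Comparing aggregated.coalesce(1).toDebugString with aggregated.repartition(1).toDebugString shows where the stage boundary falls in each case.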

On Feb 3, 2021, at 11:55, James Yu <ja...@ispot.tv> wrote:
>Hi Team,
>
>We are running into this poor performance issue and seeking your
>suggestion on how to improve it:
>
>We have a particular dataset which we aggregate from other datasets and
>like to write out to one single file (because it is small enough). We
>found that after a series of transformations (GROUP BYs, FLATMAPs), we
>coalesced the final RDD to 1 partition before writing it out, and this
>coalesce degrade the performance, not that this additional coalesce
>operation took additional runtime, but it somehow dictates the
>partitions to use in the upstream transformations.
>
>We hope there is a simple and useful way to solve this kind of issue
>which we believe is quite common for many people.
>
>
>Thanks
>
>James