Re: Poor performance caused by coalesce to 1

2021-02-03 Thread Silvio Fiorito
From: Silvio Fiorito Sent: Wednesday, February 3, 2021 11:05 AM To: James Yu ; user Subject: Re: Poor performance caused by coalesce to 1 Coalesce is reducing the parallelization of your last stage, in your case to 1 task. So, it’s natural it will give poor performance especially with large da

Re: Poor performance caused by coalesce to 1

2021-02-03 Thread Gourav Sengupta
tage boundary"? > > Thanks > -- > *From:* Silvio Fiorito > *Sent:* Wednesday, February 3, 2021 11:05 AM > *To:* James Yu ; user > *Subject:* Re: Poor performance caused by coalesce to 1 > > > Coalesce is reducing the parallelization o

Re: Poor performance caused by coalesce to 1

2021-02-03 Thread Mich Talebzadeh
That sounds like a plan, as suggested by Sean. I have also seen that caching the RS before the coalesce provides benefits, especially for a tiny 50MB dataset. Check the Storage tab in the Spark GUI for its effect. HTH Mich

Re: Poor performance caused by coalesce to 1

2021-02-03 Thread James Yu
From: Silvio Fiorito Sent: Wednesday, February 3, 2021 11:05 AM To: James Yu ; user Subject: Re: Poor performance caused by coalesce to 1 Coalesce is reducing the parallelization of your last stage, in your case to 1 task. So, it’s natural it will give poor performance especially with large data. If you absol

Re: Poor performance caused by coalesce to 1

2021-02-03 Thread Sean Owen
It could also be because the coalesce can cause some upstream transformations to run with a parallelism of 1 as well. I think (?) an OK solution is to cache the result, then coalesce and write. Or combine the files after the fact. Or do what Silvio said. On Wed, Feb 3, 2021 at 12:55 PM James Yu
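
The cache-then-coalesce idea described above might look roughly like the following PySpark sketch. The function name `write_single_file_cached`, the `output_path` argument, and the Parquet output format are assumptions made for illustration; the thread does not say which format is being written. Calling `count()` is one common way to force the cached result to be computed before the single-partition write.

```python
def write_single_file_cached(df, output_path):
    """Compute `df` once, cache it, then write it out as a single file.

    `df` is expected to behave like a pyspark.sql.DataFrame. The function
    name, `output_path`, and the Parquet format are illustrative
    assumptions, not details taken from the thread.
    """
    cached = df.cache()          # mark the aggregated result for caching
    row_count = cached.count()   # action: computes the result and fills the cache
    try:
        # The coalesce to 1 now sits on top of the cached data, so the
        # upstream transformations are not re-run inside a single task.
        cached.coalesce(1).write.mode("overwrite").parquet(output_path)
    finally:
        cached.unpersist()       # release the cached blocks when done
    return row_count
```

Whether this helps depends on the result fitting comfortably in executor memory; the Storage tab of the Spark UI shows how much of it was actually cached.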

Re: Poor performance caused by coalesce to 1

2021-02-03 Thread Stéphane Verlet
I had that issue too and from what I gathered, it is an expected optimization... Try using repartition instead. On Feb 3, 2021, at 11:55, James Yu wrote: >Hi Team, > >We are running into this poor performance issue and seeking your >suggestion on how to improve
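
The repartition suggestion could be sketched as below. Unlike `coalesce(1)`, `repartition(1)` inserts a shuffle, so the stages before it keep their own parallelism and only the final write runs as one task. The function name, the `output_path` argument, and the Parquet format are assumptions for illustration only.

```python
def write_single_file_repartitioned(df, output_path):
    """Write `df` as one file using a shuffle instead of a coalesce.

    `df` is expected to behave like a pyspark.sql.DataFrame. The function
    name, `output_path`, and the Parquet format are illustrative
    assumptions, not details taken from the thread.
    """
    # repartition(1) adds a shuffle boundary: upstream stages still run
    # with their normal number of tasks, and the rows are then sent to a
    # single partition for the write.
    single = df.repartition(1)
    single.write.mode("overwrite").parquet(output_path)
    return single
```

The trade-off is the cost of the extra shuffle, which should be small when the final dataset is as small as the one described in this thread.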

Re: Poor performance caused by coalesce to 1

2021-02-03 Thread Silvio Fiorito
Date: Wednesday, February 3, 2021 at 1:54 PM To: user Subject: Poor performance caused by coalesce to 1 Hi Team, We are running into this poor performance issue and seeking your suggestion on how to improve it: We have a particular dataset which we aggregate from other datasets and would like to write

Poor performance caused by coalesce to 1

2021-02-03 Thread James Yu
Hi Team, We are running into this poor performance issue and seeking your suggestion on how to improve it: We have a particular dataset which we aggregate from other datasets and would like to write out to a single file (because it is small enough). We found that after a series of