Can you share some code?

On Tue, 17 Oct 2017, 21:11 KhajaAsmath Mohammed, <mdkhajaasm...@gmail.com> wrote:
> In my case I am just writing the data frame back to Hive, so when is the
> best time to repartition it? I did repartition before calling insert
> overwrite on the table.
>
> On Tue, Oct 17, 2017 at 3:07 PM, Sebastian Piu <sebastian....@gmail.com> wrote:
>
>> You have to repartition/coalesce *after* the action that is causing the
>> shuffle, as that one will take the value you've set.
>>
>> On Tue, Oct 17, 2017 at 8:40 PM KhajaAsmath Mohammed <mdkhajaasm...@gmail.com> wrote:
>>
>>> Yes, I still see a larger number of part files, exactly the number I
>>> have defined in spark.sql.shuffle.partitions.
>>>
>>> Sent from my iPhone
>>>
>>> On Oct 17, 2017, at 2:32 PM, Michael Artz <michaelea...@gmail.com> wrote:
>>>
>>> Have you tried caching it and using a coalesce?
>>>
>>> On Oct 17, 2017 1:47 PM, "KhajaAsmath Mohammed" <mdkhajaasm...@gmail.com> wrote:
>>>
>>>> I tried repartition, but spark.sql.shuffle.partitions is taking
>>>> precedence over repartition or coalesce. How do I get a smaller number
>>>> of files with the same performance?
>>>>
>>>> On Fri, Oct 13, 2017 at 3:45 AM, Tushar Adeshara <tushar_adesh...@persistent.com> wrote:
>>>>
>>>>> You can also try coalesce, as it will avoid a full shuffle.
>>>>>
>>>>> Regards,
>>>>> Tushar Adeshara
>>>>> Technical Specialist – Analytics Practice
>>>>> Cell: +91-81490 04192
>>>>> Persistent Systems Ltd. | www.persistentsys.com
>>>>>
>>>>> ------------------------------
>>>>> From: KhajaAsmath Mohammed <mdkhajaasm...@gmail.com>
>>>>> Sent: 13 October 2017 09:35
>>>>> To: user @spark
>>>>> Subject: Spark - Partitions
>>>>>
>>>>> Hi,
>>>>>
>>>>> I am reading a Hive query and writing the data back into Hive after
>>>>> doing some transformations.
>>>>>
>>>>> I have changed the setting spark.sql.shuffle.partitions to 2000, and
>>>>> since then the job completes fast, but the main problem is that I am
>>>>> getting 2000 files for each partition, and the size of each file is
>>>>> 10 MB.
>>>>>
>>>>> Is there a way to get the same performance but write a smaller number
>>>>> of files?
>>>>>
>>>>> I am trying repartition now, but would like to know if there are any
>>>>> other options.
>>>>>
>>>>> Thanks,
>>>>> Asmath
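[Editor's note for readers landing on this thread later: the behaviour discussed above is that spark.sql.shuffle.partitions controls how many partitions shuffles (joins, aggregations) produce, and on write each output partition becomes one file. Sebastian's advice is to narrow the final DataFrame with coalesce/repartition after all shuffling transformations, immediately before the write. A minimal Scala sketch of that pattern follows; the table names, column names, and the target file count of 50 are hypothetical, not from the thread.]

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("coalesce-before-write")
  .enableHiveSupport()
  .getOrCreate()

// Keep high shuffle parallelism so the expensive transformations stay fast.
spark.conf.set("spark.sql.shuffle.partitions", "2000")

// Hypothetical transformation: the aggregation shuffles into 2000 partitions.
val transformed = spark.table("source_db.events")
  .groupBy("customer_id")
  .count()

// Coalesce only the final result: 2000 partitions are narrowed to ~50
// without another full shuffle, so roughly 50 output files are written
// instead of 2000.
transformed
  .coalesce(50)
  .write
  .mode("overwrite")
  .insertInto("target_db.events_summary")
```

One caveat worth knowing: because coalesce is a narrow transformation, Spark may fold it into the preceding stage and run that stage with only 50 tasks, losing the parallelism you paid for. If that happens, either cache the DataFrame before coalescing (Michael's suggestion above) or use repartition(50), which forces an extra shuffle but keeps the upstream work at full parallelism.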