You have to repartition/coalesce *after* the operation that causes the shuffle, since that operation is the one that uses the value you've set for spark.sql.shuffle.partitions.
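
A minimal sketch of what I mean, in PySpark. The helper name `compact_before_write`, the sample data, and the output path are made up for illustration; they are not taken from your job, so adapt them to your own DataFrame and Hive write.

```python
def compact_before_write(df, num_files, shuffle_free=True):
    """Reduce the partition count (and so the output file count) right before a write.

    `df` is assumed to be a pyspark.sql.DataFrame that has already gone through
    the shuffle (join, groupBy, ...). coalesce() merges partitions without a
    full shuffle; repartition() adds a full shuffle but balances partition sizes.
    """
    if shuffle_free:
        return df.coalesce(num_files)
    return df.repartition(num_files)


def demo(output_path):
    """Hypothetical end-to-end run; requires pyspark to be installed."""
    from pyspark.sql import SparkSession, functions as F

    spark = (
        SparkSession.builder
        .master("local[2]")
        .appName("fewer-output-files")
        .config("spark.sql.shuffle.partitions", "2000")
        .getOrCreate()
    )

    # The groupBy triggers a shuffle, which uses spark.sql.shuffle.partitions.
    aggregated = (
        spark.range(0, 100000)
        .withColumn("key", F.col("id") % 50)
        .groupBy("key")
        .count()
    )

    # Compact only after the shuffle, immediately before writing.
    compacted = compact_before_write(aggregated, 20)
    print(compacted.rdd.getNumPartitions())  # at most 20
    compacted.write.mode("overwrite").parquet(output_path)
    spark.stop()
```

Calling `demo("/tmp/fewer_output_files_demo")` should leave at most 20 part files in that directory. One caveat: coalesce() does not add a shuffle, so the stage that feeds the write may itself run with only `num_files` tasks; if that slows the job down, passing `shuffle_free=False` (repartition) keeps the wide stage at full parallelism at the cost of one extra shuffle.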

On Tue, Oct 17, 2017 at 8:40 PM KhajaAsmath Mohammed <mdkhajaasm...@gmail.com> wrote:

> Yes, I still see a larger number of part files, exactly the number I
> defined in spark.sql.shuffle.partitions.
>
> On Oct 17, 2017, at 2:32 PM, Michael Artz <michaelea...@gmail.com> wrote:
>
> Have you tried caching it and using a coalesce?
>
> On Oct 17, 2017 1:47 PM, "KhajaAsmath Mohammed" <mdkhajaasm...@gmail.com> wrote:
>
>> I tried repartition, but spark.sql.shuffle.partitions is taking
>> precedence over repartition or coalesce. How do I get fewer
>> files with the same performance?
>>
>> On Fri, Oct 13, 2017 at 3:45 AM, Tushar Adeshara <tushar_adesh...@persistent.com> wrote:
>>
>>> You can also try coalesce, as it will avoid a full shuffle.
>>>
>>> Regards,
>>> Tushar Adeshara
>>> Technical Specialist – Analytics Practice
>>>
>>> ------------------------------
>>> From: KhajaAsmath Mohammed <mdkhajaasm...@gmail.com>
>>> Sent: 13 October 2017 09:35
>>> To: user @spark
>>> Subject: Spark - Partitions
>>>
>>> Hi,
>>>
>>> I am reading a Hive query and writing the data back into Hive after
>>> doing some transformations.
>>>
>>> I have changed the setting spark.sql.shuffle.partitions to 2000, and
>>> since then the job completes fast, but the main problem is that I am
>>> getting 2000 files for each partition; the size of each file is 10 MB.
>>>
>>> Is there a way to get the same performance but write fewer files?
>>>
>>> I am trying repartition now but would like to know if there are any
>>> other options.
>>>
>>> Thanks,
>>> Asmath