Yes, I still see a large number of part files — exactly the number I set in spark.sql.shuffle.partitions.
> On Oct 17, 2017, at 2:32 PM, Michael Artz <michaelea...@gmail.com> wrote:
>
> Have you tried caching it and using a coalesce?
>
>> On Oct 17, 2017 1:47 PM, "KhajaAsmath Mohammed" <mdkhajaasm...@gmail.com> wrote:
>>
>> I tried repartition, but spark.sql.shuffle.partitions takes precedence over repartition or coalesce. How do I get a smaller number of files with the same performance?
>>
>>> On Fri, Oct 13, 2017 at 3:45 AM, Tushar Adeshara <tushar_adesh...@persistent.com> wrote:
>>>
>>> You can also try coalesce, as it will avoid a full shuffle.
>>>
>>> Regards,
>>> Tushar Adeshara
>>> Technical Specialist – Analytics Practice
>>> Persistent Systems Ltd.
>>>
>>> From: KhajaAsmath Mohammed <mdkhajaasm...@gmail.com>
>>> Sent: 13 October 2017 09:35
>>> To: user @spark
>>> Subject: Spark - Partitions
>>>
>>> Hi,
>>>
>>> I am reading a Hive query and writing the data back into Hive after doing some transformations.
>>>
>>> I changed spark.sql.shuffle.partitions to 2000, and since then the job completes fast, but the main problem is that I get 2000 files for each partition, each about 10 MB in size.
>>>
>>> Is there a way to get the same performance but write a smaller number of files?
>>>
>>> I am trying repartition now, but would like to know if there are any other options.
>>>
>>> Thanks,
>>> Asmath
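The pattern the thread converges on can be sketched roughly as follows (a minimal Scala sketch; the table names, date filter, and the choice of 20 output partitions are hypothetical, not taken from the thread). The idea is to leave spark.sql.shuffle.partitions high so joins and aggregations stay parallel, and only narrow the partition count at the very end, just before the write:

```scala
import org.apache.spark.sql.SparkSession

object CoalesceWriteSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("coalesce-write-sketch")
      .enableHiveSupport()
      // High shuffle parallelism for the transformation stages.
      .config("spark.sql.shuffle.partitions", "2000")
      .getOrCreate()

    // Hypothetical query; the actual transformations are not shown in the thread.
    val df = spark.sql("SELECT * FROM source_table WHERE dt = '2017-10-13'")

    // coalesce(20) narrows the 2000 shuffle partitions into 20 output
    // partitions without another full shuffle, so roughly 20 files are
    // written instead of 2000.
    df.coalesce(20)
      .write
      .mode("overwrite")
      .insertInto("target_table")
  }
}
```

One caveat worth noting: because coalesce is a narrow dependency, Spark may fold it into the preceding stage and run that stage with only 20 tasks, losing the parallelism you paid for. The suggestions in the thread address exactly this: either cache the DataFrame before coalescing, or use repartition(20) instead, which does incur a full shuffle but keeps the upstream stages at 2000 tasks.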