You have to call repartition/coalesce *after* the operation that causes the
shuffle, since that operation will use the value you've set for
spark.sql.shuffle.partitions.
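
A rough sketch of what I mean, in PySpark. The table names, the "key"
column, the groupBy and the 128 MB target file size are all made up for
illustration; only the ordering (shuffle first, then repartition/coalesce,
then write) is the point. It assumes you already have a SparkSession and
that the target Hive table exists.

```python
import math


def target_file_count(total_bytes, target_file_bytes=128 * 1024 * 1024):
    """Files needed to hold total_bytes at roughly target_file_bytes each.

    The 128 MB default is an assumption, not a Spark setting.
    """
    return max(1, math.ceil(total_bytes / target_file_bytes))


def write_with_fewer_files(spark, source_table, target_table, num_files,
                           extra_shuffle=True):
    """Shuffle with many partitions, then shrink the count before writing.

    spark is an existing pyspark.sql.SparkSession. source_table,
    target_table and the "key" column are hypothetical names.
    """
    # The wide transformation below shuffles into this many partitions.
    spark.conf.set("spark.sql.shuffle.partitions", "2000")

    # Stand-in for the real transformations; groupBy triggers the shuffle.
    result = spark.table(source_table).groupBy("key").count()

    # Change the partition count AFTER the shuffle-producing step,
    # immediately before the write.
    if extra_shuffle:
        # One more shuffle, but the aggregation still runs 2000-way.
        result = result.repartition(num_files)
    else:
        # No extra shuffle, but the final stage runs with num_files tasks.
        result = result.coalesce(num_files)

    # Each remaining task writes its own file(s), so this yields at most
    # num_files files per output partition.
    result.write.insertInto(target_table)
```

With the numbers from this thread (2000 files of about 10 MB each),
target_file_count(2000 * 10 * 1024 * 1024) comes out to 157. If coalesce
slows the job down, that is probably because the last stage loses its
parallelism; repartition costs an extra shuffle but keeps the wide stage
at 2000 tasks.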

On Tue, Oct 17, 2017 at 8:40 PM KhajaAsmath Mohammed <
mdkhajaasm...@gmail.com> wrote:

> Yes, I still see a large number of part files: exactly the number I
> defined in spark.sql.shuffle.partitions.
>
> On Oct 17, 2017, at 2:32 PM, Michael Artz <michaelea...@gmail.com> wrote:
>
> Have you tried caching it and using a coalesce?
>
>
>
> On Oct 17, 2017 1:47 PM, "KhajaAsmath Mohammed" <mdkhajaasm...@gmail.com>
> wrote:
>
>> I tried repartition, but spark.sql.shuffle.partitions is taking
>> precedence over repartition or coalesce. How can I get fewer
>> files with the same performance?
>>
>> On Fri, Oct 13, 2017 at 3:45 AM, Tushar Adeshara <
>> tushar_adesh...@persistent.com> wrote:
>>
>>> You can also try coalesce, as it will avoid a full shuffle.
>>>
>>>
>>> Regards,
>>>
>>> *Tushar Adeshara*
>>>
>>>
>>> ------------------------------
>>> *From:* KhajaAsmath Mohammed <mdkhajaasm...@gmail.com>
>>> *Sent:* 13 October 2017 09:35
>>> *To:* user @spark
>>> *Subject:* Spark - Partitions
>>>
>>> Hi,
>>>
>>> I am reading data with a Hive query and writing it back into Hive after
>>> doing some transformations.
>>>
>>> I changed spark.sql.shuffle.partitions to 2000, and since then the job
>>> completes quickly, but the main problem is that I am getting 2000 files
>>> for each partition, each about 10 MB in size.
>>>
>>> Is there a way to get the same performance but write fewer files?
>>>
>>> I am trying repartition now but would like to know if there are any
>>> other options.
>>>
>>> Thanks,
>>> Asmath
>>>
>>
>>
