I tried repartition, but spark.sql.shuffle.partitions seems to take
precedence over repartition and coalesce. How can I get a smaller number
of files with the same performance?
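
A minimal sketch of the idea in Scala (table and column names here are
hypothetical): spark.sql.shuffle.partitions only controls how many
partitions a shuffle produces, so one option is to leave it at 2000 for
the heavy stages and narrow the DataFrame only at write time.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("write-fewer-files")
      .enableHiveSupport()
      .getOrCreate()

    // The shuffle-heavy transformations still run with 2000 partitions.
    spark.conf.set("spark.sql.shuffle.partitions", "2000")

    val transformed = spark.sql("SELECT * FROM src_db.src_table") // hypothetical
      .groupBy("some_key")                                        // hypothetical
      .count()

    // Narrow only for the write. coalesce(50) avoids a full shuffle but
    // gets fused into the final stage, so that stage runs with 50 tasks;
    // repartition(50) keeps the 2000-way parallelism upstream at the cost
    // of one extra shuffle. Which is faster depends on the job.
    transformed
      .coalesce(50)
      .write
      .mode("overwrite")
      .saveAsTable("tgt_db.tgt_table") // hypothetical

The point is only where the narrowing happens, not the particular numbers.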

On Fri, Oct 13, 2017 at 3:45 AM, Tushar Adeshara <tushar_adesh...@persistent.com> wrote:

> You can also try coalesce, as it avoids a full shuffle.
>
>
> Regards,
>
> Tushar Adeshara
> Technical Specialist – Analytics Practice
> Cell: +91-81490 04192
> Persistent Systems Ltd. | Partners in Innovation | www.persistentsys.com
>
>
> ------------------------------
> From: KhajaAsmath Mohammed <mdkhajaasm...@gmail.com>
> Sent: 13 October 2017 09:35
> To: user @spark
> Subject: Spark - Partitions
>
> Hi,
>
> I am reading data from Hive with a query and writing it back into Hive
> after doing some transformations.
>
> I changed spark.sql.shuffle.partitions to 2000, and since then the job
> completes fast, but the main problem is that I am getting 2000 files for
> each partition, each about 10 MB in size.
>
> Is there a way to get the same performance but write fewer files?
>
> I am trying repartition now but would like to know if there are any other
> options.
>
> Thanks,
> Asmath
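
For the partitioned-table case in the quoted message, a related sketch
(continuing the hypothetical names above; "event_date" stands in for the
real Hive partition column): repartitioning by the partition column before
the write puts all rows for a given partition value into one task, so each
partition directory gets a single file instead of 2000.

    import org.apache.spark.sql.functions.col

    // Every row with the same event_date hashes to the same task, so each
    // Hive partition directory is written as exactly one file. Beware of
    // skew: a very large partition value becomes one very large task.
    transformed
      .repartition(col("event_date"))
      .write
      .mode("overwrite")
      .partitionBy("event_date")
      .saveAsTable("tgt_db.tgt_table")

Newer Spark versions (2.2+, if memory serves) also have
spark.sql.files.maxRecordsPerFile to cap how many records go into a single
output file, which keeps file sizes bounded when single-file partitions
would be too large.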
