Re: Persist Dataframe to HDFS considering HDFS Block Size.

2019-01-21 Thread Shivam Sharma
hen run simple spark job to read and partition based on 'n'.
>>>>
>>>> Hichame
>>>>
>>>> *From:* felixcheun...@hotmail.com
>>>> *Sent:* January 19, 2019 2:06 PM
>>>> *To:* 28shivamsha...@gmail.com; user@spark.apache.org

Re: Persist Dataframe to HDFS considering HDFS Block Size.

2019-01-21 Thread Arnaud LARROQUE
your dataset)/hdfs block size
>>> Then run simple spark job to read and partition based on 'n'.
>>>
>>> Hichame
>>>
>>> *From:* felixcheun...@hotmail.com
>>> *Sent:* January 19, 2019 2:06 PM
>>> *To:* 28shivamsha...@gmail.com;
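
The suggestion quoted above (divide the dataset size by the HDFS block size to get 'n') can be sketched as follows. This is only an illustration: the helper name `target_partitions` is hypothetical, the 128 MiB default is an assumption (check `dfs.blocksize` on your cluster), and the dataset size is assumed to be the on-disk size in bytes, which you have to measure or estimate yourself.

```python
# Assumption: 128 MiB block size; the real value is whatever dfs.blocksize
# is set to on the target cluster.
ASSUMED_BLOCK_BYTES = 128 * 1024 * 1024


def target_partitions(dataset_bytes, block_bytes=ASSUMED_BLOCK_BYTES):
    """Return n = ceil(dataset_bytes / block_bytes), with a minimum of 1.

    Hypothetical helper, not part of Spark or Hadoop.
    """
    if dataset_bytes < 0:
        raise ValueError("dataset_bytes must be non-negative")
    if block_bytes <= 0:
        raise ValueError("block_bytes must be positive")
    # Integer ceiling division avoids float rounding on very large sizes.
    n = (dataset_bytes + block_bytes - 1) // block_bytes
    # Even an empty dataset needs at least one partition to write with.
    return max(1, n)
```

For example, `target_partitions(10 * 1024**3)` gives the partition count for a 10 GiB dataset under the assumed block size. Note that compressed or columnar output can be much smaller than the in-memory data, so the estimate is only as good as the size you feed in.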

Re: Persist Dataframe to HDFS considering HDFS Block Size.

2019-01-21 Thread Shivam Sharma
b to read and partition based on 'n'.
>>
>> Hichame
>>
>> *From:* felixcheun...@hotmail.com
>> *Sent:* January 19, 2019 2:06 PM
>> *To:* 28shivamsha...@gmail.com; user@spark.apache.org
>> *Subject:* Re: Persist Dataframe to HDFS considering HDFS Block Size.

Re: Persist Dataframe to HDFS considering HDFS Block Size.

2019-01-19 Thread Hichame El Khalfi
To: 28shivamsha...@gmail.com; user@spark.apache.org
Subject: Re: Persist Dataframe to HDFS considering HDFS Block Size.

You can call coalesce to combine partitions.

From: Shivam Sharma <28shivamsha...@gmail.com>
Sent: Saturday, January 19, 2019 7:43 AM
To

Re: Persist Dataframe to HDFS considering HDFS Block Size.

2019-01-19 Thread Felix Cheung
You can call coalesce to combine partitions.

From: Shivam Sharma <28shivamsha...@gmail.com>
Sent: Saturday, January 19, 2019 7:43 AM
To: user@spark.apache.org
Subject: Persist Dataframe to HDFS considering HDFS Block Size.

Hi All, I wanted to persist dat
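
A minimal sketch of the coalesce suggestion in PySpark, assuming `df` is an existing DataFrame and the target Hive table already exists. The function name `write_coalesced` and its parameters are hypothetical; `coalesce`, `write`, `mode` and `insertInto` are the standard DataFrame and DataFrameWriter calls.

```python
def write_coalesced(df, table_name, num_partitions):
    """Append df to an existing Hive table after merging its partitions.

    Hypothetical helper: df is assumed to be a pyspark.sql.DataFrame and
    table_name an existing table whose column order matches df.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    # coalesce() merges existing partitions without a full shuffle. It can
    # only lower the partition count; asking for more than df already has
    # leaves the count unchanged.
    merged = df.coalesce(num_partitions)
    # insertInto() matches columns by position, not by name.
    merged.write.mode("append").insertInto(table_name)
```

Usage would look like `write_coalesced(df, "my_db.my_table", 8)`, where the table name is a placeholder. If the data is heavily skewed, or the partition count needs to go up instead of down, `df.repartition(n)` is the alternative, at the cost of a full shuffle.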

Persist Dataframe to HDFS considering HDFS Block Size.

2019-01-19 Thread Shivam Sharma
Hi All, I want to persist a DataFrame to HDFS. Specifically, I am inserting data into a Hive table using Spark. Currently, when writing to the Hive table, I set the number of shuffle partitions to 400, so 400 files are created regardless of the HDFS block size. How can I tell