Re: Persist Dataframe to HDFS considering HDFS Block Size.

2019-01-21 Thread Shivam Sharma
... El Khalfi wrote:

> You can do this in 2 passes (not one):
> A) Save your dataset into HDFS with what you have.
> B) Calculate the number of partitions, n = (size of your dataset) / (HDFS block size).
> Then run a simple Spark job to read and partition based on 'n'.
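Step B's arithmetic can be sketched in plain Python. This is a minimal sketch, not from the thread: the function name, the default 128 MiB block size, and the ceiling rounding are my own assumptions (rounding up so a trailing partial block still gets a partition).

```python
import math

def num_partitions(dataset_size_bytes: int,
                   hdfs_block_size_bytes: int = 128 * 1024 * 1024) -> int:
    """n = dataset size / HDFS block size, rounded up, at least 1."""
    return max(1, math.ceil(dataset_size_bytes / hdfs_block_size_bytes))

# Example: a 1 GiB dataset with a 128 MiB block size -> 8 partitions
print(num_partitions(1024 * 1024 * 1024))  # 8
```

The second pass from the advice above would then read the saved data and write it back with that partition count, e.g. `spark.read.parquet(path).repartition(n).write.parquet(out_path)` (illustrative paths; `repartition` is a real DataFrame method, the rest is assumed context).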

Re: Persist Dataframe to HDFS considering HDFS Block Size.

2019-01-21 Thread Arnaud LARROQUE
> ... n = (size of your dataset) / (HDFS block size).
> Then run a simple Spark job to read and partition based on 'n'.
>
> Hichame
>
> *From:* felixcheun...@hotmail.com
> *Sent:* January 19, 2019 2:06 PM
> *To:* 28shivamsha...@gmail.com;

Re: Persist Dataframe to HDFS considering HDFS Block Size.

2019-01-21 Thread Shivam Sharma
> ... to read and partition based on 'n'.
>
> Hichame
>
> *From:* felixcheun...@hotmail.com
> *Sent:* January 19, 2019 2:06 PM
> *To:* 28shivamsha...@gmail.com; user@spark.apache.org
> *Subject:* Re: Persist Dataframe to HDFS considering HDFS Block Size.

Re: Persist Dataframe to HDFS considering HDFS Block Size.

2019-01-19 Thread Hichame El Khalfi
> To: 28shivamsha...@gmail.com; user@spark.apache.org
> Subject: Re: Persist Dataframe to HDFS considering HDFS Block Size.
>
> You can call coalesce to combine partitions.
>
> From: Shivam Sharma <28shivamsha...@gmail.com>
> Sent: Saturday, January 19, 2019 7:43 AM
> To:

Re: Persist Dataframe to HDFS considering HDFS Block Size.

2019-01-19 Thread Felix Cheung
You can call coalesce to combine partitions.

> From: Shivam Sharma <28shivamsha...@gmail.com>
> Sent: Saturday, January 19, 2019 7:43 AM
> To: user@spark.apache.org
> Subject: Persist Dataframe to HDFS considering HDFS Block Size.
>
> Hi All, I wanted to persist dataframe ...
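The choice between `coalesce` and `repartition` can be sketched as a pure-Python rule of thumb (the helper function and its name are my own illustration, not Spark API; what it encodes is the documented Spark behavior that `coalesce(n)` only merges existing partitions without a shuffle, so it cannot increase the count, while `repartition(n)` performs a full shuffle and can go either way):

```python
def pick_method(current_partitions: int, target_partitions: int) -> str:
    """Rule of thumb: coalesce can only shrink the partition count
    (cheap, no shuffle); growing the count requires repartition."""
    if target_partitions < current_partitions:
        return "coalesce"      # narrow dependency, avoids a shuffle
    if target_partitions > current_partitions:
        return "repartition"   # full shuffle needed to add partitions
    return "no-op"

print(pick_method(200, 8))   # coalesce
print(pick_method(8, 200))   # repartition
```

So for the common case in this thread (many small output files, fewer desired partitions), `df.coalesce(n)` before the write is the cheaper option; `repartition(n)` gives more evenly sized partitions at the cost of a shuffle.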