Re: Persist Dataframe to HDFS considering HDFS Block Size.

2019-01-19 Thread Hichame El Khalfi
You can do this in two passes (not one): A) save your dataset to HDFS as you do now; B) calculate the number of partitions, n = (size of your dataset) / (HDFS block size), then run a simple Spark job that reads the data back and repartitions it based on 'n'. Hichame
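
A minimal sketch of this two-pass approach, assuming Parquet data; the input, staging, and final paths are placeholders, not from the original thread:

    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("repartition-to-block-size").getOrCreate()

    // Pass A: write the dataset once with whatever partitioning it currently has.
    spark.read.parquet("/data/input")                    // hypothetical source path
      .write.mode("overwrite").parquet("/data/stage")    // hypothetical staging path

    // Pass B: n = (size of the dataset on HDFS) / (HDFS block size), then rewrite.
    val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
    val stagePath = new Path("/data/stage")
    val datasetSize = fs.getContentSummary(stagePath).getLength
    val blockSize = fs.getDefaultBlockSize(stagePath)
    val n = math.max(1, math.ceil(datasetSize.toDouble / blockSize).toInt)

    spark.read.parquet("/data/stage")
      .repartition(n)
      .write.mode("overwrite").parquet("/data/final")    // hypothetical final path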

Re: Persist Dataframe to HDFS considering HDFS Block Size.

2019-01-19 Thread Felix Cheung
You can call coalesce to combine partitions.
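
A minimal sketch of the coalesce suggestion, assuming a Hive-enabled session; the source path, table name, and target file count are placeholders:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("coalesce-example")
      .enableHiveSupport()
      .getOrCreate()

    // coalesce(n) merges the existing partitions down to n without a full shuffle,
    // so the Hive write produces roughly n output files instead of 400.
    val n = 50 // hypothetical target, e.g. dataset size / HDFS block size
    spark.read.parquet("/data/input")          // hypothetical source
      .coalesce(n)
      .write
      .mode("overwrite")
      .saveAsTable("my_hive_table")            // hypothetical Hive table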

Re: Spark on Yarn, is it possible to manually blacklist nodes before running spark job?

2019-01-19 Thread Felix Cheung
To clarify, YARN actually supports excluding nodes right when requesting resources; it's Spark that doesn't provide a way to populate such a blacklist. If you can change the YARN config, the equivalent is node labels: https://hadoop.apache.org/docs/r2.7.4/hadoop-yarn/hadoop-yarn-site/NodeLabel.html
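
A hedged sketch of how node labels can be used from the Spark side, assuming the YARN admin has already defined a label (here the hypothetical "healthy") on the nodes you want; spark.yarn.executor.nodeLabelExpression and spark.yarn.am.nodeLabelExpression are Spark's standard YARN node-label settings:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("node-label-example")
      // Only request executor containers on nodes carrying the "healthy" label.
      .config("spark.yarn.executor.nodeLabelExpression", "healthy")
      // Optionally constrain the application master the same way.
      .config("spark.yarn.am.nodeLabelExpression", "healthy")
      .getOrCreate()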

Re: Spark on Yarn, is it possible to manually blacklist nodes before running spark job?

2019-01-19 Thread Li Gao
On YARN it is impossible, as far as I know. On Kubernetes you can use taints to keep certain nodes away from Spark. On Fri, Jan 18, 2019 at 9:35 PM Felix Cheung wrote: > Not as far as I recall...

Persist Dataframe to HDFS considering HDFS Block Size.

2019-01-19 Thread Shivam Sharma
Hi All, I want to persist a dataframe to HDFS. Basically, I am inserting data into a Hive table using Spark. Currently, when writing to the Hive table, I have set total shuffle partitions = 400, so 400 files are created, which does not take the HDFS block size into account at all. How can I tell Spark to consider the HDFS block size when writing?
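
For reference, a minimal sketch of the setup described in this question; the source path, aggregation, and table name are placeholders:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("hive-insert")
      .enableHiveSupport()
      // 400 shuffle partitions means the final stage writes ~400 files,
      // regardless of the HDFS block size.
      .config("spark.sql.shuffle.partitions", "400")
      .getOrCreate()

    spark.read.parquet("/data/input")            // hypothetical source
      .groupBy("some_key").count()               // hypothetical shuffle-inducing step
      .write.insertInto("my_hive_table")         // hypothetical, pre-existing Hive table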