Hi, I have the following code where I read ORC files from HDFS. It loads a directory that contains 12 ORC files, so Spark creates 12 partitions by default. The directory is large: once the ORC files are decompressed, the data is around 10 GB. How do I increase the number of partitions for the code below so that my Spark job runs faster and does not hang for a long time shuffling 10 GB of data across only 12 partitions? Please guide.
DataFrame df = hiveContext.read().format("orc").load("/hdfs/path/to/orc/files/");
df.select(...).groupBy(...);
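Here is a sketch of what I am considering, assuming the Spark 1.x Java API; the column name "someColumn" and the partition count 96 are just placeholders I made up. Is this the right way to go about it?

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.hive.HiveContext;

public class IncreaseOrcPartitions {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("orc-repartition-example");
        JavaSparkContext sc = new JavaSparkContext(conf);
        HiveContext hiveContext = new HiveContext(sc.sc());

        // Raise the partition count used for shuffle stages (groupBy, joins);
        // the default is 200, and 96 here is only an illustrative value.
        hiveContext.setConf("spark.sql.shuffle.partitions", "96");

        DataFrame df = hiveContext.read().format("orc").load("/hdfs/path/to/orc/files/");

        // Spread the 12 input partitions across more tasks before the wide operation.
        // repartition() triggers a full shuffle of the data.
        DataFrame repartitioned = df.repartition(96);

        // "someColumn" is a placeholder for the real grouping column.
        repartitioned.groupBy("someColumn").count().show();
    }
}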