Hi Sri,

Thanks for the question. You can start with something like this:
val sqlContext = new SQLContext(sc)
// Reading JSON here as an example; args(0) is the path to the file(s).
val customerList = sqlContext.read.json(args(0)).coalesce(20)

The coalesce(20) reduces the number of partitions to 20. You can repartition the data further downstream if needed; the goal is to end up with fewer files when you finally save as Parquet, since Spark writes one output file per partition.

Hope that helps.

-----
Neelesh S. Salian
Cloudera
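A fuller sketch of the idea, for context (the object name, app name, and the use of args(1) as the output path are placeholders, and 20 is just an example partition count -- pick a number that gives reasonably sized files for your data):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object CompactParquet {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("CompactParquet"))
    val sqlContext = new SQLContext(sc)

    // args(0): input JSON path; args(1): output Parquet path (placeholders)
    // coalesce(20) collapses the input partitions down to 20 without a full shuffle.
    val customerList = sqlContext.read.json(args(0)).coalesce(20)

    // Each partition becomes one output file, so this writes at most 20 Parquet files.
    customerList.write.parquet(args(1))

    sc.stop()
  }
}
```

Note that coalesce only decreases the partition count; if you need to increase it (or rebalance skewed partitions), use repartition(n) instead, at the cost of a shuffle.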