Hi Sri,

Thanks for the question. You can start with something like this:
val sqlContext = new SQLContext(sc)
// Reading JSON here as an example; args(0) is the path to the file(s).
val customerList = sqlContext.read.json(args(0)).coalesce(20)

The coalesce(20) reduces the number of partitions to 20. You can repartition the data further downstream if needed; the goal is to end up with fewer files when you finally save as Parquet, since Spark writes one output file per partition.

Hope that helps.

-----
Neelesh S. Salian
Cloudera
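A fuller sketch of the idea, for context (the object name, app name, and the use of args(1) as the output path are placeholders, and 20 is just an example partition count -- pick a number that gives reasonably sized files for your data):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object CompactParquet {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("CompactParquet"))
    val sqlContext = new SQLContext(sc)

    // args(0): input JSON path; args(1): output Parquet path (placeholders)
    // coalesce(20) collapses the input partitions down to 20 without a full shuffle.
    val customerList = sqlContext.read.json(args(0)).coalesce(20)

    // Each partition becomes one output file, so this writes at most 20 Parquet files.
    customerList.write.parquet(args(1))

    sc.stop()
  }
}
```

Note that coalesce only decreases the partition count; if you need to increase it (or rebalance skewed partitions), use repartition(n) instead, at the cost of a shuffle.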