Hi All, 

I am running Hive SQL through spark-sql in YARN client mode. The SQL is pretty
simple: it loads dynamic partitions into a target Parquet table.

I used Hive configuration parameters such as (set
hive.merge.smallfiles.avgsize=256000000; set
hive.merge.size.per.task=2560000000;), which in Hive usually merge small files up to
roughly the 256 MB block size. Are these parameters supported in spark-sql, or is
there some other way to merge the many small Parquet files into larger ones?
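
In case it helps, this is roughly what the job looks like (a sketch only; the table,
column, and partition names below are placeholders, not the real ones):

    SET hive.exec.dynamic.partition=true;
    SET hive.exec.dynamic.partition.mode=nonstrict;
    SET hive.merge.smallfiles.avgsize=256000000;
    SET hive.merge.size.per.task=2560000000;

    -- placeholder names; the real statement is equivalent
    INSERT OVERWRITE TABLE target_parquet_table PARTITION (ds)
    SELECT col1, col2, ds
    FROM source_table;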

If it were a Scala application I could use the coalesce() or repartition() functions,
but here we are not using a Spark Scala application; it is just plain spark-sql.
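
Just to show what I mean, in a Scala job it would look something like this (a rough
sketch with placeholder names, not our actual code):

    // sqlContext is the entry point already provided in spark-shell / a Scala app
    // read the source, cut the number of output files down with coalesce()
    // (or repartition()), then write the dynamic partitions as Parquet
    val df = sqlContext.table("source_table")        // placeholder table name
    df.coalesce(16)                                  // or df.repartition(16)
      .write
      .mode("overwrite")
      .partitionBy("ds")                             // placeholder partition column
      .parquet("/path/to/target_parquet_table")      // placeholder output path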


Thanks
Sri 



