Hi, check out the FileUtil.copyMerge function in the Hadoop API <https://hadoop.apache.org/docs/r2.7.0/api/org/apache/hadoop/fs/FileUtil.html#copyMerge(org.apache.hadoop.fs.FileSystem, org.apache.hadoop.fs.Path, org.apache.hadoop.fs.FileSystem, org.apache.hadoop.fs.Path, boolean, org.apache.hadoop.conf.Configuration, java.lang.String)>.
It's simple:

1. Get the Hadoop configuration from the Spark context and open the file system:
   FileSystem fs = FileSystem.get(sparkContext.hadoopConfiguration());
2. Create a Path <https://hadoop.apache.org/docs/r2.7.0/api/org/apache/hadoop/fs/Path.html> for the source directory and another for the destination file.
3. Call copyMerge:
   FileUtil.copyMerge(fs, inputPath, fs, destPath, true, sparkContext.hadoopConfiguration(), null);

2015-09-13 23:25 GMT+02:00 unk1102 <umesh.ka...@gmail.com>:

> Hi, I have a Spark job which creates around 500 part files inside each
> directory I process, and I have thousands of such directories, so I need
> to merge these 500 small part files. I am using
> spark.sql.shuffle.partitions = 500, and my final small files are ORC
> files. Is there a way to merge ORC files in Spark? If not, please suggest
> the best way to merge files created by a Spark job in HDFS. Thanks much.
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Best-way-to-merge-final-output-part-files-created-by-Spark-job-tp24681.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.

--
Gaspar Muñoz
@gmunozsoria <http://www.stratio.com/>

Vía de las dos Castillas, 33, Ática 4, 3ª Planta
28224 Pozuelo de Alarcón, Madrid
Tel: +34 91 352 59 42 // *@stratiobd <https://twitter.com/StratioBD>*
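Putting the three steps above together, a minimal sketch could look like the following. The class and method names (PartFileMerger, mergePartFiles) are hypothetical; the Configuration would come from sparkContext.hadoopConfiguration() in a real job, and the srcDir/dstFile paths are placeholders for your own locations.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

// Hypothetical helper class, not part of any API.
public class PartFileMerger {

    /**
     * Concatenates every part file under srcDir into the single file
     * dstFile, using the Hadoop 2.x FileUtil.copyMerge API.
     * Returns true if any files were merged.
     */
    public static boolean mergePartFiles(Configuration conf,
                                         String srcDir,
                                         String dstFile) throws IOException {
        // In a Spark job you would pass sparkContext.hadoopConfiguration() here.
        FileSystem fs = FileSystem.get(conf);
        return FileUtil.copyMerge(fs, new Path(srcDir),
                                  fs, new Path(dstFile),
                                  true,  // delete the source directory afterwards
                                  conf,
                                  null); // no separator string between files
    }
}
```

Two caveats worth noting: copyMerge does a raw byte-level concatenation, which is fine for plain-text output but will not, as far as I know, produce a valid ORC file (ORC stores per-file footer metadata), so for the ORC case from the original question a different approach is needed. Also, copyMerge was removed from FileUtil in Hadoop 3.x, so this only applies to Hadoop 2.x clusters.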