Re: Best way to merge final output part files created by Spark job

2016-07-01 Thread kali.tumm...@gmail.com
Try using the coalesce function to reduce the output to the desired number of partition files. To merge output files that have already been written, use Hive and run insert overwrite table with the options below: set hive.merge.smallfiles.avgsize=256; set hive.merge.size.per.task=256; set ...

Re: Best way to merge final output part files created by Spark job

2015-09-17 Thread MEETHU MATHEW
...the head or at the tail of the file. You have to use the format-specific API to merge the data. Yong
Date: Mon, 14 Sep 2015 09:10:33 +0200
Subject: Re: Best way to merge final output part files created by Spark job
From: gmu...@stratio.com
To: umesh.ka...@gmail.com
CC: user@spark.apache.org
Hi ...

RE: Best way to merge final output part files created by Spark job

2015-09-14 Thread java8964
...have to use the format-specific API to merge the data. Yong
Date: Mon, 14 Sep 2015 09:10:33 +0200
Subject: Re: Best way to merge final output part files created by Spark job
From: gmu...@stratio.com
To: umesh.ka...@gmail.com
CC: user@spark.apache.org
Hi, check out FileUtil.copyMerge function in ...

Re: Best way to merge final output part files created by Spark job

2015-09-14 Thread Gaspar Muñoz
Hi, check out FileUtil.copyMerge function in the Hadoop API
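
For readers who want to see what a copy-merge of part files amounts to, here is a minimal sketch in plain Python against a local directory. It is not the Hadoop FileUtil.copyMerge API and does not talk to HDFS; the function name merge_part_files and its parameters are made up for illustration. It assumes the part files are in a format that can be concatenated byte for byte (for example plain text without headers), which is the caveat raised elsewhere in this thread: formats that keep metadata at the head or tail of each file need their own format-specific API instead.

```python
import shutil
from pathlib import Path


def merge_part_files(src_dir, dst_file, add_newline=False):
    """Concatenate part-* files from src_dir into dst_file, in name order.

    Hypothetical helper for illustration only; returns the number of
    part files that were merged.
    """
    src = Path(src_dir)
    # Spark names its output files part-00000, part-00001, ...; sorting by
    # name keeps the partition order. Marker files such as _SUCCESS and
    # hidden checksum files do not start with "part-" and are skipped.
    parts = sorted(
        p for p in src.iterdir()
        if p.is_file() and p.name.startswith("part-")
    )
    with open(dst_file, "wb") as out:
        for part in parts:
            with open(part, "rb") as f:
                shutil.copyfileobj(f, out)
            if add_newline:
                # Optional separator after each part, for parts that do
                # not end with a newline of their own.
                out.write(b"\n")
    return len(parts)
```

A call such as merge_part_files("output_dir", "merged.txt") would write one file containing the parts in partition order; the destination should live outside the source directory so it is not picked up as a part on a later run.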