Re: reducing number of output files

2015-01-23 Thread Sean Owen
coalesce() does not necessarily shuffle, yes. I believe it will not if you are strictly reducing the number of partitions and do not force a shuffle. So I think the answer is 'yes'. If you have a huge number of small files, you can also consider wholeTextFiles, which gives you entire files of content in …
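To make the "no shuffle when strictly reducing partitions" point concrete, here is a minimal pure-Python model. It does not use Spark. The function name coalesce_no_shuffle is hypothetical, and the grouping rule (contiguous runs of parent partitions) is a simplifying assumption: Spark's real coalescer also weighs data locality. It does mirror the documented behaviour that, without a shuffle, asking for more partitions than exist leaves the count unchanged.

```python
from typing import List, TypeVar

T = TypeVar("T")


def coalesce_no_shuffle(partitions: List[List[T]], num_partitions: int) -> List[List[T]]:
    """Simplified model of coalesce(n) without a shuffle (hypothetical helper).

    Each output partition is the concatenation of a contiguous run of input
    partitions, so records are merged in place rather than redistributed by key.
    Spark's actual coalescer also considers data locality; that is ignored here.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    current = len(partitions)
    # Without a shuffle, requesting more partitions keeps the current count.
    if num_partitions >= current:
        return [list(p) for p in partitions]
    merged_partitions: List[List[T]] = []
    for i in range(num_partitions):
        # Split parent partition indices into num_partitions contiguous ranges.
        start = i * current // num_partitions
        end = (i + 1) * current // num_partitions
        merged: List[T] = []
        for p in partitions[start:end]:
            merged.extend(p)
        merged_partitions.append(merged)
    return merged_partitions


if __name__ == "__main__":
    parts = [[1, 2], [3], [4, 5], [6]]
    print(coalesce_no_shuffle(parts, 2))   # [[1, 2, 3], [4, 5, 6]]
    print(coalesce_no_shuffle(parts, 1))   # [[1, 2, 3, 4, 5, 6]]
    print(len(coalesce_no_shuffle(parts, 10)))  # 4
```

In Spark itself, forcing a shuffle is done with coalesce(n, shuffle = true) or, equivalently, repartition(n); the model above covers only the non-shuffling case.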

Re: reducing number of output files

2015-01-22 Thread Sean Owen
One output file is produced per partition. If you want fewer, use coalesce() before saving the RDD.

On Thu, Jan 22, 2015 at 10:46 PM, Kane Kim kane.ist...@gmail.com wrote:
> How can I reduce the number of output files? Is there a parameter to saveAsTextFile?
> Thanks.
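The "one output file per partition" rule can be illustrated with a small pure-Python stand-in. This is not Spark's implementation: save_partitions_as_text is a hypothetical helper, and the Hadoop-style part-00000 file naming is assumed here for illustration. Merging all records into a single partition before saving plays the role of coalesce(1).

```python
import os
import tempfile
from typing import Iterable, List


def save_partitions_as_text(partitions: List[Iterable[object]], out_dir: str) -> List[str]:
    """Hypothetical stand-in for saveAsTextFile: writes one file per partition."""
    os.makedirs(out_dir, exist_ok=True)
    paths: List[str] = []
    for index, records in enumerate(partitions):
        # Hadoop-style part-NNNNN naming (an assumption for this sketch).
        path = os.path.join(out_dir, f"part-{index:05d}")
        with open(path, "w", encoding="utf-8") as f:
            for record in records:
                f.write(f"{record}\n")  # one line per record
        paths.append(path)
    return paths


if __name__ == "__main__":
    parts = [["a", "b"], ["c"], ["d", "e"], ["f"]]
    with tempfile.TemporaryDirectory() as tmp:
        four = save_partitions_as_text(parts, os.path.join(tmp, "four"))
        # Merging everything into one partition first (like coalesce(1)) yields one file.
        single = [[r for p in parts for r in p]]
        one = save_partitions_as_text(single, os.path.join(tmp, "one"))
        print(len(four), len(one))  # 4 1
```

The file count tracks the partition count, which is why reducing partitions before the save is what reduces the number of output files.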

Re: reducing number of output files

2015-01-22 Thread DEVAN M.S.
rdd.coalesce(1) will coalesce the RDD into a single partition and so produce only one output file; coalesce(2) will give two, and so on.

On Jan 23, 2015 4:58 AM, Sean Owen so...@cloudera.com wrote:
> One output file is produced per partition. If you want fewer, use coalesce() before saving the RDD.
> On Thu, Jan 22, 2015 at 10:46