Re: Best way to merge files from streaming jobs

2016-03-08 Thread Sumedh Wale
On Saturday 05 March 2016 02:39 AM, Jelez Raditchkov wrote: My streaming job is creating files on S3. The problem is that those files end up very small if I just write them to S3 directly. This is why I use coalesce() to reduce the

Best way to merge files from streaming jobs

2016-03-04 Thread Jelez Raditchkov
My streaming job is creating files on S3. The problem is that those files end up very small if I just write them to S3 directly. This is why I use coalesce() to reduce the number of files and make them larger. However, coalesce shuffles data and my job processing time ends up higher than
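
For anyone weighing the same trade-off, here is a rough sketch of how one might pick the partition count to pass to coalesce() so that each output file lands near a target size. The helper name `target_partitions`, the 128 MiB default, and the idea of estimating the batch size up front are all assumptions made for illustration; none of them come from the original post.

```python
def target_partitions(batch_bytes: int,
                      target_file_bytes: int = 128 * 1024 * 1024) -> int:
    """Return how many output files a batch should be coalesced into.

    Hypothetical helper: batch_bytes is an estimate of the batch size,
    target_file_bytes is the desired size of each file (assumed 128 MiB).
    """
    if batch_bytes < 0:
        raise ValueError("batch_bytes must be non-negative")
    if target_file_bytes <= 0:
        raise ValueError("target_file_bytes must be positive")
    # Integer ceiling division, with a floor of one partition so that
    # an empty or tiny batch still maps to a single output file.
    return max(1, (batch_bytes + target_file_bytes - 1) // target_file_bytes)


# Possible use inside a streaming job (not executed here; `rdd`,
# `estimated_bytes` and `output_path` are placeholders):
#   rdd.coalesce(target_partitions(estimated_bytes)).saveAsTextFile(output_path)
```

The count only controls how many files are written; whether the extra processing time is acceptable still depends on the batch interval and the size of each batch.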