Currently, Spark Streaming creates a new directory for every batch and
stores the data in it (whether there is anything to write or not). There is
no direct append call as of now, but you can achieve this either with
FileUtil.copyMerge
<http://apache-spark-user-list.1001560.n3.nabble.com/save-spark-streaming-output-to-single-file-on-hdfs-td21124.html#a21167>
or with a separate program that does the cleanup for you.
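A minimal sketch of both ideas, assuming a DStream[String] named `lines` and
Hadoop 2.x (FileUtil.copyMerge was removed in Hadoop 3); the paths and names
below are illustrative, not from this thread:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}

// Skip empty batches so no 0-byte files or empty directories are written.
lines.foreachRDD { (rdd, time) =>
  if (!rdd.isEmpty()) {
    rdd.saveAsTextFile(s"hdfs:///output/batch-${time.milliseconds}")
  }
}

// Later, e.g. from a separate cleanup job, merge the per-batch part files
// into a single output file with FileUtil.copyMerge.
val conf = new Configuration()
val fs   = FileSystem.get(conf)
FileUtil.copyMerge(
  fs, new Path("hdfs:///output"),         // source directory of part files
  fs, new Path("hdfs:///merged/out.txt"), // destination single file
  false,                                  // don't delete the source
  conf, null)                             // no separator string between files
```

Note that copyMerge concatenates the part files in listing order, so apply it
per batch directory if ordering across batches matters to you.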

Thanks
Best Regards

On Sat, Aug 15, 2015 at 5:20 AM, Mohit Anchlia <mohitanch...@gmail.com>
wrote:

> Spark Streaming seems to be creating 0-byte files even when there is no
> data. Also, I have 2 concerns here:
>
> 1) Extra unnecessary files are being created in the output.
> 2) Hadoop doesn't work really well with too many files, and I see that it
> is creating a directory with a timestamp every 1 second. Is there a better
> way of writing a file, maybe some kind of append mechanism where one
> doesn't have to change the batch interval?
>
