Spark Streaming seems to be creating 0-byte files even when there is no data. I also have two concerns here:
1) Extra, unnecessary files are being created in the output.
2) Hadoop doesn't work well with too many small files, and I see that Spark is creating a directory with a timestamp every 1 second.

Is there a better way of writing the files, perhaps some kind of append mechanism, so that one doesn't have to change the batch interval?
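For reference, here is a minimal sketch of the kind of guarded write I have been considering instead of `saveAsTextFiles` (the app name, socket source, and output path are placeholders, not my actual job): `foreachRDD` with an `isEmpty` check skips batches that carry no data, so no 0-byte part files or empty timestamp directories are produced.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object NoEmptyFiles {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("NoEmptyFiles")
    // 1-second batch interval, matching the behavior described above
    val ssc = new StreamingContext(conf, Seconds(1))

    // Placeholder source; the real job reads from elsewhere
    val lines = ssc.socketTextStream("localhost", 9999)

    lines.foreachRDD { (rdd, time) =>
      // Only write batches that actually contain data,
      // so idle intervals leave no 0-byte files behind
      if (!rdd.isEmpty()) {
        rdd.coalesce(1) // fewer part files per batch; costs a shuffle-free repartition
          .saveAsTextFile(s"hdfs:///output/batch-${time.milliseconds}")
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

This still produces one directory per non-empty batch, though, so it only mitigates the small-files problem rather than giving true append semantics.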