Hi,
On Mon, Aug 25, 2014 at 9:56 AM, Dean Chen <deanch...@gmail.com> wrote:
> We are using HDFS for log storage where logs are flushed to HDFS every
> minute, with a new file created for each hour. We would like to consume
> these logs using Spark Streaming.
>
> The docs state that new HDFS files will be picked up, but does Spark
> Streaming support HDFS appends?

I don't think so. The docs at
http://spark.apache.org/docs/1.0.0/api/scala/index.html#org.apache.spark.streaming.StreamingContext
say that even for new files, "Files must be written to the monitored
directory by 'moving' them from another location within the same file
system." So I don't think you can just append to your files; Spark
Streaming only picks up files that newly appear in the monitored
directory, not new data appended to existing ones.

Tobias
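As a sketch of the "move, don't append" pattern the docs describe: write the complete file somewhere else first, then rename it into the monitored directory in one step. All paths below are hypothetical; on a real cluster you would use `hdfs dfs -put` and `hdfs dfs -mv`, here local directories stand in for HDFS to show the idea.

```shell
# Hypothetical local stand-ins for an HDFS staging dir and the dir
# that Spark Streaming monitors.
mkdir -p /tmp/staging /tmp/monitored

# 1. Write the finished hourly file OUTSIDE the monitored directory.
echo "2014-08-25 09:00:00 some log line" > /tmp/staging/2014-08-25-09.log

# 2. Move it in. A rename within the same filesystem is atomic, so the
#    stream never observes a half-written file.
mv /tmp/staging/2014-08-25-09.log /tmp/monitored/

ls /tmp/monitored
```

On HDFS the equivalent would be `hdfs dfs -put local.log /staging/f.log` followed by `hdfs dfs -mv /staging/f.log /monitored/f.log`, since `put` writes the file incrementally but `mv` within one filesystem is a metadata-only rename.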