Hi,

I am using Spark 1.5.0 to read .gz files with textFileStream, but when new
.gz files are dropped into the specified directory, they are not picked up.
I know this only happens with .gz files: when I extract a file into the
same directory, it is read in the next window and processed.

My code is here:

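import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

// ssc is an existing StreamingContext
val comments = ssc.fileStream[LongWritable, Text, TextInputFormat](
  "file:///tmp/", (f: Path) => true, newFilesOnly = false).map(pair => pair._2.toString)
comments.foreachRDD(rdd => rdd.foreach(line => println(line)))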
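A minimal diagnostic sketch that narrows down where the .gz files are dropped. It replaces the accept-everything filter with one that prints every path the file stream lists, along with that path's modification time, and then still accepts it. Spark's file stream uses modification times when it decides which files count as new. So if the .gz files never show up here, or show up with timestamps older than the current batch window, that points at file detection rather than decompression. The object name `GzStreamDebug`, the helper `logAndAccept`, the local master and the 10-second batch interval are all hypothetical choices for illustration.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import scala.util.Try

object GzStreamDebug {
  // Hypothetical helper: logs each candidate path and its modification time,
  // then accepts it (same behaviour as the original `(f: Path) => true`).
  def logAndAccept(path: Path): Boolean = {
    val fs = path.getFileSystem(new Configuration())
    // Try guards against a file disappearing between listing and lookup.
    val modTime = Try(fs.getFileStatus(path).getModificationTime)
    println(s"candidate: $path modTime=$modTime now=${System.currentTimeMillis()}")
    true
  }

  def main(args: Array[String]): Unit = {
    // Assumed local master and 10-second batches; match the real job's settings.
    val conf = new SparkConf().setAppName("GzStreamDebug").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(10))

    val comments = ssc.fileStream[LongWritable, Text, TextInputFormat](
      "file:///tmp/", (p: Path) => logAndAccept(p), newFilesOnly = false)
      .map(pair => pair._2.toString)

    comments.foreachRDD(rdd => rdd.foreach(line => println(line)))

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Comparing the printed lines for a freshly dropped .gz file with those for an extracted file should show whether the two are treated differently before any reading happens.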

Any idea why the .gz files are not being recognized?

Thanks in advance,

K
