Thanks Sean, that worked: removing the /* and leaving it as /user/data. It seems to be streaming in now.

> On 1 Dec 2014, at 22:50, Sean Owen <so...@cloudera.com> wrote:
>
> Yes, in fact, that's the only way it works. You need
> "hdfs://localhost:8020/user/data", I believe.
>
> (No, it's not correct to write "hdfs:///...")
>
> On Mon, Dec 1, 2014 at 10:41 PM, Benjamin Cuthbert
> <cuthbert....@gmail.com> wrote:
>> All,
>>
>> Is it possible to stream on an HDFS directory and listen for multiple files?
>>
>> I have tried the following:
>>
>> val sparkConf = new SparkConf().setAppName("HdfsWordCount")
>> val ssc = new StreamingContext(sparkConf, Seconds(2))
>> val lines = ssc.textFileStream("hdfs://localhost:8020/user/data/*")
>> lines.filter(line => line.contains("GE"))
>> lines.print()
>> ssc.start()
>>
>> But I get:
>>
>> 14/12/01 21:35:42 ERROR JobScheduler: Error generating jobs for time 1417469742000 ms
>> java.io.FileNotFoundException: File hdfs://localhost:8020/user/data/* does not exist.
>>     at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:408)
>>     at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1416)
>>     at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1456)
>>     at org.apache.spark.streaming.dstream.FileInputDStream.findNewFiles(FileInputDStream.scala:107)
>>     at org.apache.spark.streaming.dstream.FileInputDStream.compute(FileInputDStream.scala:75)
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>> ---------------------------------------------------------------------
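
For reference, the working setup can be sketched as a complete driver. This is a hedged reconstruction, not the poster's exact code: the host, port, path, and app name are placeholders from the thread, and it also keeps the result of `filter` (the original snippet discarded it, so the filtered stream was never printed):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object HdfsWordCount {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setAppName("HdfsWordCount")
    val ssc = new StreamingContext(sparkConf, Seconds(2))

    // Point textFileStream at the directory itself, with no /* glob suffix.
    // Spark Streaming monitors the directory and picks up files newly
    // created in it on each batch interval.
    val lines = ssc.textFileStream("hdfs://localhost:8020/user/data")

    // filter returns a new DStream; keep the result rather than discarding it.
    val geLines = lines.filter(line => line.contains("GE"))
    geLines.print()

    ssc.start()
    ssc.awaitTermination() // keep the driver running
  }
}
```

Note that `textFileStream` only sees files that appear in the directory after the stream starts; files already present when the job launches are not replayed.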