Yes, in fact, that's the only way it works: textFileStream monitors a directory and picks up new files as they appear in it. You need "hdfs://localhost:8020/user/data", without the "/*", I believe.
(No, it's not correct to write "hdfs:///...")

On Mon, Dec 1, 2014 at 10:41 PM, Benjamin Cuthbert <cuthbert....@gmail.com> wrote:
> All,
>
> Is it possible to stream on HDFS directory and listen for multiple files?
>
> I have tried the following
>
> val sparkConf = new SparkConf().setAppName("HdfsWordCount")
> val ssc = new StreamingContext(sparkConf, Seconds(2))
> val lines = ssc.textFileStream("hdfs://localhost:8020/user/data/*")
> lines.filter(line => line.contains("GE"))
> lines.print()
> ssc.start()
>
> But I get
>
> 14/12/01 21:35:42 ERROR JobScheduler: Error generating jobs for time 1417469742000 ms
> java.io.FileNotFoundException: File hdfs://localhost:8020/user/data/* does not exist.
>     at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:408)
>     at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1416)
>     at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1456)
>     at org.apache.spark.streaming.dstream.FileInputDStream.findNewFiles(FileInputDStream.scala:107)
>     at org.apache.spark.streaming.dstream.FileInputDStream.compute(FileInputDStream.scala:75)
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
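For reference, here is a minimal sketch of the quoted program with the path changed to the directory itself. It assumes the Spark Streaming Scala API is on the classpath and that HDFS is at localhost:8020, as in the question. Besides the path, it makes two other small changes. First, it keeps the result of `filter`: DStream transformations return a new stream and leave `lines` unchanged, so the original `lines.print()` prints unfiltered lines. Second, it calls `awaitTermination()`, because in a standalone application `start()` returns immediately. The `containsGE` helper is a hypothetical name introduced only for illustration.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object HdfsWordCount {
  // Hypothetical helper (not in the original): the line filter used below.
  def containsGE(line: String): Boolean = line.contains("GE")

  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setAppName("HdfsWordCount")
    // 2-second batch interval, as in the original program.
    val ssc = new StreamingContext(sparkConf, Seconds(2))

    // Pass the directory itself, not a "/*" glob: textFileStream monitors
    // this directory and reads new files that appear in it.
    val lines = ssc.textFileStream("hdfs://localhost:8020/user/data")

    // filter returns a new DStream and leaves `lines` unchanged,
    // so print the filtered stream rather than `lines`.
    val geLines = lines.filter(line => containsGE(line))
    geLines.print()

    ssc.start()
    // start() does not block; keep the driver alive while the job runs.
    ssc.awaitTermination()
  }
}
```

Keep in mind that the stream only processes files that show up in the directory after it starts. Files are expected to be written elsewhere and then moved or renamed into the monitored directory once complete, rather than written there in place.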