Try ("hdfs:///localhost:8020/user/data/*")
With 3 "/". Thx, Tri

-----Original Message-----
From: Benjamin Cuthbert [mailto:cuthbert....@gmail.com]
Sent: Monday, December 01, 2014 4:41 PM
To: user@spark.apache.org
Subject: hdfs streaming context

All,

Is it possible to stream on an HDFS directory and listen for multiple files? I have tried the following:

    val sparkConf = new SparkConf().setAppName("HdfsWordCount")
    val ssc = new StreamingContext(sparkConf, Seconds(2))
    val lines = ssc.textFileStream("hdfs://localhost:8020/user/data/*")
    lines.filter(line => line.contains("GE"))
    lines.print()
    ssc.start()

But I get:

    14/12/01 21:35:42 ERROR JobScheduler: Error generating jobs for time 1417469742000 ms
    java.io.FileNotFoundException: File hdfs://localhost:8020/user/data/* does not exist.
        at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:408)
        at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1416)
        at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1456)
        at org.apache.spark.streaming.dstream.FileInputDStream.findNewFiles(FileInputDStream.scala:107)
        at org.apache.spark.streaming.dstream.FileInputDStream.compute(FileInputDStream.scala:75)

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
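For anyone hitting this later: the two-slash and three-slash forms parse differently. With "hdfs://localhost:8020/user/data" the authority (host:port) is explicit in the URI; with "hdfs:///..." the authority component is empty, so Hadoop falls back to fs.defaultFS from core-site.xml. A quick sketch with plain java.net.URI (no Spark needed, paths here are just illustrative) shows the difference:

```scala
import java.net.URI

object UriCheck extends App {
  // Two slashes: host:port is parsed as the URI authority.
  val explicit = new URI("hdfs://localhost:8020/user/data")
  println(explicit.getAuthority) // localhost:8020
  println(explicit.getPath)      // /user/data

  // Three slashes: empty authority (getAuthority returns null), so
  // Hadoop would resolve the filesystem from fs.defaultFS instead of
  // taking host and port from the URI itself.
  val default = new URI("hdfs:///user/data")
  println(default.getAuthority)  // null
  println(default.getPath)       // /user/data
}
```

Separately, the stack trace shows FileInputDStream calling listStatus on the literal glob string, so another workaround worth trying in this Spark version may be to point textFileStream at the directory itself (hdfs://localhost:8020/user/data/, no trailing *) and let the stream pick up new files as they land there.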