Thanks Sean,

That worked: just removing the /* and leaving it as /user/data.

It seems to be streaming in now.
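For anyone hitting this later, here is a minimal sketch of the working setup (assuming Spark 1.x and a NameNode on localhost:8020, as in the thread). Note also that DStream transformations such as filter return a new stream rather than mutating the existing one, so the filtered result has to be kept; the original snippet discarded it.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object HdfsWordCount {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setAppName("HdfsWordCount")
    val ssc = new StreamingContext(sparkConf, Seconds(2))

    // Point textFileStream at the directory itself, not a glob:
    // new files written into /user/data are picked up automatically.
    val lines = ssc.textFileStream("hdfs://localhost:8020/user/data")

    // filter returns a new DStream; keep the result instead of
    // discarding it as in the original snippet.
    val geLines = lines.filter(line => line.contains("GE"))
    geLines.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

This requires a running Spark cluster and HDFS, so it is only a sketch of the shape, not something testable in isolation.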


> On 1 Dec 2014, at 22:50, Sean Owen <so...@cloudera.com> wrote:
> 
> Yes, in fact, that's the only way it works. You need
> "hdfs://localhost:8020/user/data", I believe.
> 
> (No, it's not correct to write "hdfs:///...")
> 
> On Mon, Dec 1, 2014 at 10:41 PM, Benjamin Cuthbert
> <cuthbert....@gmail.com> wrote:
>> All,
>> 
>> Is it possible to stream from an HDFS directory and listen for multiple files?
>> 
>> I have tried the following
>> 
>> val sparkConf = new SparkConf().setAppName("HdfsWordCount")
>> val ssc = new StreamingContext(sparkConf, Seconds(2))
>> val lines = ssc.textFileStream("hdfs://localhost:8020/user/data/*")
>> lines.filter(line => line.contains("GE"))
>> lines.print()
>> ssc.start()
>> 
>> But I get
>> 
>> 14/12/01 21:35:42 ERROR JobScheduler: Error generating jobs for time 
>> 1417469742000 ms
>> java.io.FileNotFoundException: File hdfs://localhost:8020/user/data/* does
>> not exist.
>>        at 
>> org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:408)
>>        at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1416)
>>        at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1456)
>>        at 
>> org.apache.spark.streaming.dstream.FileInputDStream.findNewFiles(FileInputDStream.scala:107)
>>        at 
>> org.apache.spark.streaming.dstream.FileInputDStream.compute(FileInputDStream.scala:75)
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>> 

