Re: hdfs streaming context

Sean Owen Mon, 01 Dec 2014 14:54:04 -0800

Yes; in fact, watching a directory is the only way it works. Pass the
directory itself, without the trailing wildcard: you need
"hdfs://localhost:8020/user/data", I believe.

(No, it's not correct to write "hdfs:///...")
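
For reference, here is a minimal sketch of the job with the directory path
in place of the glob, assuming the Spark version in this thread, where
textFileStream expects a plain directory. The object name HdfsWordCount is
borrowed from the quoted app name. The `filtered` value and the
awaitTermination() call are additions that are not in the quoted snippet:
filter returns a new DStream, so calling print() on `lines` would print the
unfiltered input.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object HdfsWordCount {
  // Directory to monitor: no trailing "/*" (path taken from the thread).
  val inputDir = "hdfs://localhost:8020/user/data"

  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setAppName("HdfsWordCount")
    val ssc = new StreamingContext(sparkConf, Seconds(2))

    // Picks up files that appear in inputDir after the stream starts.
    val lines = ssc.textFileStream(inputDir)

    // filter returns a new DStream; keep a reference and print that one.
    val filtered = lines.filter(line => line.contains("GE"))
    filtered.print()

    ssc.start()
    // Assumed addition: block so the driver keeps running between batches.
    ssc.awaitTermination()
  }
}
```

As far as I know, files should be moved or renamed into the directory
atomically to be picked up, and files already present when the stream
starts are ignored by default.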

On Mon, Dec 1, 2014 at 10:41 PM, Benjamin Cuthbert
<cuthbert....@gmail.com> wrote:
> All,
>
> Is it possible to stream from an HDFS directory and listen for multiple files?
>
> I have tried the following:
>
> val sparkConf = new SparkConf().setAppName("HdfsWordCount")
> val ssc = new StreamingContext(sparkConf, Seconds(2))
> val lines = ssc.textFileStream("hdfs://localhost:8020/user/data/*")
> lines.filter(line => line.contains("GE"))
> lines.print()
> ssc.start()
>
> But I get:
>
> 14/12/01 21:35:42 ERROR JobScheduler: Error generating jobs for time 1417469742000 ms
> java.io.FileNotFoundException: File hdfs://localhost:8020/user/data/* does not exist.
>         at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:408)
>         at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1416)
>         at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1456)
>         at org.apache.spark.streaming.dstream.FileInputDStream.findNewFiles(FileInputDStream.scala:107)
>         at org.apache.spark.streaming.dstream.FileInputDStream.compute(FileInputDStream.scala:75)
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>

