Hi,

Thanks for your response. My input format is one I created to handle files as a whole, i.e. WholeFileInputFormat. I wrote it based on this example: https://code.google.com/p/hadoop-course/source/browse/HadoopSamples/src/main/java/mr/wholeFile/WholeFileInputFormat.java?r=3

In that case, the key would be NullWritable and the value would be BytesWritable, right?
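For what it's worth, here is a minimal sketch of how such a whole-file format would typically be wired into fileStream, assuming the custom WholeFileInputFormat follows the NullWritable-key / BytesWritable-value convention of the linked sample (the class `mr.wholeFile.WholeFileInputFormat`, the watched path, and the app name below are illustrative, not from the thread):

```java
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

// Assumption: the custom format from the linked sample, which emits
// NullWritable keys and BytesWritable values (one record per file).
import mr.wholeFile.WholeFileInputFormat;

public class BinaryFileStreamSketch {
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("binary-file-stream");
        JavaStreamingContext jssc =
                new JavaStreamingContext(conf, Durations.seconds(30));

        // The key and value classes must match what the InputFormat's
        // RecordReader emits -- here NullWritable and BytesWritable.
        JavaPairInputDStream<NullWritable, BytesWritable> files =
                jssc.fileStream("hdfs:///path/to/watch",   // illustrative path
                                NullWritable.class,
                                BytesWritable.class,
                                WholeFileInputFormat.class);

        // Each value holds an entire file's bytes; this is where the
        // proprietary parser would run before loading into Hive.
        files.foreachRDD(rdd -> rdd.foreach(pair -> {
            byte[] contents = pair._2().copyBytes();
            // parse(contents) ...  (hypothetical parser hook)
        }));

        jssc.start();
        jssc.awaitTermination();
    }
}
```

Note that this requires Spark and Hadoop on the classpath and a running streaming context, so it is a wiring sketch rather than a standalone program.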
Unfortunately, my files are binary, not text files.

Regards,
Anand.C

From: Akhil Das [mailto:ak...@sigmoidanalytics.com]
Sent: Wednesday, October 14, 2015 5:31 PM
To: Chandra Mohan, Ananda Vel Murugan
Cc: user
Subject: Re: spark streaming filestream API

Key and Value are the classes used by your InputFormat. E.g.:

JavaPairInputDStream<LongWritable, Text> lines = jssc.fileStream("/sigmoid", LongWritable.class, Text.class, TextInputFormat.class);

TextInputFormat uses LongWritable as the key class and Text as the value class. If your data is plain CSV or text, you can use jssc.textFileStream("/sigmoid") without worrying about the InputFormat, key, and value classes.

Thanks
Best Regards

On Wed, Oct 14, 2015 at 5:12 PM, Chandra Mohan, Ananda Vel Murugan <ananda.muru...@honeywell.com<mailto:ananda.muru...@honeywell.com>> wrote:

Hi All,

I have a directory in HDFS which I want to monitor; whenever a new file appears in it, I want to parse that file and load its contents into a Hive table. The file format is proprietary, and I have Java parsers for it. I am building a Spark Streaming application for this workflow. For this, I found the JavaStreamingContext.fileStream API. It takes four arguments: directory path, key class, value class, and input format. What should the key and value classes be? Please suggest.

Thank you.

Regards,
Anand.C
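To illustrate the simpler path Akhil mentions for plain text data, a minimal sketch of textFileStream (the app name and 30-second batch interval are illustrative choices, not from the thread):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class TextFileStreamSketch {
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("text-file-stream");
        JavaStreamingContext jssc =
                new JavaStreamingContext(conf, Durations.seconds(30));

        // textFileStream is shorthand for fileStream with LongWritable keys,
        // Text values, and TextInputFormat, keeping only the values as Strings.
        JavaDStream<String> lines = jssc.textFileStream("/sigmoid");
        lines.print();

        jssc.start();
        jssc.awaitTermination();
    }
}
```

This avoids naming key, value, and InputFormat classes entirely, but it only applies to newline-delimited text, which is why it does not fit the binary files in this thread.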