Hi,

Thanks for your response. My input format is one I created to handle files as a whole, i.e. WholeFileInputFormat. I wrote it based on this example: https://code.google.com/p/hadoop-course/source/browse/HadoopSamples/src/main/java/mr/wholeFile/WholeFileInputFormat.java?r=3

In that case, the key would be NullWritable and the value would be BytesWritable, right?
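For what it's worth, here is a minimal sketch of how such a whole-file format would typically be wired into fileStream, assuming the custom WholeFileInputFormat follows the NullWritable-key / BytesWritable-value convention of the linked sample (the class `mr.wholeFile.WholeFileInputFormat`, the watched path, and the app name below are illustrative, not from the thread):

```java
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

// Assumption: the custom format from the linked sample, which emits
// NullWritable keys and BytesWritable values (one record per file).
import mr.wholeFile.WholeFileInputFormat;

public class BinaryFileStreamSketch {
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("binary-file-stream");
        JavaStreamingContext jssc =
                new JavaStreamingContext(conf, Durations.seconds(30));

        // The key and value classes must match what the InputFormat's
        // RecordReader emits -- here NullWritable and BytesWritable.
        JavaPairInputDStream<NullWritable, BytesWritable> files =
                jssc.fileStream("hdfs:///path/to/watch",   // illustrative path
                                NullWritable.class,
                                BytesWritable.class,
                                WholeFileInputFormat.class);

        // Each value holds an entire file's bytes; this is where the
        // proprietary parser would run before loading into Hive.
        files.foreachRDD(rdd -> rdd.foreach(pair -> {
            byte[] contents = pair._2().copyBytes();
            // parse(contents) ...  (hypothetical parser hook)
        }));

        jssc.start();
        jssc.awaitTermination();
    }
}
```

Note that this requires Spark and Hadoop on the classpath and a running streaming context, so it is a wiring sketch rather than a standalone program.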
Unfortunately, my files are binary, not text files.

Regards,
Anand.C

From: Akhil Das [mailto:ak...@sigmoidanalytics.com]
Sent: Wednesday, October 14, 2015 5:31 PM
To: Chandra Mohan, Ananda Vel Murugan
Cc: user
Subject: Re: spark streaming filestream API

Key and Value are the classes used by your InputFormat. E.g.:

JavaPairInputDStream<LongWritable, Text> lines = jssc.fileStream("/sigmoid", LongWritable.class, Text.class, TextInputFormat.class);

TextInputFormat uses LongWritable as the key class and Text as the value class. If your data is plain CSV or text, you can use jssc.textFileStream("/sigmoid") without worrying about the InputFormat, key, and value classes.

Thanks
Best Regards

On Wed, Oct 14, 2015 at 5:12 PM, Chandra Mohan, Ananda Vel Murugan <ananda.muru...@honeywell.com<mailto:ananda.muru...@honeywell.com>> wrote:

Hi All,

I have a directory in HDFS which I want to monitor; whenever a new file appears in it, I want to parse that file and load its contents into a Hive table. The file format is proprietary, and I have Java parsers for it. I am building a Spark Streaming application for this workflow. For this, I found the JavaStreamingContext.fileStream API. It takes four arguments: directory path, key class, value class, and input format. What should the key and value classes be? Please suggest.

Thank you.

Regards,
Anand.C
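To illustrate the simpler path Akhil mentions for plain text data, a minimal sketch of textFileStream (the app name and 30-second batch interval are illustrative choices, not from the thread):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class TextFileStreamSketch {
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("text-file-stream");
        JavaStreamingContext jssc =
                new JavaStreamingContext(conf, Durations.seconds(30));

        // textFileStream is shorthand for fileStream with LongWritable keys,
        // Text values, and TextInputFormat, keeping only the values as Strings.
        JavaDStream<String> lines = jssc.textFileStream("/sigmoid");
        lines.print();

        jssc.start();
        jssc.awaitTermination();
    }
}
```

This avoids naming key, value, and InputFormat classes entirely, but it only applies to newline-delimited text, which is why it does not fit the binary files in this thread.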