Hello,

Spark 1.1.0, Hadoop 2.4.1

I have written a Spark Streaming application, and I am getting a
FileAlreadyExistsException from rdd.saveAsTextFile(outputFolderPath).
Here is a brief description of what I am trying to do.
My application creates a text file stream using the Java streaming context. The
input file is on HDFS.

        JavaDStream<String> textStream = ssc.textFileStream(InputFile);
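
For reference, ssc is a JavaStreamingContext created roughly like this (the
app name and the 10-second batch interval below are just placeholders):

        // sketch only -- app name and batch interval are placeholders
        SparkConf conf = new SparkConf().setAppName("MyStreamingApp");
        JavaStreamingContext ssc = new JavaStreamingContext(conf, new Duration(10000));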

Then it compares each line of the input stream with some data and filters it.
The filtered data is stored in a JavaDStream<String>.

                 JavaDStream<String> suspectedStream = textStream.flatMap(
                        new FlatMapFunction<String, String>() {
                                @Override
                                public Iterable<String> call(String line) throws Exception {
                                        List<String> filteredList = new ArrayList<String>();

                                        // doing filter job

                                        return filteredList;
                                }
                        });

And this filtered list I am storing in HDFS as:

             suspectedStream.foreach(new Function<JavaRDD<String>, Void>() {
                        @Override
                        public Void call(JavaRDD<String> rdd) throws Exception {
                                rdd.saveAsTextFile(outputFolderPath);
                                return null;
                        }
             });


But with this I am receiving an
org.apache.hadoop.mapred.FileAlreadyExistsException, since saveAsTextFile
fails when the output directory already exists from a previous batch.

I tried appending a random number to outputFolderPath and it works,
but my requirement is to collect all output in one directory.
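
For reference, the workaround I tried looks roughly like this (I actually
append a random number; the timestamp suffix here is only to show the idea):

             // workaround sketch: write each batch to a new path so the
             // output directory never already exists
             suspectedStream.foreach(new Function<JavaRDD<String>, Void>() {
                        @Override
                        public Void call(JavaRDD<String> rdd) throws Exception {
                                rdd.saveAsTextFile(outputFolderPath + "-" + System.currentTimeMillis());
                                return null;
                        }
             });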

Can you please suggest if there is any way to get rid of this exception while
still writing everything to the same directory?

Thanks,
  Shailesh




