I use Spark Streaming, where messages read from Kafka topics are stored in a
JavaDStream<String>; this DStream contains the actual data. After going through
the documentation and other help, I have found that we traverse a JavaDStream
using foreachRDD:

javaDStreamRdd.foreachRDD(new Function<JavaRDD<String>, Void>() {
    public Void call(JavaRDD<String> rdd) {
        // Call mapPartitions on the incoming rdd to generate a new
        // JavaRDD<MyTable>.
        JavaRDD<MyTable> rdd_records = rdd.mapPartitions(
            new FlatMapFunction<Iterator<String>, MyTable>() {
                public Iterable<MyTable> call(Iterator<String> stringIterator)
                        throws Exception {
                    // Build one MyTable per partition from the CSV lines.
                    MyTable table = new MyTable();
                    while (stringIterator.hasNext()) {
                        String line = stringIterator.next();
                        String[] fields = line.split(",");
                        // Create a Record from the fields (pseudocode:
                        // assuming a Record(String[]) constructor).
                        Record record = new Record(fields);
                        table.append(record);
                    }
                    return java.util.Collections.singletonList(table);
                }
            });
        return null;
    }
});

Now my question is: how does the above code work? I want to create a
JavaRDD<MyTable> for each RDD of the JavaDStream. How do I make sure the code
above works correctly for all the data, so that the JavaRDD<MyTable> ends up
containing all of it and no previous data is lost just because
JavaRDD<MyTable> is a local variable?
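
As a point of comparison, here is a minimal sketch of what I believe the
DStream-level alternative looks like: calling mapPartitions on the DStream
itself instead of inside foreachRDD, which yields a JavaDStream<MyTable> with
one RDD per batch and no local variable to manage. (MyTable, Record, and the
Record(String[]) constructor are my own placeholder types, as above.)

JavaDStream<MyTable> tableStream = javaDStreamRdd.mapPartitions(
    new FlatMapFunction<Iterator<String>, MyTable>() {
        public Iterable<MyTable> call(Iterator<String> lines) throws Exception {
            // Same per-partition logic as above: one MyTable per partition.
            MyTable table = new MyTable();
            while (lines.hasNext()) {
                String[] fields = lines.next().split(",");
                table.append(new Record(fields));
            }
            return java.util.Collections.singletonList(table);
        }
    });

This still produces a separate RDD per micro-batch, though, so it does not by
itself answer the accumulation question.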

In my original code it is like calling a lambda function within a lambda
function. How do I make sure the local JavaRDD variable ends up containing the
data from all the RDDs?
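
To make the question concrete, the only approach I can think of is to union
each batch's JavaRDD<MyTable> into a reference held on the driver. A rough
sketch (the toMyTableRdd helper is just shorthand for the mapPartitions logic
above, and AtomicReference, from java.util.concurrent.atomic, is simply my own
choice for holding driver-side state):

final AtomicReference<JavaRDD<MyTable>> accumulated =
        new AtomicReference<JavaRDD<MyTable>>();

javaDStreamRdd.foreachRDD(new Function<JavaRDD<String>, Void>() {
    public Void call(JavaRDD<String> rdd) {
        JavaRDD<MyTable> batch = toMyTableRdd(rdd); // hypothetical helper
        JavaRDD<MyTable> previous = accumulated.get();
        // Union the new batch with everything accumulated so far.
        accumulated.set(previous == null ? batch : previous.union(batch));
        // Cache so the ever-growing union stays materialized across batches.
        accumulated.get().cache();
        return null;
    }
});

Is something like this the intended pattern, or does Spark provide a better
mechanism for keeping results across batches?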


