Hello,

I would like to parallelize work across several RDDs I have. I wanted
to know whether Spark supports calling "foreach" on an RDD of RDDs.
Here's a Java example:

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.api.java.function.VoidFunction;

    public static void main(String[] args) {

        SparkConf sparkConf = new SparkConf().setAppName("testapp");
        sparkConf.setMaster("local");

        JavaSparkContext sc = new JavaSparkContext(sparkConf);

        List<String> list = Arrays.asList("1", "2", "3");
        JavaRDD<String> rdd = sc.parallelize(list);

        List<String> list1 = Arrays.asList("a", "b", "c");
        JavaRDD<String> rdd1 = sc.parallelize(list1);

        List<JavaRDD<String>> rddList = new ArrayList<JavaRDD<String>>();
        rddList.add(rdd);
        rddList.add(rdd1);

        // Build an RDD whose elements are themselves RDDs.
        JavaRDD<JavaRDD<String>> rddOfRdds = sc.parallelize(rddList);
        System.out.println(rddOfRdds.count());

        // Count each inner RDD from inside the foreach.
        rddOfRdds.foreach(new VoidFunction<JavaRDD<String>>() {
            @Override
            public void call(JavaRDD<String> t) throws Exception {
                System.out.println(t.count());
            }
        });
    }
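
For context, the plain sequential equivalent of what I'm trying to
parallelize would just loop over the list on the driver:

    // Sequential version: iterate over the RDD list on the driver and
    // count each RDD in turn; this is the work I want to distribute.
    for (JavaRDD<String> r : rddList) {
        System.out.println(r.count());
    }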

From this code I'm getting a NullPointerException from the inner count() call:

    Exception in thread "main" org.apache.spark.SparkException: Job aborted
    due to stage failure: Task 1.0:0 failed 1 times, most recent failure:
    Exception failure in TID 1 on host localhost: java.lang.NullPointerException
            org.apache.spark.rdd.RDD.count(RDD.scala:861)
            org.apache.spark.api.java.JavaRDDLike$class.count(JavaRDDLike.scala:365)
            org.apache.spark.api.java.JavaRDD.count(JavaRDD.scala:29)

Any help would be appreciated.

Thanks,
Tomer
