Hello, I would like to parallelize my work across multiple RDDs that I have. I wanted to know whether Spark can support a "foreach" on an RDD of RDDs. Here's a Java example:
public static void main(String[] args) {
    SparkConf sparkConf = new SparkConf().setAppName("testapp");
    sparkConf.setMaster("local");
    JavaSparkContext sc = new JavaSparkContext(sparkConf);

    // Build two small RDDs from local lists
    List<String> list = Arrays.asList(new String[] {"1", "2", "3"});
    JavaRDD<String> rdd = sc.parallelize(list);
    List<String> list1 = Arrays.asList(new String[] {"a", "b", "c"});
    JavaRDD<String> rdd1 = sc.parallelize(list1);

    // Put both RDDs into a list and parallelize that list itself
    List<JavaRDD<String>> rddList = new ArrayList<JavaRDD<String>>();
    rddList.add(rdd);
    rddList.add(rdd1);
    JavaRDD<JavaRDD<String>> rddOfRdds = sc.parallelize(rddList);
    System.out.println(rddOfRdds.count());

    // The inner count() below is where the NullPointerException is thrown
    rddOfRdds.foreach(new VoidFunction<JavaRDD<String>>() {
        @Override
        public void call(JavaRDD<String> t) throws Exception {
            System.out.println(t.count());
        }
    });
}

From this code I'm getting a NullPointerException on the inner count() call:

Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure:
Task 1.0:0 failed 1 times, most recent failure: Exception failure in TID 1 on host localhost:
java.lang.NullPointerException
        org.apache.spark.rdd.RDD.count(RDD.scala:861)
        org.apache.spark.api.java.JavaRDDLike$class.count(JavaRDDLike.scala:365)
        org.apache.spark.api.java.JavaRDD.count(JavaRDD.scala:29)

Help will be appreciated.

Thanks,
Tomer
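P.S. To help frame the question: a plain driver-side loop over the ordinary Java list (rather than an RDD of RDDs) should work, and since I understand Spark's scheduler is thread-safe, I was wondering whether launching each job from its own driver thread, as in the second sketch below, is the intended way to get the parallelism. The pool size and the timeout are just illustrative choices on my part:

// Sequential baseline: iterate the ordinary Java list on the driver;
// each count() runs as a normal Spark job
for (JavaRDD<String> r : rddList) {
    System.out.println(r.count());
}

// Parallel attempt: submit each count() from a separate driver thread
// (needs java.util.concurrent.ExecutorService / Executors / TimeUnit;
// pool size and 1-minute timeout are placeholder values)
ExecutorService pool = Executors.newFixedThreadPool(rddList.size());
for (final JavaRDD<String> r : rddList) {
    pool.submit(new Runnable() {
        @Override
        public void run() {
            System.out.println(r.count());
        }
    });
}
pool.shutdown();
pool.awaitTermination(1, TimeUnit.MINUTES); // throws InterruptedException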