You can return an RDD with null values inside and afterwards filter on "item != null". In Scala (or even in Java 8) you'd rather use Option/Optional, and in Scala they're directly usable from Spark. Example:
sc.parallelize(1 to 1000).flatMap(item => if (item % 2 == 0) Some(item) else None).collect()
res0: Array[Int] = Array(2, 4, 6, ....)

Regards,
Olivier.

On Sat, Apr 18, 2015 at 20:44, Steve Lewis <lordjoe2...@gmail.com> wrote:
> I find a number of cases where I have a JavaRDD and I wish to transform
> the data and, depending on a test, return 0 or 1 items (don't suggest a
> filter - the real case is more complex). So I currently do something like
> the following - perform a flatMap returning a list with 0 or 1 entries
> depending on the isUsed function.
>
> JavaRDD<Foo> original = ...
> JavaRDD<Foo> words = original.flatMap(new FlatMapFunction<Foo, Foo>() {
>     @Override
>     public Iterable<Foo> call(final Foo s) throws Exception {
>         List<Foo> ret = new ArrayList<Foo>();
>         if (isUsed(s))
>             ret.add(transform(s));
>         return ret; // contains 0 items if isUsed is false
>     }
> });
>
> My question is: can I do a map returning the transformed data, and null if
> nothing is to be returned, as shown below - what does Spark do with a map
> function returning null?
>
> JavaRDD<Foo> words = original.map(new Function<Foo, Foo>() {
>     @Override
>     public Foo call(final Foo s) throws Exception {
>         if (isUsed(s))
>             return transform(s);
>         return null; // not used - what happens now?
>     }
> });
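For the Java 8 side of the suggestion above, here is a minimal plain-JDK sketch of the same 0-or-1 flatMap pattern using java.util.stream (no Spark needed to show the idea, and `Stream.flatMap` mirrors what JavaRDD's flatMap does element-wise). `isUsed` and `transform` are hypothetical stand-ins for the poster's functions:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FlatMapSketch {
    // Hypothetical stand-ins for the poster's isUsed/transform.
    static boolean isUsed(int i) { return i % 2 == 0; }
    static int transform(int i) { return i * 10; }

    // flatMap emits the transformed item when isUsed is true and an
    // empty stream otherwise - no nulls ever enter the pipeline.
    static List<Integer> keepUsed(List<Integer> input) {
        return input.stream()
                .flatMap(i -> isUsed(i) ? Stream.of(transform(i)) : Stream.<Integer>empty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(keepUsed(Arrays.asList(1, 2, 3, 4, 5, 6))); // prints [20, 40, 60]
    }
}
```

The same ternary body works inside a Spark `flatMap` lambda; returning an empty collection (or `Optional.empty()` flattened the same way) is the usual alternative to returning null from `map`.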