I find a number of cases where I have a JavaRDD and I wish to transform the data and, depending on a test, return zero or one items (don't suggest a filter - the real case is more complex). So I currently do something like the following: a flatMap that returns a list with 0 or 1 entries depending on the isUsed function.
    JavaRDD<Foo> original = ...
    JavaRDD<Foo> words = original.flatMap(new FlatMapFunction<Foo, Foo>() {
        @Override
        public Iterable<Foo> call(final Foo s) throws Exception {
            List<Foo> ret = new ArrayList<Foo>();
            if (isUsed(s))
                ret.add(transform(s));
            return ret; // contains 0 items if isUsed is false
        }
    });

My question is: can I instead do a map that returns the transformed data, or null if nothing is to be returned, as shown below? What does Spark do with a map function that returns null?

    JavaRDD<Foo> words = original.map(new Function<Foo, Foo>() {
        @Override
        public Foo call(final Foo s) throws Exception {
            if (isUsed(s))
                return transform(s);
            return null; // not used - what happens now?
        }
    });
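To make the difference between the two approaches concrete without a Spark cluster, here is a minimal sketch of the same 0-or-1 pattern using plain java.util.stream instead of JavaRDD (the semantics of flatMap are analogous). The isUsed and transform implementations here are hypothetical stand-ins, not from the original code:

```java
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FlatMapSketch {
    // hypothetical stand-ins for the question's isUsed/transform
    static boolean isUsed(String s) { return s.length() > 3; }
    static String transform(String s) { return s.toUpperCase(); }

    // flatMap: each element becomes a 0-or-1 element stream, so
    // rejected items simply disappear from the result
    static List<String> viaFlatMap(List<String> input) {
        return input.stream()
                .flatMap(s -> isUsed(s) ? Stream.of(transform(s)) : Stream.empty())
                .collect(Collectors.toList());
    }

    // map returning null does NOT drop the element: the null stays in
    // the pipeline and has to be removed with an explicit filter
    static List<String> viaMap(List<String> input) {
        return input.stream()
                .map(s -> isUsed(s) ? transform(s) : null)
                .filter(Objects::nonNull)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> input = List.of("a", "spark", "rdd", "hadoop");
        System.out.println(viaFlatMap(input)); // [SPARK, HADOOP]
        System.out.println(viaMap(input));     // [SPARK, HADOOP]
    }
}
```

The point of the sketch: with map, the null is a real element of the result until something downstream removes it, whereas flatMap never emits anything for the rejected input in the first place.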