I find a number of cases where I have a JavaRDD and I wish to transform
the data and, depending on a test, return zero or one items (don't suggest a
filter - the real case is more complex). So I currently do something like
the following - perform a flatMap returning a list with 0 or 1 entries
depending on the isUsed function.

    JavaRDD<Foo> original = ...
    JavaRDD<Foo> words = original.flatMap(new FlatMapFunction<Foo, Foo>() {
        @Override
        public Iterable<Foo> call(final Foo s) throws Exception {
            List<Foo> ret = new ArrayList<Foo>();
            if (isUsed(s))
                ret.add(transform(s));
            return ret; // contains 0 items if isUsed is false
        }
    });
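As a side note, the same zero-or-one logic can be written without the mutable ArrayList by returning Collections.singletonList or Collections.emptyList. Here is a minimal plain-Java sketch of just the call() body (Spark omitted so it runs standalone; the isUsed and transform helpers below are hypothetical stand-ins for the real ones):

```java
import java.util.Collections;
import java.util.List;

public class Sketch {
    // hypothetical stand-ins for the real isUsed/transform
    static boolean isUsed(String s) { return !s.isEmpty(); }
    static String transform(String s) { return s.toUpperCase(); }

    // same logic as the flatMap call(): zero or one result, no ArrayList
    static List<String> call(String s) {
        return isUsed(s)
                ? Collections.singletonList(transform(s))
                : Collections.<String>emptyList();
    }

    public static void main(String[] args) {
        System.out.println(call("foo")); // one transformed item
        System.out.println(call(""));    // empty list
    }
}
```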

My question is: can I instead do a map, returning the transformed data, or
null if nothing is to be returned, as shown below? What does Spark do with
a map function that returns null?
    JavaRDD<Foo> words = original.map(new Function<Foo, Foo>() {
        @Override
        public Foo call(final Foo s) throws Exception {
            if (isUsed(s))
                return transform(s);
            return null; // not used - what happens now?
        }
    });
