You can return an RDD with null values inside and afterwards filter them out
with "item != null".
In Scala (or even in Java 8) you'd rather use Option/Optional; in Scala,
Options are directly usable from Spark.
Example:

 sc.parallelize(1 to 1000)
   .flatMap(item => if (item % 2 == 0) Some(item) else None)
   .collect()

res0: Array[Int] = Array(2, 4, 6, ....)
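
The Java 8 equivalent keeps the flatMap but drops the boilerplate (a sketch
against the Spark 1.x Java API, where FlatMapFunction.call returns an
Iterable; Foo/isUsed/transform are from your code):

    // java.util.Collections provides the zero- and one-element lists
    JavaRDD<Foo> words = original.flatMap(s ->
        isUsed(s) ? Collections.singletonList(transform(s))  // keep one item
                  : Collections.<Foo>emptyList());           // keep nothing, no nulls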

Regards,

Olivier.

On Sat, Apr 18, 2015 at 20:44, Steve Lewis <lordjoe2...@gmail.com> wrote:

> I find a number of cases where I have a JavaRDD and I wish to transform
> the data and, depending on a test, return zero or one items (don't suggest a
> filter - the real case is more complex). So I currently do something like
> the following - perform a flatMap returning a list with 0 or 1 entries,
> depending on the isUsed function.
>
>     JavaRDD<Foo> original = ...
>     JavaRDD<Foo> words = original.flatMap(new FlatMapFunction<Foo, Foo>() {
>         @Override
>         public Iterable<Foo> call(final Foo s) throws Exception {
>             List<Foo> ret = new ArrayList<Foo>();
>             if (isUsed(s))
>                 ret.add(transform(s));
>             return ret; // contains 0 items if isUsed is false
>         }
>     });
>
> My question is: can I instead do a map, returning the transformed data and
> null if nothing is to be returned, as shown below? What does Spark do with a
> map function that returns null?
>
>     JavaRDD<Foo> words = original.map(new Function<Foo, Foo>() {
>         @Override
>         public Foo call(final Foo s) throws Exception {
>             if (isUsed(s))
>                 return transform(s);
>             return null; // not used - what happens now?
>         }
>     });
>
