So you imagine something like this:

 JavaRDD<String> words = ...

 JavaRDD<Optional<String>> wordsFiltered = words.map(
     new Function<String, Optional<String>>() {
         @Override
         public Optional<String> call(String s) throws Exception {
             if (s.length() % 2 == 1) // drop strings of odd length
                 return Optional.empty();
             else
                 return Optional.of(s);
         }
     });


That seems to return the wrong type: a JavaRDD<Optional<String>>, which
cannot be used as the JavaRDD<String> that the next step expects.
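Since map must emit exactly one output per input, the usual way to recover a
JavaRDD<String> from a JavaRDD<Optional<String>> is to filter on isPresent and
then unwrap with get (or to use flatMap in the first place). Here is the same
filter-then-unwrap pattern sketched with plain java.util.stream rather than
Spark, so it runs standalone; the word list is made up for illustration:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

public class OptionalFilterSketch {
    public static void main(String[] args) {
        List<String> words = Arrays.asList("ab", "abc", "abcd");

        List<String> kept = words.stream()
            // map each word to an Optional, empty for odd lengths
            .map(s -> s.length() % 2 == 1
                    ? Optional.<String>empty()
                    : Optional.of(s))
            // drop the empty Optionals
            .filter(Optional::isPresent)
            // unwrap back to plain String
            .map(Optional::get)
            .collect(Collectors.toList());

        System.out.println(kept); // [ab, abcd]
    }
}
```

On a JavaRDD the chain would be the analogous .map(...).filter(...).map(...)
calls, which is why flatMap (returning zero-or-one elements directly) is often
the tidier choice.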


On Sun, Apr 19, 2015 at 12:17 PM, Evo Eftimov <evo.efti...@isecc.com> wrote:

> I am on the move at the moment so I can't try it immediately, but from
> previous memory / experience I think if you return plain null you will get
> a Spark exception.
>
> Anyway, you can try it and see what happens, and then ask the question.
>
> If you do get an exception, try Optional instead of plain null.
>
>
> Sent from Samsung Mobile
>
>
> -------- Original message --------
> From: Olivier Girardot
> Date:2015/04/18 22:04 (GMT+00:00)
> To: Steve Lewis ,user@spark.apache.org
> Subject: Re: Can a map function return null
>
> You can return an RDD with null values inside, and afterwards filter on
> "item != null"
> In scala (or even in Java 8) you'd rather use Option/Optional, and in
> Scala they're directly usable from Spark.
> Example:
>
>  sc.parallelize(1 to 1000).flatMap(item => if (item % 2 == 0) Some(item)
> else None).collect()
>
> res0: Array[Int] = Array(2, 4, 6, ....)
>
> Regards,
>
> Olivier.
>
> Le sam. 18 avr. 2015 à 20:44, Steve Lewis <lordjoe2...@gmail.com> a
> écrit :
>
>> I find a number of cases where I have a JavaRDD and I wish to transform
>> the data and, depending on a test, return 0 or 1 items (don't suggest a
>> filter - the real case is more complex). So I currently do something like
>> the following - perform a flatMap returning a list with 0 or 1 entries
>> depending on the isUsed function.
>>
>>     JavaRDD<Foo> original = ...
>>     JavaRDD<Foo> words = original.flatMap(new FlatMapFunction<Foo, Foo>() {
>>         @Override
>>         public Iterable<Foo> call(final Foo s) throws Exception {
>>             List<Foo> ret = new ArrayList<Foo>();
>>             if (isUsed(s))
>>                 ret.add(transform(s));
>>             return ret; // contains 0 items if isUsed is false
>>         }
>>     });
>>
>> My question is: can I do a map returning the transformed data, and null
>> if nothing is to be returned, as shown below - what does Spark do with a
>> map function returning null?
>>
>>     JavaRDD<Foo> words = original.map(new Function<Foo, Foo>() {
>>         @Override
>>         public Foo call(final Foo s) throws Exception {
>>             if (isUsed(s))
>>                 return transform(s);
>>             return null; // not used - what happens now?
>>         }
>>     });
>>
>>
>>
>>


-- 
Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com
