Well, you can do another map to turn Optional<String> into String: in the 
cases when the Optional is empty you can store e.g. "NULL" as the value of the RDD 
element 
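A minimal sketch of that second map, using java.util.Optional with a plain stream standing in for the RDD so the example is self-contained (the same lambda works inside a Spark map):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

public class UnwrapOptional {
    public static void main(String[] args) {
        // Stand-in for a JavaRDD<Optional<String>> produced by the first map
        List<Optional<String>> wordsFiltered = Arrays.asList(
                Optional.of("spark"), Optional.empty(), Optional.of("rdd"));

        // Second map: unwrap each Optional, storing "NULL" where it is empty
        List<String> words = wordsFiltered.stream()
                .map(opt -> opt.orElse("NULL"))
                .collect(Collectors.toList());

        System.out.println(words); // prints [spark, NULL, rdd]
    }
}
```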

 

If this is not acceptable (based on the objectives of your architecture), and IF 
returning plain null instead of Optional does throw a Spark exception, THEN 
as far as I am concerned, checkmate 

 

From: Steve Lewis [mailto:lordjoe2...@gmail.com] 
Sent: Sunday, April 19, 2015 8:16 PM
To: Evo Eftimov
Cc: Olivier Girardot; user@spark.apache.org
Subject: Re: Can a map function return null

 

 

So you imagine something like this:

 

 JavaRDD<String> words = ...

 JavaRDD<Optional<String>> wordsFiltered = words.map(new Function<String, Optional<String>>() {
    @Override
    public Optional<String> call(String s) throws Exception {
        if (s.length() % 2 == 1) // drop strings of odd length
            return Optional.empty();
        else
            return Optional.of(s);
    }
});
 
That seems to return the wrong type, a JavaRDD<Optional<String>>, which cannot 
be used as the JavaRDD<String> that the next step expects
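One way to get back to a plain JavaRDD<String> is to follow that map with a filter on isPresent and a map to get. A sketch of the same chain with java.util.stream (the stream is a stand-in for the RDD; the sequence of operations is what matters):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

public class FlattenOptionals {
    public static void main(String[] args) {
        List<String> words = Arrays.asList("odd", "even", "word", "a");

        // Map to Optional (dropping strings of odd length), keep only the
        // present values, then unwrap back to plain strings
        List<String> kept = words.stream()
                .map(s -> s.length() % 2 == 1 ? Optional.<String>empty() : Optional.of(s))
                .filter(Optional::isPresent)
                .map(Optional::get)
                .collect(Collectors.toList());

        System.out.println(kept); // prints [even, word]
    }
}
```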

 

On Sun, Apr 19, 2015 at 12:17 PM, Evo Eftimov <evo.efti...@isecc.com> wrote:

I am on the move at the moment so I can't try it immediately, but from previous 
memory / experience I think if you return plain null you will get a Spark 
exception

 

Anyway, you can try it and see what happens, and then ask the question 

 

If you do get an exception, try Optional instead of plain null

 

 

Sent from Samsung Mobile

 

-------- Original message --------

From: Olivier Girardot 

Date:2015/04/18 22:04 (GMT+00:00) 

To: Steve Lewis ,user@spark.apache.org 

Subject: Re: Can a map function return null 

 

You can return an RDD with null values inside, and afterwards filter on "item 
!= null" 
In Scala (or even in Java 8) you'd rather use Option/Optional, and in Scala 
they're directly usable from Spark. 

Example: 

 sc.parallelize(1 to 1000).flatMap(item => if (item % 2 == 0) Some(item) else None).collect()

res0: Array[Int] = Array(2, 4, 6, ....)
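The plain-null variant from the first paragraph looks like this with Java 8 streams (Objects::nonNull standing in for the "item != null" filter; the same map-then-filter pair applies to a JavaRDD):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

public class NullThenFilter {
    public static void main(String[] args) {
        List<Integer> items = Arrays.asList(1, 2, 3, 4, 5, 6);

        // The map may return null; the follow-up filter drops those entries
        List<Integer> evens = items.stream()
                .map(item -> item % 2 == 0 ? item : null)
                .filter(Objects::nonNull)
                .collect(Collectors.toList());

        System.out.println(evens); // prints [2, 4, 6]
    }
}
```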

Regards, 

Olivier.

 

On Sat, Apr 18, 2015 at 8:44 PM, Steve Lewis <lordjoe2...@gmail.com> wrote:

I find a number of cases where I have a JavaRDD and I wish to transform the 
data and, depending on a test, return 0 or 1 items (don't suggest a filter - the 
real case is more complex). So I currently do something like the following - 
perform a flatMap returning a list with 0 or 1 entries depending on the isUsed 
function.


 

     JavaRDD<Foo> original = ...

     JavaRDD<Foo> words = original.flatMap(new FlatMapFunction<Foo, Foo>() {
         @Override
         public Iterable<Foo> call(final Foo s) throws Exception {
             List<Foo> ret = new ArrayList<Foo>();
             if (isUsed(s))
                 ret.add(transform(s));
             return ret; // contains 0 items if isUsed is false
         }
     });

 

My question is: can I do a map returning the transformed data, and null if 
nothing is to be returned, as shown below? What does Spark do with a map 
function returning null?

 

     JavaRDD<Foo> words = original.map(new Function<Foo, Foo>() {
         @Override
         public Foo call(final Foo s) throws Exception {
             if (isUsed(s))
                 return transform(s);
             return null; // not used - what happens now?
         }
     });

 

 

 





 

-- 

Steven M. Lewis PhD

4221 105th Ave NE

Kirkland, WA 98033

206-384-1340 (cell)
Skype lordjoe_com
