So you imagine something like this:

JavaRDD<String> words = ...
JavaRDD<Optional<String>> wordsFiltered =
    words.map(new Function<String, Optional<String>>() {
        @Override
        public Optional<String> call(String s) throws Exception {
            if (s.length() % 2 == 1) // drop strings of odd length
                return Optional.empty();
            else
                return Optional.of(s);
        }
    });

That seems to return the wrong type: a JavaRDD<Optional<String>>, which cannot be used as the JavaRDD<String> that the next step expects.
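One way around that, I suppose, would be to filter out the empty Optionals and then unwrap - a minimal sketch (untested, against the Spark 1.x Java API; note that java.util.Optional is not Serializable, which may itself be a problem for an RDD of Optionals):

import java.util.Optional;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;

JavaRDD<String> unwrapped = wordsFiltered
    .filter(new Function<Optional<String>, Boolean>() {
        @Override
        public Boolean call(Optional<String> opt) throws Exception {
            return opt.isPresent(); // keep only the non-empty results
        }
    })
    .map(new Function<Optional<String>, String>() {
        @Override
        public String call(Optional<String> opt) throws Exception {
            return opt.get(); // safe: the empties were filtered out above
        }
    });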
On Sun, Apr 19, 2015 at 12:17 PM, Evo Eftimov <evo.efti...@isecc.com> wrote:

> I am on the move at the moment so I can't try it immediately, but from
> previous memory/experience I think if you return plain null you will get
> a Spark exception.
>
> Anyway, you can try it and see what happens, and then ask the question.
>
> If you do get an exception, try Optional instead of plain null.
>
>
> Sent from Samsung Mobile
>
>
> -------- Original message --------
> From: Olivier Girardot
> Date: 2015/04/18 22:04 (GMT+00:00)
> To: Steve Lewis, user@spark.apache.org
> Subject: Re: Can a map function return null
>
> You can return an RDD with null values inside, and afterwards filter on
> "item != null".
> In Scala (or even in Java 8) you'd rather use Option/Optional, and in
> Scala they're directly usable from Spark.
> Example:
>
> sc.parallelize(1 to 1000).flatMap(item => if (item % 2 == 0) Some(item) else None).collect()
>
> res0: Array[Int] = Array(2, 4, 6, ...)
>
> Regards,
>
> Olivier.
>
> On Sat, Apr 18, 2015 at 20:44, Steve Lewis <lordjoe2...@gmail.com> wrote:
>
>> I find a number of cases where I have a JavaRDD and I wish to transform
>> the data and, depending on a test, return 0 or 1 items (don't suggest a
>> filter - the real case is more complex). So I currently do something like
>> the following - perform a flatMap returning a list with 0 or 1 entries,
>> depending on the isUsed function.
>>
>> JavaRDD<Foo> original = ...
>> JavaRDD<Foo> words = original.flatMap(new FlatMapFunction<Foo, Foo>() {
>>     @Override
>>     public Iterable<Foo> call(final Foo s) throws Exception {
>>         List<Foo> ret = new ArrayList<Foo>();
>>         if (isUsed(s))
>>             ret.add(transform(s));
>>         return ret; // contains 0 items if isUsed is false
>>     }
>> });
>>
>> My question is: can I do a map returning the transformed data, and null
>> if nothing is to be returned, as shown below? What does Spark do with a
>> map function returning null?
>>
>> JavaRDD<Foo> words = original.map(new Function<Foo, Foo>() {
>>     @Override
>>     public Foo call(final Foo s) throws Exception {
>>         if (isUsed(s))
>>             return transform(s);
>>         return null; // not used - what happens now?
>>     }
>> });
>>
>> --
>> Steven M. Lewis PhD
>> 4221 105th Ave NE
>> Kirkland, WA 98033
>> 206-384-1340 (cell)
>> Skype lordjoe_com
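For reference, a minimal sketch (untested) of the null-then-filter approach Olivier describes above, using the same hypothetical isUsed/transform helpers from the question. As Olivier notes, the nulls go into the RDD like any other value, so they must be filtered out before anything downstream dereferences them:

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;

JavaRDD<Foo> original = ...
JavaRDD<Foo> kept = original
    .map(new Function<Foo, Foo>() {
        @Override
        public Foo call(final Foo s) throws Exception {
            return isUsed(s) ? transform(s) : null; // null marks "dropped"
        }
    })
    .filter(new Function<Foo, Boolean>() {
        @Override
        public Boolean call(final Foo s) throws Exception {
            return s != null; // remove the nulls before further processing
        }
    });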