rdd.map(foo).filter(bar) and rdd.filter(bar).map(foo) will each already be
pipelined into a single stage, so the input is scanned only once and there
generally isn't any need to complect the map and filter into a single function.
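
To make the single-pass point concrete, here is a small sketch that uses a
plain Scala Iterator as a stand-in for one RDD partition. This is only an
analogy, not Spark code: the input values and the names PipelineSketch and
run are made up for illustration. It records the order in which the two
functions are called, and shows each element passing through filter and then
map before the next element is read.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for one partition: a plain Iterator, not an RDD.
object PipelineSketch {
  def run(): Vector[String] = {
    val trace = ArrayBuffer.empty[String]
    Iterator("1", "x", "3")
      .filter { s => trace += s"filter($s)"; s.forall(_.isDigit) } // keep numeric lines
      .map { s => trace += s"map($s)"; s.toInt * 10 }              // transform survivors
      .toList                                                      // force evaluation
    trace.toVector
  }

  def main(args: Array[String]): Unit =
    // Calls are interleaved per element rather than run as two full passes:
    // filter(1), map(1), filter(x), filter(3), map(3)
    println(run().mkString(", "))
}
```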

Additionally, there is RDD#collect[U](f: PartialFunction[T, U])(implicit
arg0: ClassTag[U]): RDD[U], which applies the partial function only to
those elements of the RDD for which f is defined and drops the rest.
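
A minimal sketch of that pattern follows. Scala's standard collections have a
collect method that takes a PartialFunction in the same way, so the example
runs without Spark; given an RDD[String], the corresponding call would be
rdd.collect(parse). The "key=value" line format and the names CollectSketch
and parse are assumptions made up for illustration.

```scala
object CollectSketch {
  // Defined only for lines that contain '=' after a non-empty key;
  // every other line is simply skipped by collect.
  val parse: PartialFunction[String, (String, String)] = {
    case line if line.indexOf('=') > 0 =>
      val idx = line.indexOf('=')
      (line.substring(0, idx).trim, line.substring(idx + 1).trim)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical input lines standing in for the text file.
    val lines = Seq("a=1", "# comment", "b = 2", "")
    // Filtering and mapping happen in one call.
    println(lines.collect(parse)) // prints List((a,1), (b,2))
  }
}
```

Putting the condition in the case guard keeps the "is this line wanted?" test
and the transformation in one place, without returning null or wrapping
results in Option.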

On Thu, Feb 26, 2015 at 10:49 AM, Crystal Xing <crystalxin...@gmail.com>
wrote:

> I see.
> The reason we can use flatMap to map to null, but cannot use map to map to
> null, is that flatMap supports mapping to zero or more elements, while map
> only supports a 1-1 mapping?
>
> It seems flatMap is closer to Hadoop's map.
>
>
> Thanks,
>
> Zheng zhen
>
> On Thu, Feb 26, 2015 at 10:44 AM, Sean Owen <so...@cloudera.com> wrote:
>
>> You can flatMap:
>>
>> rdd.flatMap { in =>
>>   if (condition(in)) {
>>     Some(transformation(in))
>>   } else {
>>     None
>>   }
>> }
>>
>> On Thu, Feb 26, 2015 at 6:39 PM, Crystal Xing <crystalxin...@gmail.com>
>> wrote:
>> > Hi,
>> > I have a text file input and I want to parse it line by line and map each
>> > line to another format. But at the same time, I want to filter out some
>> > lines I do not need.
>> >
>> > I wonder if there is a way to filter out those lines in the map function.
>> >
>> > Do I have to do two steps, filter and map? In that way, I have to scan and
>> > parse the lines twice in order to filter and map.
>> >
>> > If I map those unwanted lines to null and filter out null, will that work?
>> > I have never tried it.
>> >
>> > Thanks,
>> >
>> > Zheng zheng
>>
>
>
