Both rdd.map(foo).filter(bar) and rdd.filter(bar).map(foo) are already pipelined into a single stage, so each element passes through both functions in one scan of the data. There generally isn't any need to complect the map and filter into a single function.
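To make that concrete, here is a minimal sketch of the chained form, assuming a local SparkContext and two hypothetical helpers, wanted and parse, standing in for bar and foo (neither name comes from the thread):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object FilterThenMap {
  // Hypothetical stand-in for "bar": keep non-empty, non-comment lines.
  def wanted(line: String): Boolean = line.nonEmpty && !line.startsWith("#")

  // Hypothetical stand-in for "foo": split a line into comma-separated fields.
  def parse(line: String): Array[String] = line.split(",")

  def main(args: Array[String]): Unit = {
    // Local master is an assumption made so the sketch runs on one machine.
    val conf = new SparkConf().setAppName("filter-then-map").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val lines = sc.parallelize(Seq("a,b", "# comment", "", "c,d"))

      // filter and map are both narrow transformations, so Spark runs them
      // back to back on each element within the same stage.
      val parsed = lines.filter(wanted).map(parse)

      parsed.collect().foreach(fields => println(fields.mkString("|")))
    } finally {
      sc.stop()
    }
  }
}
```

If you want to check the stage layout yourself, parsed.toDebugString prints the lineage; no shuffle boundary should appear between the filter and the map.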
Additionally, there is RDD#collect[U](f: PartialFunction[T, U])(implicit arg0: ClassTag[U]): RDD[U], which only applies the partial function to those elements of the RDD for which f is defined.

On Thu, Feb 26, 2015 at 10:49 AM, Crystal Xing <crystalxin...@gmail.com> wrote:

> I see.
> The reason we can use flatmap to map to null but not using map to map to
> null is because
> flatmap supports map to zero and more but map only support 1-1 mapping?
>
> It seems Flatmap is more equivalent to haddop's map.
>
> Thanks,
>
> Zheng zhen
>
> On Thu, Feb 26, 2015 at 10:44 AM, Sean Owen <so...@cloudera.com> wrote:
>
>> You can flatMap:
>>
>> rdd.flatMap { in =>
>>   if (condition(in)) {
>>     Some(transformation(in))
>>   } else {
>>     None
>>   }
>> }
>>
>> On Thu, Feb 26, 2015 at 6:39 PM, Crystal Xing <crystalxin...@gmail.com>
>> wrote:
>> > Hi,
>> > I have a text file input and I want to parse line by line and map each line
>> > to another format. But at the same time, I want to filter out some lines I
>> > do not need.
>> >
>> > I wonder if there is a way to filter out those lines in the map function.
>> >
>> > Do I have to do two steps filter and map? In that way, I have to scan and
>> > parse the lines twice in order to filter and map.
>> >
>> > If I map those unwanted line to null and filter out null, will that work?
>> > never tried yet.
>> >
>> > Thanks,
>> >
>> > Zheng zheng
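For the RDD#collect variant mentioned at the top of this message, a rough sketch follows. The Reading case class, the ParsedReading extractor, and the sample lines are all hypothetical, and the local master is assumed only so the example runs on one machine. The extractor lets the partial function be defined exactly for the lines that parse, so each line is parsed once and unparseable lines are dropped:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import scala.util.Try

object CollectPartial {
  // Hypothetical target format for a parsed line.
  final case class Reading(sensor: String, value: Double)

  // Hypothetical extractor: succeeds only for lines of the form "name,number".
  object ParsedReading {
    def unapply(line: String): Option[Reading] = line.split(",") match {
      case Array(sensor, raw) =>
        Try(raw.trim.toDouble).toOption.map(v => Reading(sensor.trim, v))
      case _ => None
    }
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("collect-partial").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val lines = sc.parallelize(Seq("t1, 21.5", "garbage", "t2,abc", "t3,19"))

      // collect(PartialFunction) is a transformation: it keeps only the
      // elements the partial function is defined for and maps them.
      val readings = lines.collect { case ParsedReading(r) => r }

      // The no-argument collect() is the unrelated action that brings the
      // results back to the driver as an Array.
      readings.collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

This does the same job as the flatMap with Some/None quoted above; which one reads better is mostly a matter of taste.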