Re: how to map and filter in one step?

2015-02-27 Thread Jeffrey Jedele
Hi,

we are using RDD#mapPartitions() to achieve the same. Are there advantages/disadvantages of using one method over the other?

Regards,
Jeff

2015-02-26 20:02 GMT+01:00 Mark Hamstra m...@clearstorydata.com:
rdd.map(foo).filter(bar) and rdd.filter(bar).map(foo) will each already be pipelined
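For readers comparing the two approaches, here is a minimal sketch of the kind of function one might pass to RDD#mapPartitions, which takes a function from Iterator[T] to Iterator[U]. It runs on a plain Scala Iterator so it needs no Spark dependency; `foo` and `bar` are hypothetical stand-ins for the functions named in the quoted message, and the object name is an assumption made for illustration.

```scala
object MapPartitionsSketch {
  // Hypothetical predicate standing in for `bar`: keep non-empty lines.
  def bar(line: String): Boolean = line.nonEmpty

  // Hypothetical transformation standing in for `foo`: line length.
  def foo(line: String): Int = line.length

  // Same shape as the argument of RDD#mapPartitions: one call per partition,
  // receiving an iterator over that partition's elements.
  // Iterator#filter and Iterator#map are lazy, so each element is still
  // processed in a single pass.
  def filterAndMap(partition: Iterator[String]): Iterator[Int] =
    partition.filter(bar).map(foo)

  def main(args: Array[String]): Unit = {
    println(filterAndMap(Iterator("ab", "", "cde")).toList)
  }
}
```

With Spark available, this would presumably be used as `rdd.mapPartitions(MapPartitionsSketch.filterAndMap)`. The per-partition form is mostly useful when some setup cost should be paid once per partition rather than once per element.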

Re: how to map and filter in one step?

2015-02-26 Thread Sean Owen
You can flatMap:

  rdd.flatMap { in =>
    if (condition(in)) {
      Some(transformation(in))
    } else {
      None
    }
  }

On Thu, Feb 26, 2015 at 6:39 PM, Crystal Xing crystalxin...@gmail.com wrote:
Hi, I have a text file input and I want to parse line by line and map each line to another format. But
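The flatMap idiom above can be tried without a cluster. The following is a rough sketch on a plain Scala Seq rather than an RDD; `condition` and `transformation` are given hypothetical bodies here (skip blank and `#`-prefixed lines, then trim and upper-case), since the original message does not define them.

```scala
object FlatMapSketch {
  // Hypothetical predicate: keep lines that are non-empty and not comments.
  def condition(line: String): Boolean =
    line.nonEmpty && !line.startsWith("#")

  // Hypothetical transformation: normalize the line.
  def transformation(line: String): String =
    line.trim.toUpperCase

  // Some(x) contributes one element to the result, None contributes nothing,
  // so mapping and filtering happen in one function.
  def mapAndFilter(lines: Seq[String]): Seq[String] =
    lines.flatMap { in =>
      if (condition(in)) Some(transformation(in)) else None
    }

  def main(args: Array[String]): Unit = {
    println(mapAndFilter(Seq("alpha", "# comment", "", " beta ")))
  }
}
```

The same function body should work with RDD#flatMap, because Scala implicitly converts an Option to a collection of zero or one elements.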

Re: how to map and filter in one step?

2015-02-26 Thread Crystal Xing
I see. So the reason we can use flatMap to map a line to nothing, but cannot do that with map, is that flatMap supports mapping each input to zero or more outputs, while map only supports a 1-to-1 mapping? It seems flatMap is closer to Hadoop's map.

Thanks,
Zheng zhen

On Thu, Feb 26, 2015 at 10:44 AM, Sean Owen

how to map and filter in one step?

2015-02-26 Thread Crystal Xing
Hi,

I have a text file as input and I want to parse it line by line and map each line to another format. But at the same time, I want to filter out some lines I do not need. I wonder if there is a way to filter out those lines in the map function. Do I have to do it in two steps, filter and map? In that

Re: how to map and filter in one step?

2015-02-26 Thread Mark Hamstra
rdd.map(foo).filter(bar) and rdd.filter(bar).map(foo) will each already be pipelined into a single stage, so there generally isn't any need to complect the map and filter into a single function. Additionally, there is RDD#collect[U](f: PartialFunction[T, U])(implicit arg0: ClassTag[U]): RDD[U],
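
As an illustration of the partial-function form of collect mentioned above, here is a small sketch on a plain Scala Seq, whose collect method has the same shape. The `key=value` line format, the regex, and the object name are assumptions made for this example only.

```scala
object CollectSketch {
  // Hypothetical line format: a word, an equals sign, and an integer.
  private val KeyValue = """(\w+)=(\d+)""".r

  // The partial function is defined only for lines matching the pattern;
  // collect drops every line it is not defined for and maps the rest.
  def parseValid(lines: Seq[String]): Seq[(String, Int)] =
    lines.collect {
      case KeyValue(key, value) => (key, value.toInt)
    }

  def main(args: Array[String]): Unit = {
    println(parseValid(Seq("a=1", "garbage", "b=22")))
  }
}
```

Note that on an RDD, collect with a partial function is a transformation returning a new RDD, whereas collect() with no arguments is an action that returns all elements to the driver.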