Hi,
we are using RDD#mapPartitions() to achieve the same.
Are there advantages/disadvantages of using one method over the other?
Regards,
Jeff
2015-02-26 20:02 GMT+01:00 Mark Hamstra m...@clearstorydata.com:
rdd.map(foo).filter(bar) and rdd.filter(bar).map(foo) will each already be
pipelined
You can flatMap:
rdd.flatMap { in =>
  if (condition(in)) {
    Some(transformation(in))
  } else {
    None
  }
}
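
For anyone who wants to try the flatMap pattern above without a cluster, here is a minimal sketch. It assumes a plain Scala Seq as a stand-in for the RDD, and the sample lines and the bodies of condition and transformation are made up for illustration. The same function literal should work on an RDD[String], since RDD#flatMap accepts an Option result in the same way.

```scala
// Local stand-in for an RDD[String] (assumption: sample data invented for this sketch).
val lines: Seq[String] = Seq("1,alice", "# comment", "2,bob", "")

// Hypothetical predicate: keep non-empty lines that are not comments.
def condition(in: String): Boolean = in.nonEmpty && !in.startsWith("#")

// Hypothetical transformation: parse "id,name" into a pair.
def transformation(in: String): (Int, String) = {
  val parts = in.split(",", 2)
  (parts(0).toInt, parts(1))
}

// Some(x) contributes one output element, None contributes zero,
// so filtering and mapping happen in a single function.
val parsed: Seq[(Int, String)] = lines.flatMap { in =>
  if (condition(in)) Some(transformation(in)) else None
}
```

Here parsed keeps only the two well-formed lines, each converted to an (id, name) pair.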
On Thu, Feb 26, 2015 at 6:39 PM, Crystal Xing crystalxin...@gmail.com wrote:
Hi,
I have a text file input and I want to parse line by line and map each line
to another format. But
I see.
So the reason we can drop a line by returning nothing from flatMap, but
cannot do the same by returning null from map, is that flatMap maps each
input to zero or more outputs, whereas map only supports a 1-1 mapping?
It seems flatMap is the closer equivalent to Hadoop's map.
Thanks,
Zheng zhen
On Thu, Feb 26, 2015 at 10:44 AM, Sean Owen
Hi,
I have a text file input and I want to parse line by line and map each line
to another format. But at the same time, I want to filter out some lines I
do not need.
I wonder if there is a way to filter out those lines in the map function.
Do I have to do it in two steps, filter and then map? In that
rdd.map(foo).filter(bar) and rdd.filter(bar).map(foo) will each already be
pipelined into a single stage, so there generally isn't any need to
complect the map and filter into a single function.
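
To illustrate what pipelining means here, the following is a rough sketch that uses a plain Scala Iterator as a stand-in for one partition of the RDD. The bodies of foo and bar and the trace buffer are assumptions added for illustration only. Each element flows through the filter and then the map before the next element is read, so no intermediate collection is built between the two steps.

```scala
import scala.collection.mutable.ArrayBuffer

// Records the order in which the two functions are invoked (illustration only).
val trace = ArrayBuffer.empty[String]

// Hypothetical filter predicate: keep non-empty lines.
def bar(line: String): Boolean = { trace += s"filter($line)"; line.nonEmpty }

// Hypothetical mapping function: line length.
def foo(line: String): Int = { trace += s"map($line)"; line.length }

// Iterator.filter and Iterator.map are lazy; toList pulls elements through
// both steps one at a time, in a single pass over the input.
val result: List[Int] = Iterator("ab", "", "cde").filter(bar).map(foo).toList
```

The trace shows the filter and map calls interleaved per element rather than all filter calls followed by all map calls.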
Additionally, there is RDD#collect[U](f: PartialFunction[T, U])(implicit
arg0: ClassTag[U]): RDD[U],