Why would you want to use Spark to sequentially process your entire data
set? The entire purpose is to let you do distributed processing, which
means letting partitions be processed simultaneously by different cores /
nodes.
That being said, occasionally in a bigger pipeline with a lot of
Thanks for your reply.
But your code snippet uses `collect`, which is not feasible for me.
My algorithm involves a large amount of data, and I do not want to transmit
it all to the driver.
Wush
2015-02-27 16:27 GMT+08:00 Yanbo Liang yblia...@gmail.com:
Actually, sortBy will return an ordered RDD.
Your
Dear all,
I want to implement some sequential algorithm on RDD.
For example:
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().
  setMaster("local[2]").
  setAppName("SequentialSuite")
val sc = new SparkContext(conf)
val rdd = sc.
  parallelize(Array(1, 3, 2, 7, 1, 4, 2, 5, 1, 8, 9), 2).
  sortBy(x => x, ascending = true)
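For the concern above about `collect`, one option worth noting is `RDD.toLocalIterator`, which streams the sorted RDD back one partition at a time instead of materializing everything on the driver at once. A minimal sketch (the object name `SequentialSketch` is mine, not from the thread; this assumes a local Spark runtime is available):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SequentialSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().
      setMaster("local[2]").
      setAppName("SequentialSketch")
    val sc = new SparkContext(conf)

    // sortBy gives an RDD whose partitions are globally ordered:
    // every element in partition i is <= every element in partition i+1.
    val sorted = sc.
      parallelize(Array(1, 3, 2, 7, 1, 4, 2, 5, 1, 8, 9), 2).
      sortBy(x => x, ascending = true)

    // toLocalIterator fetches partitions lazily, so the driver holds
    // at most one partition's worth of data at a time, in sorted order.
    sorted.toLocalIterator.foreach(println)

    sc.stop()
  }
}
```

The trade-off is that the partitions are pulled to the driver sequentially, so this gives up parallelism for the final pass; it only avoids the memory blow-up of `collect`, not the data transfer itself.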