Why would you want to use Spark to sequentially process your entire data
set? The entire purpose is to let you do distributed processing, which
means letting partitions be processed simultaneously by different cores /
nodes.
That being said, occasionally in a bigger pipeline with a lot of
Thanks for your reply.
But your code snippet uses `collect`, which is not feasible for me.
My algorithm involves a large amount of data, and I do not want to transmit
it all to the driver.
Wush
2015-02-27 16:27 GMT+08:00 Yanbo Liang yblia...@gmail.com:
Actually, sortBy will return an ordered RDD.
Your
Dear all,
I want to implement some sequential algorithm on RDD.
For example:
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().
  setMaster("local[2]").
  setAppName("SequentialSuite")
val sc = new SparkContext(conf)
val rdd = sc.
  parallelize(Array(1, 3, 2, 7, 1, 4, 2, 5, 1, 8, 9), 2).
  sortBy(x => x, ascending = true)
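For the concern above about `collect`, one option worth noting is `RDD.toLocalIterator`, which streams the sorted RDD back one partition at a time instead of materializing everything on the driver at once. A minimal sketch (the object name `SequentialSketch` is mine, not from the thread; this assumes a local Spark runtime is available):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SequentialSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().
      setMaster("local[2]").
      setAppName("SequentialSketch")
    val sc = new SparkContext(conf)

    // sortBy gives an RDD whose partitions are globally ordered:
    // every element in partition i is <= every element in partition i+1.
    val sorted = sc.
      parallelize(Array(1, 3, 2, 7, 1, 4, 2, 5, 1, 8, 9), 2).
      sortBy(x => x, ascending = true)

    // toLocalIterator fetches partitions lazily, so the driver holds
    // at most one partition's worth of data at a time, in sorted order.
    sorted.toLocalIterator.foreach(println)

    sc.stop()
  }
}
```

The trade-off is that the partitions are pulled to the driver sequentially, so this gives up parallelism for the final pass; it only avoids the memory blow-up of `collect`, not the data transfer itself.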