Re: accessing partition i+1 from mapper of partition i

2014-05-22 Thread Austin Gibbons
Mohit, if you want to end up with (1 .. N) , why don't you skip the logic for finding missing values, and generate it directly? val max = myCollection.reduce(math.max) sc.parallelize((0 until max)) In either case, you don't need to call cache, which will force it into memory - you can do

Re: accessing partition i+1 from mapper of partition i

2014-05-22 Thread Mohit Jaggi
Austin, I made up a mock example...my real use case is more complex. I used foreach() instead of collect/cache..that forces the accumulable to be evaluated. On another thread Xiangrui pointed me to a sliding window rdd in mlllib that is a great alternative (although I did not switch to using it)

Re: accessing partition i+1 from mapper of partition i

2014-05-16 Thread Brian Gawalt
I don't think there's a direct way of bleeding elements across partitions. But you could write it yourself relatively succinctly: A) Sort the RDD B) Look at the sorted RDD's partitions with the .mapParititionsWithIndex( ) method. Map each partition to its partition ID, and its maximum element.

accessing partition i+1 from mapper of partition i

2014-05-14 Thread Mohit Jaggi
Hi, I am trying to find a way to fill in missing values in an RDD. The RDD is a sorted sequence. For example, (1, 2, 3, 5, 8, 11, ...) I need to fill in the missing numbers and get (1,2,3,4,5,6,7,8,9,10,11) One way to do this is to slide and zip rdd1 = sc.parallelize(List(1, 2, 3, 5, 8, 11,