Re: Finding previous and next element in a sorted RDD

2014-08-23 Thread Victor Tso-Guillen
Using mapPartitions, you could get the neighbors within a partition, but it's
much more difficult to do this for the complete dataset, because the first
and last elements of each partition have their neighbors in other partitions.
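A minimal sketch of the mapPartitions approach, assuming an RDD that is already in the desired order; the object and method names are hypothetical. Each partition is buffered in memory and every element is paired with its local neighbors, so elements at a partition edge get None even when the real neighbor sits in the adjacent partition, which is the limitation described above.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object WithinPartitionNeighbors {
  // Pairs each element with its previous and next element, looking only inside
  // its own partition. Edge elements get None even if the adjacent partition
  // holds their real neighbor.
  def neighborsWithinPartition[T](rdd: RDD[T]): RDD[(Option[T], T, Option[T])] =
    rdd.mapPartitions { it =>
      val buf = it.toVector // buffers one whole partition in memory
      buf.indices.iterator.map { i =>
        val prev: Option[T] = if (i > 0) Some(buf(i - 1)) else None
        val next: Option[T] = if (i < buf.length - 1) Some(buf(i + 1)) else None
        (prev, buf(i), next)
      }
    }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("neighbors"))
    // With 2 partitions, 3 and 4 land in different partitions, so 3 gets no
    // "next" and 4 gets no "previous" in the output.
    try neighborsWithinPartition(sc.parallelize(1 to 6, 2)).collect().foreach(println)
    finally sc.stop()
  }
}
```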


On Fri, Aug 22, 2014 at 11:24 AM, cjwang c...@cjwang.us wrote:

 It would be nice if an RDD that was massaged by OrderedRDDFunctions could
 know its neighbors.




 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/Finding-previous-and-next-element-in-a-sorted-RDD-tp12621p12664.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org




Re: Finding previous and next element in a sorted RDD

2014-08-22 Thread Evan Chan
There's no way to avoid a shuffle, because the first and last elements
of each partition need to be combined with elements from other partitions,
but I wonder if there is a way to keep that shuffle minimal.
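One possible answer to the minimal-shuffle question, sketched under assumptions: the RDD is already in the desired order and deterministic (it is evaluated twice, so cache it), and `BoundaryExchange`/`withNeighbors` are hypothetical names. Only the first and last element of each partition travel: they are collected to the driver, matched up across partition boundaries, and broadcast back, while all other elements stay in their partitions.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object BoundaryExchange {
  // Pairs every element of an already-ordered RDD with its previous and next
  // element across the whole dataset. The RDD is evaluated twice, so cache it
  // first if it is expensive to recompute.
  def withNeighbors[T](rdd: RDD[T]): RDD[(Option[T], T, Option[T])] = {
    // 1. First and last element of every non-empty partition (one record each).
    val edges: Map[Int, (T, T)] = rdd.mapPartitionsWithIndex { (idx, it) =>
      val firstLast: Option[(T, T)] =
        if (!it.hasNext) None
        else {
          val first = it.next()
          var last = first
          while (it.hasNext) last = it.next()
          Some((first, last))
        }
      firstLast.iterator.map(fl => (idx, fl))
    }.collect().toMap

    // 2. On the driver: a partition's edge neighbors are the last element of the
    //    nearest earlier non-empty partition and the first element of the nearest
    //    later one (empty partitions are skipped).
    val order = edges.keys.toVector.sorted
    val boundary: Map[Int, (Option[T], Option[T])] = order.indices.map { k =>
      val prev: Option[T] = if (k > 0) Some(edges(order(k - 1))._2) else None
      val next: Option[T] = if (k < order.length - 1) Some(edges(order(k + 1))._1) else None
      (order(k), (prev, next))
    }.toMap
    val bc = rdd.sparkContext.broadcast(boundary)

    // 3. Local neighbors inside each partition, broadcast values at its edges.
    rdd.mapPartitionsWithIndex { (idx, it) =>
      val buf = it.toVector
      val (edgePrev, edgeNext) = bc.value.getOrElse(idx, (Option.empty[T], Option.empty[T]))
      buf.indices.iterator.map { i =>
        val prev: Option[T] = if (i > 0) Some(buf(i - 1)) else edgePrev
        val next: Option[T] = if (i < buf.length - 1) Some(buf(i + 1)) else edgeNext
        (prev, buf(i), next)
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("neighbors"))
    try withNeighbors(sc.parallelize(1 to 6, 3).cache()).collect().foreach(println)
    finally sc.stop()
  }
}
```

The driver only ever holds two elements per partition, so this stays cheap as long as the number of partitions is moderate.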

On Thu, Aug 21, 2014 at 6:13 PM, cjwang c...@cjwang.us wrote:
 One way is to do zipWithIndex on the RDD.  Then use the index as a key.  Add
 or subtract 1 for previous or next element.  Then use cogroup or join to
 bind them together.

 val idx = input.zipWithIndex
 val previous = idx.map(x => (x._2+1, x._1))
 val current = idx.map(x => (x._2, x._1))
 val next = idx.map(x => (x._2-1, x._1))

 val joined = current leftOuterJoin previous leftOuterJoin next

 Code looks clean to me, but I feel uneasy about the performance of join.



 --
 View this message in context: 
 http://apache-spark-user-list.1001560.n3.nabble.com/Finding-previous-and-next-element-in-a-sorted-RDD-tp12621p12623.html



Re: Finding previous and next element in a sorted RDD

2014-08-22 Thread cjwang
It would be nice if an RDD that was massaged by OrderedRDDFunctions could know
its neighbors.







Finding previous and next element in a sorted RDD

2014-08-21 Thread cjwang
I have an RDD containing elements sorted in a certain order. I would like to
map over the elements knowing the values of their respective previous and
next elements.

With a regular List, I used to do this (input is a List below):

// The first "previous" and the last "next" measures are dummies
val dummy = new Measure()
val ml = (dummy :: input) :+ dummy

// Take 3 elements at a time: the 1st is the previous, the middle is the
// current, and the last is the next.
ml.iterator.sliding(3).map(tri => produceMeasure(tri.head, tri.tail.head, tri.last)).toList
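For comparison, the same List idea can be written with Option in place of the dummy Measure, so the boundaries are explicit and no placeholder instance is needed. This is a plain-Scala sketch; `ListNeighbors` and `withNeighbors` are hypothetical names, and produceMeasure would then be mapped over the resulting triples. One small difference: for an empty input the dummy version still emits one window of two dummies (Iterator.sliding yields a final partial group by default), whereas this version returns an empty list.

```scala
object ListNeighbors {
  // Pads with None instead of a dummy element and keeps only windows whose
  // middle slot holds a real element.
  def withNeighbors[T](input: List[T]): List[(Option[T], T, Option[T])] = {
    val padded: List[Option[T]] = (None :: input.map(Some(_))) :+ None
    padded.sliding(3).collect {
      case List(prev, Some(cur), next) => (prev, cur, next)
    }.toList
  }

  def main(args: Array[String]): Unit =
    println(withNeighbors(List(1, 2, 3)))
}
```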

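Now, with an RDD, how do I do that (elegantly, I hope)? I thought about
zipping the RDD with a shifted copy of itself, but zip has some messy
partitioning issues (both RDDs must have the same number of partitions and
the same number of elements in each partition).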




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Finding-previous-and-next-element-in-a-sorted-RDD-tp12621.html



Re: Finding previous and next element in a sorted RDD

2014-08-21 Thread cjwang
One way is to do zipWithIndex on the RDD.  Then use the index as a key.  Add
or subtract 1 for previous or next element.  Then use cogroup or join to
bind them together.

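import org.apache.spark.SparkContext._  // pair-RDD functions such as leftOuterJoin (needed before Spark 1.3)

// Key every element by its position, then shift the key by one either way
val idx = input.zipWithIndex                   // (element, index)
val previous = idx.map(x => (x._2 + 1, x._1))  // element i is the "previous" of i + 1
val current = idx.map(x => (x._2, x._1))
val next = idx.map(x => (x._2 - 1, x._1))      // element i is the "next" of i - 1

// (index, ((current, Option[previous]), Option[next])), not ordered by index
val joined = current leftOuterJoin previous leftOuterJoin next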

Code looks clean to me, but I feel uneasy about the performance of join.  
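A sketch of one way to replace the two joins with a single shuffle while keeping the same zipWithIndex idea: each element is emitted three times, tagged with its role (previous, current or next) under the matching index, and one groupByKey assembles the triples. The object name and the integer role tags are hypothetical. zipWithIndex itself runs a Spark job when the RDD has more than one partition, and the final sortByKey is a second shuffle that is only needed if the output must stay in index order.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
// On Spark versions before 1.3, pair-RDD methods such as groupByKey also need:
// import org.apache.spark.SparkContext._

object SingleShuffleNeighbors {
  def withNeighbors[T](input: RDD[T]): RDD[(Option[T], T, Option[T])] = {
    // Hypothetical role tags (local vals, so closures capture plain Ints).
    val prevTag = 0
    val curTag = 1
    val nextTag = 2
    val idx = input.zipWithIndex() // (element, index)
    // Send each element to three keys: its own index (as "current"), index + 1
    // (where it is the "previous") and index - 1 (where it is the "next").
    val tagged = idx.flatMap { case (x, i) =>
      Seq((i, (curTag, x)), (i + 1, (prevTag, x)), (i - 1, (nextTag, x)))
    }
    tagged
      .groupByKey() // a single shuffle brings the (up to) three roles together
      .flatMap { case (i, vs) =>
        val byRole = vs.toMap
        // Keys -1 and n have no "current" entry and are dropped here.
        byRole.get(curTag).toList.map(cur => (i, (byRole.get(prevTag), cur, byRole.get(nextTag))))
      }
      .sortByKey() // groupByKey loses the order; sorting by index restores it
      .values
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("neighbors"))
    try withNeighbors(sc.parallelize(Seq("a", "b", "c", "d"), 2)).collect().foreach(println)
    finally sc.stop()
  }
}
```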


