(PS: the Scala code I posted is a poor way to do it, since it would materialize the entire Cartesian product in memory. You can use .iterator or .view to avoid that.)
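To make the PS concrete, here is a minimal sketch of the difference. The earlier code is not shown in this message, so this assumes it built all pairs with a for-comprehension over a List; the object name LazyPairs and the helpers distance, sumAllPairsStrict and sumAllPairsLazy are hypothetical names used only for illustration.

```scala
object LazyPairs {
  // Hypothetical helper: Euclidean distance between two (x, y) points.
  def distance(p: (Double, Double), q: (Double, Double)): Double =
    math.hypot(p._1 - q._1, p._2 - q._2)

  // Strict version: the for-comprehension over a List builds a new List
  // holding all n * n distances before .sum ever runs.
  def sumAllPairsStrict(points: List[(Double, Double)]): Double =
    (for (p <- points; q <- points) yield distance(p, q)).sum

  // Lazy version: iterating over .iterator yields an Iterator, so each
  // distance is produced and consumed one at a time by .sum.
  def sumAllPairsLazy(points: List[(Double, Double)]): Double =
    (for (p <- points.iterator; q <- points.iterator) yield distance(p, q)).sum
}
```

Both functions return the same total; only the intermediate memory use differs. Using points.view in place of points.iterator should behave similarly.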
Ah, so you want the sum of distances between successive points.

    val points: List[(Double,Double)] = ...
    points.sliding(2).map { case List(p1,p2) => distance(p1,p2) }.sum

If you import org.apache.spark.mllib.rdd.RDDFunctions._ you should
have access to something similar in Spark over an RDD. It gives you a
sliding() function that produces Arrays of sequential elements.

Note that RDDs don't really guarantee anything about ordering though,
so this only makes sense if you've already sorted some upstream RDD by
a timestamp or sequence number.

On Mon, Jan 26, 2015 at 1:21 AM, Steve Nunez <snu...@hortonworks.com> wrote:
> Not combinations, linear distances, e.g., given: List[ (x1,y1), (x2,y2),
> (x3,y3) ], compute the sum of:
>
> distance (x1,y1) and (x2,y2) and
> distance (x2,y2) and (x3,y3)
>
> Imagine that the list of coordinate points comes from a GPS and describes a
> trip.
>
> - Steve
>
> From: Joseph Lust <jl...@mc10inc.com>
> Date: Sunday, January 25, 2015 at 17:17
> To: Steve Nunez <snu...@hortonworks.com>, "user@spark.apache.org"
> <user@spark.apache.org>
> Subject: Re: Pairwise Processing of a List
>
> So you’ve got a point A and you want the sum of distances between it and all
> other points? Or am I misunderstanding you?
>
> // target point, can be Broadcast global sent to all workers
> val tarPt = (10,20)
> val pts = Seq((2,2),(3,3),(2,3),(10,2))
> val rdd = sc.parallelize(pts)
> rdd.map( pt => Math.sqrt( Math.pow(tarPt._1 - pt._1,2) + Math.pow(tarPt._2 -
> pt._2,2)) ).reduce( (d1,d2) => d1+d2)
>
> -Joe
>
> From: Steve Nunez <snu...@hortonworks.com>
> Date: Sunday, January 25, 2015 at 7:32 PM
> To: "user@spark.apache.org" <user@spark.apache.org>
> Subject: Pairwise Processing of a List
>
> Spark Experts,
>
> I’ve got a list of points: List[(Float, Float)] that represent (x,y)
> coordinate pairs and need to sum the distance.
> It’s easy enough to compute the distance:
>
> import scala.math.{pow, sqrt}
>
> case class Point(x: Float, y: Float) {
>   def distance(other: Point): Float =
>     sqrt(pow(x - other.x, 2) + pow(y - other.y, 2)).toFloat
> }
>
> (in this case I create a ‘Point’ class, but the maths are the same).
>
> What I can’t figure out is the ‘right’ way to sum distances between all the
> points. I can make this work by traversing the list with a for loop and
> using indices, but this doesn’t seem right.
>
> Anyone know a clever way to process List[(Float, Float)] in a pairwise
> fashion?
>
> Regards,
> - Steve
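
For reference, the sliding(2) approach suggested in the reply can be sketched for a plain Scala List as follows. This is only an illustration under some assumptions: the object name PathLength and the helpers distance and pathLength are hypothetical, points are taken to be (Double, Double) tuples, and the list is assumed to already be in trip order. It uses collect rather than map so that a list with fewer than two points yields 0.0 instead of a MatchError.

```scala
object PathLength {
  // Hypothetical helper: Euclidean distance between two (x, y) points.
  def distance(p: (Double, Double), q: (Double, Double)): Double =
    math.hypot(p._1 - q._1, p._2 - q._2)

  // Sum of distances between successive points.
  // sliding(2) yields windows of two neighbouring points; for a
  // one-element list it yields a single window of size one, which
  // collect skips because it does not match Seq(p1, p2).
  def pathLength(points: List[(Double, Double)]): Double =
    points.sliding(2).collect { case Seq(p1, p2) => distance(p1, p2) }.sum
}
```

For example, PathLength.pathLength(List((0.0, 0.0), (3.0, 4.0), (3.0, 0.0))) adds a leg of length 5 and a leg of length 4. The same windowing idea carries over to the RDD sliding() mentioned in the reply, provided the RDD has been sorted first.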