If this is really about just Scala Lists, then a simple answer (using tuples of doubles) is:
val points: List[(Double,Double)] = ... val distances = for (p1 <- points; p2 <- points) yield { val dx = p1._1 - p2._1 val dy = p1._2 - p2._2 math.sqrt(dx*dx + dy*dy) } distances.sum / 2 It's "/ 2" since this counts every pair twice. You could double the speed of that, with a slightly more complex formulation using indices, that avoids comparing points to themselves and makes each comparison just once. If you really need the sum of all pairwise distances, I don't think you can do better than that (modulo dealing with duplicates intelligently). If we're talking RDDs, then the simple answer is similar: val pointsRDD: RDD[(Double,Double)] = ... val distancesRDD = pointsRDD.cartesian(pointsRDD).map { case (p1, p2) => ... } distancesRDD.sum / 2 It takes more work to make the same optimization, and involves zipWithIndex, but is possible. If the reason we're talking about Lists is that the set of points is still fairly small, but big enough that all-pairs deserves distributed computation, then I'd parallelize the List into an RDD, and also broadcast it, and then implement a hybrid of these two approaches. You'd have the outer loop over points happening in parallel via the RDD, and inner loop happening locally over the local broadcasted copy in memory. ... and if the use case isn't really to find all-pairs distances and their sum, maybe there are faster ways still to do what you need to. On Mon, Jan 26, 2015 at 12:32 AM, Steve Nunez <snu...@hortonworks.com> wrote: > Spark Experts, > > I’ve got a list of points: List[(Float, Float)]) that represent (x,y) > coordinate pairs and need to sum the distance. It’s easy enough to compute > the distance: > > case class Point(x: Float, y: Float) { > def distance(other: Point): Float = > sqrt(pow(x - other.x, 2) + pow(y - other.y, 2)).toFloat > } > > (in this case I create a ‘Point’ class, but the maths are the same). > > What I can’t figure out is the ‘right’ way to sum distances between all the > points. I can make this work by traversing the list with a for loop and > using indices, but this doesn’t seem right. > > Anyone know a clever way to process List[(Float, Float)]) in a pairwise > fashion? > > Regards, > - Steve > > --------------------------------------------------------------------- To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org