So you’ve got a point A and you want the sum of distances between it and all other points? Or am I misunderstanding you?
// target point; could be shipped to all workers as a Broadcast variable
val tarPt = (10, 20)
val pts = Seq((2, 2), (3, 3), (2, 3), (10, 2))
val rdd = sc.parallelize(pts)
rdd.map( pt =>
  Math.sqrt(Math.pow(tarPt._1 - pt._1, 2) + Math.pow(tarPt._2 - pt._2, 2))
).reduce( (d1, d2) => d1 + d2 )

-Joe

From: Steve Nunez <snu...@hortonworks.com>
Date: Sunday, January 25, 2015 at 7:32 PM
To: "user@spark.apache.org" <user@spark.apache.org>
Subject: Pairwise Processing of a List

Spark Experts,

I've got a list of points, List[(Float, Float)], that represent (x, y) coordinate pairs, and I need to sum the distances between them. It's easy enough to compute the distance:

import scala.math.{sqrt, pow}

case class Point(x: Float, y: Float) {
  def distance(other: Point): Float =
    sqrt(pow(x - other.x, 2) + pow(y - other.y, 2)).toFloat
}

(In this case I create a 'Point' class, but the maths are the same.) What I can't figure out is the 'right' way to sum the distances between all the points. I can make this work by traversing the list with a for loop and indices, but that doesn't seem right. Does anyone know a clever way to process a List[(Float, Float)] in a pairwise fashion?

Regards,
- Steve
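For the pairwise traversal Steve asks about, a minimal plain-Scala sketch (local collection, not an RDD; it reuses the Point class from Steve's message, and the sample data below is made up for illustration) is to zip the list with its own tail, so each point is paired with its successor:

import scala.math.{sqrt, pow}

case class Point(x: Float, y: Float) {
  def distance(other: Point): Float =
    sqrt(pow(x - other.x, 2) + pow(y - other.y, 2)).toFloat
}

// hypothetical sample data
val points = List(Point(2, 2), Point(3, 3), Point(2, 3), Point(10, 2))

// points.tail drops the first element, so zip lines each point up
// with its consecutive neighbour: (p0,p1), (p1,p2), (p2,p3), ...
val pathLength = points.zip(points.tail).map { case (a, b) => a.distance(b) }.sum

// If the intent is instead the sum over every unordered pair of points,
// combinations(2) enumerates each pair exactly once:
val allPairsSum = points.combinations(2).map { case Seq(a, b) => a.distance(b) }.sum

Equivalently, points.sliding(2) yields the same consecutive pairs as an iterator, which avoids materialising the zipped intermediate list on long inputs.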