So you’ve got a point A and you want the sum of distances between it and all 
other points? Or am I misunderstanding you?

// target point; for a big job this could be broadcast to the workers
// with sc.broadcast rather than captured in the closure
val tarPt = (10, 20)
val pts = Seq((2, 2), (3, 3), (2, 3), (10, 2))
val rdd = sc.parallelize(pts)
rdd.map(pt => math.sqrt(math.pow(tarPt._1 - pt._1, 2) + math.pow(tarPt._2 - pt._2, 2)))
   .reduce(_ + _)
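
If instead you want the total distance along the path (each point to the
next, not each point to one target), a sliding window over the plain list
is one way. A rough sketch, reusing the pts above; sliding(2) yields each
adjacent pair:

// sum the distances between consecutive points (no RDD needed for a small list)
val pathLen = pts.sliding(2).map { case Seq(a, b) =>
  math.sqrt(math.pow(a._1 - b._1, 2) + math.pow(a._2 - b._2, 2))
}.sum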

-Joe

From: Steve Nunez <snu...@hortonworks.com>
Date: Sunday, January 25, 2015 at 7:32 PM
To: "user@spark.apache.org" <user@spark.apache.org>
Subject: Pairwise Processing of a List

Spark Experts,

I’ve got a list of points, List[(Float, Float)], that represent (x, y)
coordinate pairs, and I need to sum the distances between them. It’s easy
enough to compute the distance between two points:

import scala.math.{pow, sqrt}

case class Point(x: Float, y: Float) {
  def distance(other: Point): Float =
    sqrt(pow(x - other.x, 2) + pow(y - other.y, 2)).toFloat
}

(in this case I create a ‘Point’ class, but the maths are the same).
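
For instance, Point(0, 0).distance(Point(3, 4)) comes out to 5.0f.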

What I can’t figure out is the ‘right’ way to sum distances between all the 
points. I can make this work by traversing the list with a for loop and using 
indices, but this doesn’t seem right.
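
Roughly, the version I have now looks like this (a sketch, assuming the
list has already been mapped to Point values and is called points):

var total = 0f
for (i <- 0 until points.length - 1)
  total += points(i).distance(points(i + 1))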

Anyone know a clever way to process a List[(Float, Float)] in a pairwise fashion?

Regards,
- Steve

