Re: Pairwise Processing of a List

2015-01-26 Thread Sean Owen
AFAIK ordering is not strictly guaranteed unless the RDD is the product of a sort. I think that in practice, you'll never find elements of a file read in some random order, for example (although see the recent issue about partition ordering potentially depending on how the local file system lists

Re: Pairwise Processing of a List

2015-01-25 Thread Joseph Lust
Spark Experts, I’ve got a list of points: List[(Float, Float)] that represent (x,y) coordinate pairs and need to sum the distance. It’s easy enough to compute the distance: case class Point(x: Float, y: Float) { def distance(other: Point

Re: Pairwise Processing of a List

2015-01-25 Thread Tobias Pfeiffer
Hi, On Mon, Jan 26, 2015 at 9:32 AM, Steve Nunez snu...@hortonworks.com wrote: I’ve got a list of points: List[(Float, Float)] that represent (x,y) coordinate pairs and need to sum the distance. It’s easy enough to compute the distance: Are you saying you want all combinations (N^2) of
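If "all combinations" is the intended reading, a minimal plain-Scala sketch (no Spark; the object and method names are hypothetical) of enumerating every unordered pair looks like this. It yields N(N-1)/2 pairs rather than N^2, and it uses `tails` instead of `combinations(2)` because `combinations` treats equal elements as interchangeable and would drop pairs of duplicate points.

```scala
object PairEnumeration {
  // Every unordered pair (a, b) where a appears before b in the list.
  // Duplicate points are kept as distinct pairs, unlike combinations(2).
  def unorderedPairs[A](xs: List[A]): List[(A, A)] =
    xs.tails.toList.flatMap {
      case a :: rest => rest.map(b => (a, b))
      case Nil       => Nil
    }
}
```

For three points this produces three pairs; for N points it produces N(N-1)/2.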

Re: Pairwise Processing of a List

2015-01-25 Thread Sean Owen
If this is really about just Scala Lists, then a simple answer (using tuples of doubles) is: val points: List[(Double,Double)] = ... val distances = for (p1 <- points; p2 <- points) yield { val dx = p1._1 - p2._1 val dy = p1._2 - p2._2 math.sqrt(dx*dx + dy*dy) } distances.sum / 2 It's / 2
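The for-comprehension above can be wrapped into a small runnable sketch (the object and method names are hypothetical). It visits all N^2 ordered pairs, so each unordered pair is counted twice and every self-pair contributes a distance of 0; dividing by 2 therefore gives the sum over unordered pairs.

```scala
object AllPairsDistance {
  // Sum of Euclidean distances over all unordered pairs of points.
  def sumPairwise(points: List[(Double, Double)]): Double = {
    // All ordered pairs (p1, p2), including p1 == p2.
    val distances = for (p1 <- points; p2 <- points) yield {
      val dx = p1._1 - p2._1
      val dy = p1._2 - p2._2
      math.sqrt(dx * dx + dy * dy)
    }
    // Each unordered pair appears twice and self-pairs add 0.
    distances.sum / 2
  }
}
```

This is O(N^2) work either way; the halving only corrects the double counting, it does not save any computation.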

Re: Pairwise Processing of a List

2015-01-25 Thread Steve Nunez
...@mc10inc.com Date: Sunday, January 25, 2015 at 17:17 To: Steve Nunez snu...@hortonworks.com, user@spark.apache.org Subject: Re: Pairwise Processing of a List So you've got a point

Pairwise Processing of a List

2015-01-25 Thread Steve Nunez
Spark Experts, I've got a list of points: List[(Float, Float)] that represent (x,y) coordinate pairs and need to sum the distance. It's easy enough to compute the distance: case class Point(x: Float, y: Float) { def distance(other: Point): Float = sqrt(pow(x - other.x, 2) + pow(y -
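A compilable sketch of a `Point` class along the lines of the message (the snippet is cut off, so the body below is an assumption, not the original code). One detail worth noting: `scala.math.sqrt` and `pow` work on `Double`, so a method declared to return `Float` needs an explicit `.toFloat`. The `fromTuples` helper is hypothetical and converts the `List[(Float, Float)]` from the question.

```scala
import scala.math.{pow, sqrt}

case class Point(x: Float, y: Float) {
  // Float arguments are widened to Double for pow/sqrt,
  // then the result is narrowed back to Float.
  def distance(other: Point): Float =
    sqrt(pow(x - other.x, 2) + pow(y - other.y, 2)).toFloat
}

object Point {
  // Hypothetical helper: builds Points from (x, y) tuples.
  def fromTuples(pts: List[(Float, Float)]): List[Point] =
    pts.map { case (x, y) => Point(x, y) }
}
```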

Re: Pairwise Processing of a List

2015-01-25 Thread Sean Owen
So you’ve got a point A and you want the sum of distances between it and all other points? Or am I misunderstanding you? // target point, can be a broadcast variable sent to all workers val tarPt = (10,20) val pts = Seq((2,2),(3,3),(2,3
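Under the "one target point" reading, the work is linear in the number of points. A plain-collections sketch follows (names are hypothetical; in Spark the target could be shipped to workers as a broadcast variable, which this sketch does not attempt):

```scala
object DistanceToTarget {
  // Sum of Euclidean distances from one fixed target to every point.
  def sumDistances(target: (Double, Double), pts: Seq[(Double, Double)]): Double =
    pts.map { case (x, y) => math.hypot(x - target._1, y - target._2) }.sum
}
```

`math.hypot` computes sqrt(dx*dx + dy*dy) without intermediate overflow; the explicit formula would work equally well here.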

Re: Pairwise Processing of a List

2015-01-25 Thread Tobias Pfeiffer
Sean, On Mon, Jan 26, 2015 at 10:28 AM, Sean Owen so...@cloudera.com wrote: Note that RDDs don't really guarantee anything about ordering though, so this only makes sense if you've already sorted some upstream RDD by a timestamp or sequence number. Speaking of order, is there some reading
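If the goal is instead the length of a path through consecutive points, the ordering caveat matters: the answer changes with the order. A minimal plain-Scala sketch, assuming each point carries a sequence number or timestamp (the `Stamped` type and its fields are hypothetical), sorts first and then sums distances between neighbours:

```scala
object OrderedPathLength {
  // Hypothetical record: a point tagged with a sequence number or timestamp.
  final case class Stamped(seq: Long, x: Double, y: Double)

  // Consecutive-pair distances are only meaningful in a known order,
  // so sort by the sequence number before pairing neighbours.
  def pathLength(pts: Seq[Stamped]): Double = {
    val ordered = pts.sortBy(_.seq)
    ordered.zip(ordered.drop(1)).map { case (a, b) =>
      math.hypot(b.x - a.x, b.y - a.y)
    }.sum
  }
}
```

With points (0,0), (3,4), (6,8) in sequence order the path length is 10, whereas visiting them in the order (3,4), (0,0), (6,8) would give 15.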