AFAIK ordering is not strictly guaranteed unless the RDD is the
product of a sort. I think that in practice, you'll never find
elements of a file read in some random order, for example (although
see the recent issue about partition ordering potentially depending on
how the local file system lists
@spark.apache.org
Subject: Pairwise Processing of a List
Spark Experts,
I’ve got a list of points: List[(Float, Float)] that represent (x,y)
coordinate pairs and need to sum the distance. It’s easy enough to compute the
distance:
case class Point(x: Float, y: Float) {
  def distance(other: Point
Hi,
On Mon, Jan 26, 2015 at 9:32 AM, Steve Nunez snu...@hortonworks.com wrote:
I’ve got a list of points: List[(Float, Float)] that represent (x,y)
coordinate pairs and need to sum the distance. It’s easy enough to compute
the distance:
Are you saying you want all combinations (N^2) of
If this is really about just Scala Lists, then a simple answer (using
tuples of doubles) is:
val points: List[(Double,Double)] = ...
val distances = for (p1 <- points; p2 <- points) yield {
  val dx = p1._1 - p2._1
  val dy = p1._2 - p2._2
  math.sqrt(dx*dx + dy*dy)
}
distances.sum / 2
It's / 2
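The division by 2 in the snippet above compensates for the comprehension visiting every ordered pair, so each distance is counted twice (a point paired with itself contributes 0). A minimal sketch of an alternative, assuming the goal is to count each unordered pair exactly once, iterates over index pairs i < j instead. The object and method names here are illustrative and do not come from the thread:

```scala
// Sketch: sum of Euclidean distances over each unordered pair, counted once.
// PairwiseDistances, distance and sumUniquePairs are hypothetical names.
object PairwiseDistances {
  type Pt = (Double, Double)

  // Euclidean distance between two points.
  def distance(a: Pt, b: Pt): Double = {
    val dx = a._1 - b._1
    val dy = a._2 - b._2
    math.sqrt(dx * dx + dy * dy)
  }

  // Pair by index (i < j) rather than by value, so that duplicate
  // points in the list are still treated as distinct points.
  def sumUniquePairs(points: List[Pt]): Double = {
    val indexed = points.toIndexedSeq
    val distances = for {
      i <- indexed.indices
      j <- (i + 1) until indexed.length
    } yield distance(indexed(i), indexed(j))
    distances.sum
  }
}
```

This does roughly half the work of the N^2 version and needs no final correction, but it is still quadratic in the number of points.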
... jl...@mc10inc.com
Date: Sunday, January 25, 2015 at 17:17
To: Steve Nunez snu...@hortonworks.com,
user@spark.apache.org
Subject: Re: Pairwise Processing of a List
So you've got a point
Spark Experts,
I've got a list of points: List[(Float, Float)] that represent (x,y)
coordinate pairs and need to sum the distance. It's easy enough to compute the
distance:
case class Point(x: Float, y: Float) {
  def distance(other: Point): Float =
    sqrt(pow(x - other.x, 2) + pow(y -
@spark.apache.org
Subject: Re: Pairwise Processing of a List
So you’ve got a point A and you want the sum of distances between it and all
other points? Or am I misunderstanding you?
// target point, can be Broadcast global sent to all workers
val tarPt = (10,20)
val pts = Seq((2,2),(3,3),(2,3
Sean,
On Mon, Jan 26, 2015 at 10:28 AM, Sean Owen so...@cloudera.com wrote:
Note that RDDs don't really guarantee anything about ordering though,
so this only makes sense if you've already sorted some upstream RDD by
a timestamp or sequence number.
Speaking of order, is there some reading
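If "pairwise" instead means the distance between each point and the next one in sequence, the result depends on element order, which is why the ordering caveat quoted above matters. A minimal sketch for a local Scala List that is assumed to be already in the desired order (not an RDD; the names are hypothetical):

```scala
// Sketch: total length of the path through the points, in list order.
// PathLength, distance and pathLength are hypothetical names.
object PathLength {
  type Pt = (Double, Double)

  // Euclidean distance between two points.
  def distance(a: Pt, b: Pt): Double = {
    val dx = a._1 - b._1
    val dy = a._2 - b._2
    math.sqrt(dx * dx + dy * dy)
  }

  // Zip the list with itself shifted by one to get consecutive pairs.
  // Lists with fewer than two points produce no pairs and sum to 0.0.
  def pathLength(points: List[Pt]): Double =
    points.zip(points.drop(1)).map { case (a, b) => distance(a, b) }.sum
}
```

For an RDD, the same idea would only be meaningful after an explicit sort by a timestamp or sequence number, as the quoted reply says.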