Yes, the second example does that. It transforms all the points of a
partition into a single element, the skyline, so reduce will run on the
skylines of two partitions rather than on individual points.
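For reference, the combination described here (mapPartitions to collapse each partition into its local skyline, then reduce to merge skylines) might be sketched as follows. The input path and the `skyline` helper are assumptions, not part of the original thread; a naive O(n²) Pareto filter is shown so the snippet is self-contained:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SkylineJob {
  // Assumed semantics: p dominates q when p is <= q in every dimension
  // and strictly < in at least one (minimization in all dimensions).
  def dominates(p: Array[Double], q: Array[Double]): Boolean =
    p.zip(q).forall { case (a, b) => a <= b } &&
      p.zip(q).exists { case (a, b) => a < b }

  // Naive O(n^2) skyline: keep the points no other point dominates.
  def skyline(points: Array[Array[Double]]): Array[Array[Double]] =
    points.filter(q => !points.exists(p => dominates(p, q)))

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("skyline"))
    // "points.txt" is a placeholder input path.
    val data = sc.textFile("points.txt").map(_.split(",").map(_.toDouble))

    // Collapse each partition to a single element: its local skyline.
    val local = data.mapPartitions(points => Iterator(skyline(points.toArray)))

    // reduce now merges per-partition skylines, not raw points.
    val result = local.reduce((left, right) => skyline(left ++ right))
    println(result.map(_.mkString(",")).mkString("\n"))
    sc.stop()
  }
}
```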
On Apr 16, 2014, at 06:47, Yanzhe Chen yanzhe...@gmail.com wrote:
Hi all,
As in a previous thread, I am asking how to implement a divide-and-conquer
algorithm (skyline) in Spark.
Here is my current solution:
val data = sc.textFile(…).map(line => line.split(",").map(_.toDouble))
val result = data.mapPartitions(points =>
Your Spark solution first reduces partial results into a single partition,
computes the final result, and then collects it to the driver side. This
involves a shuffle and two waves of network traffic. Instead, you can
collect the partial results to the driver directly and then compute the
final result there.
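The collect-to-driver variant could be sketched as below; `skyline` is the assumed helper (discussed throughout this thread) that reduces a set of points to its Pareto-optimal subset:

```scala
// Alternative: compute per-partition skylines, then collect them to the
// driver in a single round of network traffic and merge locally there,
// avoiding the extra shuffle of a distributed reduce.
val partial: Array[Array[Array[Double]]] =
  data.mapPartitions(points => Iterator(skyline(points.toArray))).collect()

// Merge all per-partition skylines on the driver.
val result: Array[Array[Double]] = skyline(partial.flatten)
```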
It depends on your algorithm, but I guess you should probably use
reduce (the code probably doesn't compile, but it shows you the idea):
val result = data.reduce { case (left, right) =>
skyline(left ++ right)
}
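The snippet above assumes a `skyline` function that merges two point sets into their combined skyline. The thread never shows one, but a minimal quadratic version (for minimization in every dimension) could look like:

```scala
// A point p dominates q when p(i) <= q(i) for all dimensions i and
// p(i) < q(i) for at least one i. The skyline is the set of points
// that no other point dominates.
def dominates(p: Array[Double], q: Array[Double]): Boolean =
  p.zip(q).forall { case (a, b) => a <= b } &&
    p.zip(q).exists { case (a, b) => a < b }

def skyline(points: Array[Array[Double]]): Array[Array[Double]] =
  points.filter(q => !points.exists(p => dominates(p, q)))
```

For example, `skyline(Array(Array(1.0, 2.0), Array(2.0, 1.0), Array(2.0, 2.0)))` drops (2, 2), which (1, 2) dominates, and keeps the two incomparable points.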
Or, in case you want to merge the result of one partition with another,
you
Eugen,
Thanks for your tip, and I do want to merge the result of one partition
with another, but I am still not quite clear on how to do it.
Say the original data RDD has 32 partitions; since mapPartitions won't
change the number of partitions, it will remain at 32 partitions, each of
which contains