Re: Why these operations are slower than the equivalent on Hadoop?

2014-04-16 Thread Eugen Cepoi
Yes, the second example does that. It transforms all the points of a partition into a single element, the skyline, so reduce will run on the skylines of two partitions rather than on individual points.
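A minimal sketch of that pattern, assuming data: RDD[Array[Double]] of 2-D points as in the original post; the quadratic skyline helper below is hypothetical (the thread only uses the name), and it treats smaller as better in both coordinates:

  // Hypothetical helper: keep every point that no other point dominates.
  // Dominance here means <= in both coordinates and < in at least one.
  def skyline(points: Seq[Array[Double]]): Seq[Array[Double]] =
    points.filter { p =>
      !points.exists(q =>
        q(0) <= p(0) && q(1) <= p(1) && (q(0) < p(0) || q(1) < p(1)))
    }

  // Collapse each partition into a single element: its local skyline.
  val localSkylines = data.mapPartitions(points => Iterator(skyline(points.toVector)))

  // reduce now merges one skyline per partition, not individual points.
  val result = localSkylines.reduce((left, right) => skyline(left ++ right))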

Why these operations are slower than the equivalent on Hadoop?

2014-04-15 Thread Yanzhe Chen
Hi all, Following up on a previous thread, I am asking how to implement a divide-and-conquer algorithm (skyline) in Spark. Here is my current solution: val data = sc.textFile(…).map(line => line.split(",").map(_.toDouble)) val result = data.mapPartitions(points =>
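The snippet is cut off by the archive. For the per-partition step it sets up, a common local kernel for 2-D skylines (a sketch, not from the thread; assumes smaller is better in both coordinates, and drops exact duplicates) sorts by x and sweeps, keeping a point only if its y beats every y seen so far:

  // Hypothetical O(n log n) local kernel: sort by (x, y), then one pass.
  def skylineSweep(points: Seq[Array[Double]]): Seq[Array[Double]] = {
    var bestY = Double.PositiveInfinity
    points.sortBy(p => (p(0), p(1))).filter { p =>
      val keep = p(1) < bestY   // survives iff strictly below all earlier ys
      if (keep) bestY = p(1)
      keep
    }
  }

Either this or the quadratic filter sketched above can serve as the function that mapPartitions applies to each partition.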

Re: Why these operations are slower than the equivalent on Hadoop?

2014-04-15 Thread Cheng Lian
Your Spark solution first reduces partial results into a single partition, computes the final result, and then collects it to the driver side. This involves a shuffle and two waves of network traffic. Instead, you can directly collect partial results to the driver and then compute the final result
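A sketch of the suggested shape, reusing the hypothetical skyline helper from above (it assumes the per-partition skylines are small enough to fit in driver memory):

  // One local skyline per partition, collected straight to the driver:
  // a single wave of network traffic, no shuffle.
  val partial = data.mapPartitions(points => Iterator(skyline(points.toVector))).collect()

  // Final merge happens driver-side.
  val result = skyline(partial.flatten.toSeq)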

Re: Why these operations are slower than the equivalent on Hadoop?

2014-04-15 Thread Eugen Cepoi
It depends on your algorithm, but I guess you probably should use reduce (the code probably doesn't compile, but it shows the idea). val result = data.reduce { case (left, right) => skyline(left ++ right) } Or, in case you want to merge the result of a partition with another one, you
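One reason it doesn't compile as written: the elements of data are single points (Array[Double]), so left ++ right would concatenate the coordinates of two points rather than two point sets. A compiling variant of the same idea (again with the hypothetical skyline helper from above):

  val result = data
    .map(p => Vector(p))  // lift each point into a one-element point set
    .reduce((left, right) => skyline(left ++ right).toVector)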

Re: Why these operations are slower than the equivalent on Hadoop?

2014-04-15 Thread Yanzhe Chen
Eugen, Thanks for your tip, and I do want to merge the result of a partition with another one, but I am still not quite clear how to do it. Say the original data RDD has 32 partitions; since mapPartitions won’t change the number of partitions, it will remain 32 partitions, each of which contains
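For the merging itself: after mapPartitions, each of the 32 partitions holds exactly one element, its local skyline, and reduce merges those 32 elements pairwise into the global skyline. To make the partition-with-partition merging explicit, one option (a sketch, not from the thread; reuses the hypothetical skyline helper) is to repeatedly halve the partition count with coalesce:

  var cur = data.mapPartitions(points => Iterator(skyline(points.toVector)))
  var parts = cur.partitions.length            // 32 in this example
  while (parts > 1) {
    parts = (parts + 1) / 2
    cur = cur.coalesce(parts)                  // pair partitions up, no shuffle
             .mapPartitions(ss => Iterator(skyline(ss.toVector.flatten)))
  }
  val result = cur.first()                     // the global skyline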