The RDDs aren't changing; you are assigning new RDDs to rdd_0 and
rdd_1 on each iteration. Operations like join and reduceByKey produce
distinct, new partitions that don't correspond 1-1 with the old
partitions anyway.
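That said, if both sides of the join share the same partitioner and the partitioned RDD is persisted, Spark can at least avoid re-shuffling one side and can co-locate the join locally. A minimal sketch of that idea (names like rdd0/rdd1 and the local master setting are illustrative, not from the original program):

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

val sc = new SparkContext(
  new SparkConf().setAppName("copartition-sketch").setMaster("local[2]"))

// One shared partitioner for every RDD in the loop.
val p = new HashPartitioner(3)

// Partition once up front and cache, so the partitioner (and the
// computed partitions) are retained across iterations.
var rdd0 = sc.parallelize(Seq(("a", 1), ("b", 2), ("c", 3)))
  .partitionBy(p)
  .cache()

for (_ <- 1 to 3) {
  // Passing p to reduceByKey keeps the same partitioning, so no
  // extra partitionBy step is needed afterwards.
  val rdd1 = rdd0.reduceByKey(p, _ + _)

  // Both inputs share partitioner p, so the join is narrow
  // (no shuffle); mapValues preserves the partitioner.
  rdd0 = rdd0.join(rdd1).mapValues { case (v, _) => v }.cache()

  rdd0.count() // some action
}
```

Even with co-partitioning, the scheduler only uses cached partition locations as a preference; it does not hard-pin a partition to a node, so locations can still move if an executor is busy or lost.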
On Fri, Oct 17, 2014 at 5:32 AM, randylu wrote:
> Dear all,
> In my test program, there are 3 partitions for each RDD, and the iteration
> procedure is as follows:
> var rdd_0 = ... // init
> for (...) {
>   rdd_1 = rdd_0.reduceByKey(...).partitionBy(p) // calculate rdd_1 from rdd_0
>   rdd_0 = rdd_0.partitionBy(p).join(rdd_1)...   // update rdd_0 by rdd_1
>   rdd_0.action()
> }
> I thought rdd_0 and rdd_1 are partitioned by the same partitioner, so their
> corresponding partitions would be on the same node. For example, rdd_0's
> partition_0 and rdd_1's partition_0 should be on the same node in each
> iteration. But in fact, rdd_0's partition_0 changes its location between
> workers.
> Is there any way to keep rdd_0's and rdd_1's partitions from changing their
> locations, with corresponding partitions on the same node for a fast join()?
> Best Regards,
> randy
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/why-do-RDD-s-partitions-migrate-between-worker-nodes-in-different-iterations-tp16669.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>