Re: Evenly balance the number of items in each RDD partition

2016-05-10 Thread Ayman Khalil
s-of-equally-sized-partition-where > > It recommends defining a custom partitioner and (PairRDD) partitionBy > method to accomplish this. > > Xinh > > On Tue, May 10, 2016 at 1:15 PM, Ayman Khalil <aymkhali...@gmail.com> > wrote: > >> And btw, I'm using the P

Re: Evenly balance the number of items in each RDD partition

2016-05-10 Thread Ayman Khalil
And btw, I'm using the Python API if this makes any difference. On Tue, May 10, 2016 at 11:14 PM, Ayman Khalil <aymkhali...@gmail.com> wrote: > Hi Don, > > This didn't help. My original rdd is already created using 10 partitions. > As a matter of fact, after trying with rdd.co

Re: Evenly balance the number of items in each RDD partition

2016-05-10 Thread Ayman Khalil
nsive operation depending on your RDD size. > > -Don > > On Tue, May 10, 2016 at 12:38 PM, Ayman Khalil <aymkhali...@gmail.com> > wrote: > >> Hello, >> >> I have 50,000 items parallelized into an RDD with 10 partitions, I would >> like to evenly split the

Evenly balance the number of items in each RDD partition

2016-05-10 Thread Ayman Khalil
Hello, I have 50,000 items parallelized into an RDD with 10 partitions, I would like to evenly split the items over the partitions so: 50,000/10 = 5,000 in each RDD partition. What I get instead is the following (partition index, partition count): [(0, 4096), (1, 5120), (2, 5120), (3, 5120), (4,

Locality aware tree reduction

2016-05-07 Thread Ayman Khalil
Hello, Is there a way to instruct treeReduce() to reduce RDD partitions on the same node locally? In my case, I'm using treeReduce() to reduce map results in parallel. My reduce function is just arithmetically adding map results (i.e. no notion of aggregation by key). As far as I understand, a