Thanks Mayur - based on the doc-comments in the source, it looks like this
will work for this case. I will confirm.
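
For anyone else following the thread: the same co-location can also be had
without PartitionerAwareUnionRDD by simply reusing one partitioner instance
for both RDDs. A minimal sketch below (the sample lists are hypothetical
stand-ins for list1/list2 from the original mail, and `sc` is assumed to be
an existing SparkContext):

    import org.apache.spark.HashPartitioner

    // Hypothetical sample data in place of the original list1/list2.
    val list1 = List(('a', 1), ('b', 2), ('c', 3))
    val list2 = List(('a', 2), ('b', 7), ('c', 9))

    // Share a single HashPartitioner so both RDDs hash keys identically:
    // partition index = nonNegativeMod(key.hashCode, numPartitions).
    val part = new HashPartitioner(3)
    val rdd1 = sc.parallelize(list1).groupByKey(part)
    val rdd2 = sc.parallelize(list2).groupByKey(part)

    // Both RDDs now report the same partitioner, so 'a' lands in the same
    // partition index in each. A join/cogroup between co-partitioned RDDs
    // is a narrow dependency and avoids a shuffle.
    val joined = rdd1.join(rdd2)

One caveat: co-partitioning guarantees the same partition *index* on both
sides, not strictly the same physical machine; Spark's locality-aware
scheduling will normally place matching partitions together, but that is a
scheduling preference rather than a hard guarantee.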

----
the dreamers of the day are dangerous men, for they may act their dream
with open eyes, and make it possible


On Fri, Mar 7, 2014 at 2:21 AM, Mayur Rustagi <mayur.rust...@gmail.com> wrote:

> How about PartitionerAwareUnionRDD?
>
> Regards
> Mayur
>
> Mayur Rustagi
> Ph: +1 (760) 203 3257
> http://www.sigmoidanalytics.com
> @mayur_rustagi <https://twitter.com/mayur_rustagi>
>
>
>
> On Thu, Mar 6, 2014 at 9:42 AM, Evan Chan <e...@ooyala.com> wrote:
>
> > I would love to hear the answer to this as well.
> >
> > On Thu, Mar 6, 2014 at 4:09 AM, Manoj Awasthi <awasthi.ma...@gmail.com>
> > wrote:
> > > Hi All,
> > >
> > >
> > > I have a three-machine cluster. I have two RDDs, each consisting of
> > > (K,V) pairs. The RDDs have just three keys: 'a', 'b' and 'c'.
> > >
> > >     // list1 - List(('a',1), ('b',2), ....
> > >     val rdd1 = sc.parallelize(list1).groupByKey(new HashPartitioner(3))
> > >
> > >     // list2 - List(('a',2), ('b',7), ....
> > >     val rdd2 = sc.parallelize(list2).groupByKey(new HashPartitioner(3))
> > >
> > > By using a HashPartitioner with 3 partitions I can ensure that each of
> > > the keys ('a', 'b' and 'c') in each RDD gets partitioned onto a
> > > different machine in the cluster (based on the hashCode).
> > >
> > > The problem is that I cannot deterministically get the same allocation
> > > for the second RDD (all 'a's from rdd2 going to the same machine that
> > > the 'a's from the first RDD went to).
> > >
> > > Is there a way to achieve this?
> > >
> > > Manoj
> >
> >
> >
> > --
> > --
> > Evan Chan
> > Staff Engineer
> > e...@ooyala.com  |
> >
>
