Re: Spark partitioning question

2015-05-05 Thread Marius Danciu
Turned out that is was sufficient do to repartitionAndSortWithinPartitions ... so far so good ;) On Tue, May 5, 2015 at 9:45 AM Marius Danciu marius.dan...@gmail.com wrote: Hi Imran, Yes that's what MyPartitioner does. I do see (using traces from MyPartitioner) that the key is partitioned on

Re: Spark partitioning question

2015-05-05 Thread Marius Danciu
Hi Imran, Yes that's what MyPartitioner does. I do see (using traces from MyPartitioner) that the key is partitioned on partition 0 but then I see this record arriving in both Yarn containers (I see it in the logs). Basically I need to emulate a Hadoop map-reduce job in Spark and groupByKey

Re: Spark partitioning question

2015-05-04 Thread Imran Rashid
Hi Marius, I am also a little confused -- are you saying that myPartitions is basically something like: class MyPartitioner extends Partitioner { def numPartitions = 1 def getPartition(key: Any) = 0 } ?? If so, I don't understand how you'd ever end up data in two partitions. Indeed, than

Re: Spark partitioning question

2015-04-28 Thread Marius Danciu
repartitionAndSortWithinPartitions to do it in one shot. Thanks, Silvio From: Marius Danciu Date: Tuesday, April 28, 2015 at 8:10 AM To: user Subject: Spark partitioning question Hello all, I have the following Spark (pseudo)code: rdd = mapPartitionsWithIndex

Spark partitioning question

2015-04-28 Thread Marius Danciu
Hello all, I have the following Spark (pseudo)code: rdd = mapPartitionsWithIndex(...) .mapPartitionsToPair(...) .groupByKey() .sortByKey(comparator) .partitionBy(myPartitioner) .mapPartitionsWithIndex(...) .mapPartitionsToPair( *f* ) The input

Re: Spark partitioning question

2015-04-28 Thread Silvio Fiorito
. From: Marius Danciu Date: Tuesday, April 28, 2015 at 9:53 AM To: Silvio Fiorito, user Subject: Re: Spark partitioning question Thank you Silvio, I am aware of groubBy limitations and this is subject for replacement. I did try repartitionAndSortWithinPartitions but then I end up with maybe too