Re: partition by multiple columns/keys

2017-10-23 Thread Imran Rajjad
Strangely, this works only for very small datasets. For very large datasets, the partitioning apparently does not work. Is there a limit to the number of columns or rows when repartitioning by multiple columns?

Regards,
Imran

On Wed, Oct 18, 2017 at 11:00 AM, Imran Rajjad
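A plausible explanation, offered as an assumption and not confirmed in this thread: `Dataset.repartition(col1, col2)` hash-partitions rows into `spark.sql.shuffle.partitions` partitions (200 by default). That guarantees all rows sharing a (col1, col2) key land in the same partition, but not that a partition holds only one key. With few distinct keys they may happen to land in separate partitions; once there are more distinct keys than partitions, several keys must share one. The plain-Java sketch below simulates this. It uses `String.hashCode` with `Math.floorMod` as a stand-in for Spark's internal (Murmur3-based) hash, so real partition assignments will differ, and the `"col1|col2"` composite keys are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

public class HashPartitionSketch {

    // Stand-in for hash partitioning: maps a key to a partition index in [0, numPartitions).
    // Spark uses its own hash function; String.hashCode is only for illustration.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Returns partition index -> the distinct keys assigned to that partition.
    static Map<Integer, TreeSet<String>> assign(List<String> keys, int numPartitions) {
        Map<Integer, TreeSet<String>> partitions = new TreeMap<>();
        for (String key : keys) {
            partitions.computeIfAbsent(partitionFor(key, numPartitions), p -> new TreeSet<>()).add(key);
        }
        return partitions;
    }

    public static void main(String[] args) {
        int numPartitions = 4;
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            keys.add("k" + i + "|" + i); // hypothetical "col1|col2" composite keys
        }
        // 10 distinct keys in 4 partitions: by pigeonhole, some partition holds at least 3 keys.
        assign(keys, numPartitions).forEach((p, ks) -> System.out.println("partition " + p + ": " + ks));
    }
}
```

If this is the cause, code running per partition has to expect several keys in the same iterator and group them itself.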

Re: partition by multiple columns/keys

2017-10-18 Thread Imran Rajjad
Yes, I think I figured out something like the following.

Serialized Java class:

public class MyMapPartition implements Serializable, MapPartitionsFunction {
    @Override
    public Iterator call(Iterator iter) throws Exception {
        ArrayList list = new ArrayList();
        // ArrayNode array =
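The snippet above appears to buffer a partition's rows into an `ArrayList`. An alternative, sketched here as an assumption rather than the poster's method: if the rows inside each partition are first sorted by (col1, col2) (for example with `Dataset.sortWithinPartitions` after the repartition), groups can be emitted one at a time at key boundaries, so only one group is held in memory. The sketch is plain Java; `SortedGroupIterator` and the use of `String[]` rows in place of Spark's `Row` are hypothetical, chosen so it runs without Spark.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class SortedGroupIterator implements Iterator<List<String[]>> {

    private final Iterator<String[]> rows;
    private String[] pending; // first row of the next group, already read from rows

    // Assumes rows arrive sorted (or at least clustered) by their first two fields (col1, col2).
    public SortedGroupIterator(Iterator<String[]> rows) {
        this.rows = rows;
        this.pending = rows.hasNext() ? rows.next() : null;
    }

    private static boolean sameKey(String[] x, String[] y) {
        return x[0].equals(y[0]) && x[1].equals(y[1]);
    }

    @Override
    public boolean hasNext() {
        return pending != null;
    }

    @Override
    public List<String[]> next() {
        if (pending == null) {
            throw new NoSuchElementException();
        }
        List<String[]> group = new ArrayList<>();
        group.add(pending);
        pending = null;
        // Keep consuming rows until the (col1, col2) key changes.
        while (rows.hasNext()) {
            String[] r = rows.next();
            if (sameKey(group.get(0), r)) {
                group.add(r);
            } else {
                pending = r; // start of the next group
                break;
            }
        }
        return group;
    }

    public static void main(String[] args) {
        List<String[]> sorted = Arrays.asList(
                new String[] {"a", "1", "a1"},
                new String[] {"a", "1", "a2"},
                new String[] {"b", "2", "b1"},
                new String[] {"b", "2", "b2"});
        SortedGroupIterator groups = new SortedGroupIterator(sorted.iterator());
        while (groups.hasNext()) {
            List<String[]> g = groups.next();
            System.out.println(g.get(0)[0] + "," + g.get(0)[1] + " has " + g.size() + " rows");
        }
    }
}
```

The trade-off: this relies on the sort step, while buffering the whole partition in a list or map needs no sort but may hold many rows in memory on large partitions.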

Re: partition by multiple columns/keys

2017-10-17 Thread ayan guha
What are you trying to achieve, and how? I.e., are you planning to do some aggregation after a group by on c1, c2?

On Wed, 18 Oct 2017 at 4:13 pm, Imran Rajjad wrote:
> Hi,
>
> I have a set of rows that are a result of a
> groupBy(col1,col2,col3).count().
>
> Is it possible to map rows belong to

partition by multiple columns/keys

2017-10-17 Thread Imran Rajjad
Hi,

I have a set of rows that are the result of a groupBy(col1,col2,col3).count(). Is it possible to map the rows belonging to each unique combination inside an iterator? E.g.:

col1 col2 col3
a    1    a1
a    1    a2
b    2    b1
b    2    b2

How can I separate rows with col1 and col2 =
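The grouping asked about here can be sketched in plain Java. This is a minimal sketch, not taken from the thread: it assumes the rows of one partition arrive through an `Iterator`, as they do in the `call` method of Spark's `MapPartitionsFunction`, and it uses a hypothetical `SimpleRow` class in place of Spark's `Row` so it runs without Spark. Rows are bucketed by their (col1, col2) pair in a `LinkedHashMap`, which keeps keys in first-seen order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupByKeySketch {

    // Hypothetical stand-in for a Spark Row with the three columns from the example.
    static final class SimpleRow {
        final String col1;
        final int col2;
        final String col3;

        SimpleRow(String col1, int col2, String col3) {
            this.col1 = col1;
            this.col2 = col2;
            this.col3 = col3;
        }

        @Override
        public String toString() {
            return col1 + "," + col2 + "," + col3;
        }
    }

    // Drains the iterator and buckets rows by their (col1, col2) combination.
    static Map<List<Object>, List<SimpleRow>> groupByCol1Col2(Iterator<SimpleRow> rows) {
        Map<List<Object>, List<SimpleRow>> groups = new LinkedHashMap<>();
        while (rows.hasNext()) {
            SimpleRow r = rows.next();
            List<Object> key = Arrays.<Object>asList(r.col1, r.col2); // composite key with value equality
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(r);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<SimpleRow> rows = Arrays.asList(
                new SimpleRow("a", 1, "a1"),
                new SimpleRow("a", 1, "a2"),
                new SimpleRow("b", 2, "b1"),
                new SimpleRow("b", 2, "b2"));
        groupByCol1Col2(rows.iterator()).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

Note that this holds every row of the partition in memory before any group is processed, which may matter for large partitions.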