Yes, I think I figured out something like the following:

Serialized Java Class
---------------------

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.apache.spark.api.java.function.MapPartitionsFunction;
import org.apache.spark.sql.Row;

// MapPartitionsFunction already extends java.io.Serializable.
public class MyMapPartition implements MapPartitionsFunction<Row, Row> {

    @Override
    public Iterator<Row> call(Iterator<Row> iter) throws Exception {
        List<Row> list = new ArrayList<>();
        System.out.println("--------");   // marks the start of a partition
        while (iter.hasNext()) {
            Row row = iter.next();
            System.out.println(row);
            list.add(row);
        }
        System.out.println(">>>>");       // marks the end of a partition
        return list.iterator();
    }
}

Unit Test
---------

import java.util.Arrays;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Column;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder;
import org.apache.spark.sql.catalyst.encoders.RowEncoder;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

// jsc (JavaSparkContext) and spark (SparkSession) are assumed to be created by the test setup.
JavaRDD<Row> rdd = jsc.parallelize(Arrays.asList(
        RowFactory.create(11L, 21L, 1L),
        RowFactory.create(11L, 22L, 2L),
        RowFactory.create(11L, 22L, 1L),
        RowFactory.create(12L, 23L, 3L),
        RowFactory.create(12L, 24L, 3L),
        RowFactory.create(12L, 22L, 4L),
        RowFactory.create(13L, 22L, 4L),
        RowFactory.create(14L, 22L, 4L)));

StructType structType = new StructType()
        .add("a", DataTypes.LongType, false)
        .add("b", DataTypes.LongType, false)
        .add("c", DataTypes.LongType, false);

Dataset<Row> ds = spark.createDataFrame(rdd, structType);
ds.show();

// groupBy(...).count() appends a "count" column, so the output encoder is built
// from the grouped schema rather than from structType (which has only a, b, c).
Dataset<Row> counted = ds.groupBy("a", "b", "c").count();
ExpressionEncoder<Row> encoder = RowEncoder.apply(counted.schema());

Dataset<Row> grouped = counted
        .repartition(new Column("a"), new Column("b"))
        .mapPartitions(new MyMapPartition(), encoder);
grouped.count();

Output
------

--------
[12,23,3,1]
>>>>
--------
[14,22,4,1]
>>>>
--------
[12,24,3,1]
>>>>
--------
[12,22,4,1]
>>>>
--------
[11,22,1,1]
[11,22,2,1]
>>>>
--------
[11,21,1,1]
>>>>
--------
[13,22,4,1]
>>>>

On Wed, Oct 18, 2017 at 10:29 AM, ayan guha <guha.a...@gmail.com> wrote:

> What do you want to achieve, and how? I.e., are you planning to do some
> aggregation on group by c1, c2?
>
> On Wed, 18 Oct 2017 at 4:13 pm, Imran Rajjad <raj...@gmail.com> wrote:
>
>> Hi,
>>
>> I have a set of rows that are the result of a
>> groupBy(col1, col2, col3).count().
>>
>> Is it possible to map the rows belonging to each unique combination
>> inside an iterator?
>>
>> e.g.
>>
>> col1  col2  col3
>> a     1     a1
>> a     1     a2
>> b     2     b1
>> b     2     b2
>>
>> How can I separate the rows with (col1, col2) = (a, 1) and (b, 2)?
>>
>> regards,
>> Imran
>>
>> --
>> I.R
>>
> --
> Best Regards,
> Ayan Guha
>
--
I.R
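One caveat with the repartition approach in this thread: repartition(new Column("a"), new Column("b")) hash-partitions the rows into spark.sql.shuffle.partitions partitions (200 by default). That guarantees all rows of one (a, b) combination land in the same partition, but not that a partition holds only one combination. If each combination needs its own group, one possible approach is to add .sortWithinPartitions("a", "b") after the repartition and split each partition into runs of equal (a, b) inside mapPartitions. Below is a minimal sketch of just that splitting step in plain Java, assuming the input iterator is already sorted by the key. ConsecutiveGrouper and groupConsecutive are hypothetical names (not Spark APIs), and long[] stands in for Row so the example runs without Spark.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Objects;
import java.util.function.Function;

/**
 * Hypothetical helper (not a Spark API): splits an iterator whose elements are
 * already sorted (or at least clustered) by key into one list per run of equal keys.
 */
public final class ConsecutiveGrouper {

    private ConsecutiveGrouper() {}

    public static <T> List<List<T>> groupConsecutive(Iterator<T> iter, Function<? super T, ?> keyOf) {
        List<List<T>> groups = new ArrayList<>();
        List<T> current = null;
        Object currentKey = null;
        while (iter.hasNext()) {
            T item = iter.next();
            Object key = keyOf.apply(item);
            // Open a new group on the first element or whenever the key changes.
            // Equal keys that are NOT adjacent end up in separate groups, hence the sort requirement.
            if (current == null || !Objects.equals(key, currentKey)) {
                current = new ArrayList<>();
                groups.add(current);
                currentKey = key;
            }
            current.add(item);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Hypothetical partition holding several (a, b) combinations, already sorted
        // by (a, b) as sortWithinPartitions("a", "b") would leave it; long[] stands in for Row.
        List<long[]> partition = Arrays.asList(
                new long[] {11, 22, 1},
                new long[] {11, 22, 2},
                new long[] {12, 23, 3},
                new long[] {12, 24, 3});

        // Key = (a, b); List.equals gives a value-based comparison of the pair.
        List<List<long[]>> groups =
                groupConsecutive(partition.iterator(), r -> Arrays.asList(r[0], r[1]));

        for (List<long[]> group : groups) {
            long[] first = group.get(0);
            System.out.println("(" + first[0] + ", " + first[1] + ") -> " + group.size() + " row(s)");
        }
    }
}
```

Inside MyMapPartition.call, the key function could be something like row -> Arrays.asList(row.getLong(0), row.getLong(1)), assuming a and b are the first two Long columns. Like the original ArrayList version, this buffers the whole partition in memory. If a separate iterator per key is the main goal, Dataset.groupByKey(...).flatMapGroups(...) is another built-in option, since it hands each key its rows as an iterator.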