Yes, I think I figured it out; something like the below:

Serialized Java Class
-----------------
public class MyMapPartition implements Serializable, MapPartitionsFunction<Row, Row> {
 @Override
 public Iterator<Row> call(Iterator<Row> iter) throws Exception {
  // Buffer the partition's rows so we can see every row that landed here.
  ArrayList<Row> list = new ArrayList<>();
  System.out.println("--------");
  while (iter.hasNext()) {
   Row row = iter.next();
   System.out.println(row);
   list.add(row);
  }
  System.out.println(">>>>");
  return list.iterator();
 }
}

Unit Test
-----------
JavaRDD<Row> rdd = jsc.parallelize(Arrays.asList(
    RowFactory.create(11L, 21L, 1L),
    RowFactory.create(11L, 22L, 2L),
    RowFactory.create(11L, 22L, 1L),
    RowFactory.create(12L, 23L, 3L),
    RowFactory.create(12L, 24L, 3L),
    RowFactory.create(12L, 22L, 4L),
    RowFactory.create(13L, 22L, 4L),
    RowFactory.create(14L, 22L, 4L)));

StructType structType = new StructType()
    .add("a", DataTypes.LongType, false)
    .add("b", DataTypes.LongType, false)
    .add("c", DataTypes.LongType, false);
ExpressionEncoder<Row> encoder = RowEncoder.apply(structType);

Dataset<Row> ds = spark.createDataFrame(rdd, encoder.schema());
ds.show();

MyMapPartition mp = new MyMapPartition();

// repartition(a, b) puts every row with the same (a, b) into the same
// partition, so one mapPartitions call sees the whole group together.
Dataset<Row> grouped = ds.groupBy("a", "b", "c")
    .count()
    .repartition(new Column("a"), new Column("b"))
    .mapPartitions(mp, encoder);

grouped.count(); // action to trigger the pipeline

---------------

output
--------
--------
[12,23,3,1]
>>>>
--------
[14,22,4,1]
>>>>
--------
[12,24,3,1]
>>>>
--------
[12,22,4,1]
>>>>
--------
[11,22,1,1]
[11,22,2,1]
>>>>
--------
[11,21,1,1]
>>>>
--------
[13,22,4,1]
>>>>
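The output above shows that, after the repartition, each mapPartitions call sees all rows sharing the same (a, b) together, e.g. [11,22,1,1] and [11,22,2,1] arrive in one partition. Separating the rows per unique key inside that single iterator can then be sketched with plain Java collections; this is a Spark-free illustration, and the `Object[]` rows plus the `GroupWithinPartition` class are hypothetical stand-ins for `Row`:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupWithinPartition {
 // One pass over the iterator, bucketing rows by the composite (col1, col2)
 // key, the same separation the mapPartitions call can perform per partition.
 static Map<List<Object>, List<Object[]>> groupByKey(Iterator<Object[]> iter) {
  Map<List<Object>, List<Object[]>> grouped = new LinkedHashMap<>();
  while (iter.hasNext()) {
   Object[] row = iter.next();
   List<Object> key = Arrays.asList(row[0], row[1]); // (col1, col2)
   grouped.computeIfAbsent(key, k -> new ArrayList<>()).add(row);
  }
  return grouped;
 }

 public static void main(String[] args) {
  // The sample data from the original question.
  List<Object[]> rows = Arrays.asList(
      new Object[]{"a", 1, "a1"},
      new Object[]{"a", 1, "a2"},
      new Object[]{"b", 2, "b1"},
      new Object[]{"b", 2, "b2"});
  Map<List<Object>, List<Object[]>> g = groupByKey(rows.iterator());
  System.out.println(g.size());                           // 2 distinct keys
  System.out.println(g.get(Arrays.asList("a", 1)).size()); // 2 rows for (a,1)
 }
}
```

Inside the real call(), the same LinkedHashMap approach over the Row iterator separates the (a,1) rows from the (b,2) rows before emitting results.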


On Wed, Oct 18, 2017 at 10:29 AM, ayan guha <guha.a...@gmail.com> wrote:

> How, or rather what, do you want to achieve? I.e., are you planning to do
> some aggregation after grouping by c1, c2?
>
> On Wed, 18 Oct 2017 at 4:13 pm, Imran Rajjad <raj...@gmail.com> wrote:
>
>> Hi,
>>
>> I have a set of rows that are the result of a
>> groupBy(col1,col2,col3).count().
>>
>> Is it possible to map the rows belonging to each unique combination inside
>> an iterator?
>>
>> e.g
>>
>> col1 col2 col3
>> a      1      a1
>> a      1      a2
>> b      2      b1
>> b      2      b2
>>
>> how can I separate the rows where (col1, col2) = (a, 1) from those where
>> (col1, col2) = (b, 2)?
>>
>> regards,
>> Imran
>>
>> --
>> I.R
>>
> --
> Best Regards,
> Ayan Guha
>



-- 
I.R
