Collecting Multiple Aggregation query result on one Column as collectAsMap

Patrick Mon, 28 Aug 2017 07:47:21 -0700

Hi

I have two lists:



   - List one: contains the names of the columns on which I want to run
   aggregate operations.
   - List two: contains the aggregate operations that I want to perform
   on each column, e.g. (min, max, mean).

I am trying to use the Spark 2.0 Dataset API to achieve this. Spark provides
an agg() method to which you can pass a Map<String,String> (of column name
and respective aggregate operation) as input. However, I want to perform
several different aggregate operations on the same column of the data and
collect the result in a Map<String,String>, where the key is the aggregate
operation and the value is the result for that particular column. If I add
different aggregations for the same column to the map, the key gets
overwritten with the latest value.
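To illustrate the overwrite problem, here is a minimal sketch in plain Java (no Spark required). The column names "price" and "quantity" and the class name are made up for the example; the point is only that a Map keeps one value per key, so a second operation on the same column replaces the first before agg() ever sees it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class AggMapOverwrite {
    // Builds the kind of Map<String, String> that agg() accepts:
    // column name -> aggregate function name.
    // "price" and "quantity" are hypothetical column names.
    static Map<String, String> buildAggMap() {
        Map<String, String> aggs = new LinkedHashMap<>();
        aggs.put("price", "min");
        aggs.put("price", "max");      // same key: replaces "min"
        aggs.put("quantity", "mean");
        return aggs;
    }

    public static void main(String[] args) {
        // prints {price=max, quantity=mean}
        System.out.println(buildAggMap());
    }
}
```

So with the map-based agg(), at most one operation per column can survive, regardless of what Spark does with the map afterwards.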

Also, I don't find any collectAsMap() operation that returns a map with the
aggregated column name as key and the result as value. I do get
collectAsList(), but I don't know the order in which those agg() operations
are run, so how do I match which list value corresponds to which agg
operation? I am able to see the result using .show(), but how can I collect
the result in this case?

Is it possible to do different aggregations on the same column in one
job (i.e. only one collect operation) using the agg() operation?
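One direction that may work, sketched here under assumptions rather than as a confirmed solution: build one aliased SQL expression per (column, operation) pair from the two lists, pass them all to a single selectExpr() call, and then pair the field names of the resulting row with its values by position. The sketch below is plain Java so it runs without Spark; the class name, the helper names buildExprs and toMap, the "column_operation" alias scheme, the column name "price", and the Dataset variable df mentioned in the comments are all hypothetical. Column names are assumed to be simple identifiers that need no quoting.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class AggExprSketch {
    // One SQL expression per (column, operation) pair,
    // e.g. "min(price) AS price_min". The alias makes each
    // result column identifiable by name instead of by order.
    static List<String> buildExprs(List<String> columns, List<String> ops) {
        List<String> exprs = new ArrayList<>();
        for (String col : columns) {
            for (String op : ops) {
                exprs.add(op + "(" + col + ") AS " + col + "_" + op);
            }
        }
        return exprs;
    }

    // Pairs result field names with values by position and
    // stringifies the values to fit a Map<String, String>.
    static Map<String, String> toMap(String[] fieldNames, List<?> values) {
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < fieldNames.length; i++) {
            result.put(fieldNames[i], String.valueOf(values.get(i)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> exprs = buildExprs(
                Arrays.asList("price"),
                Arrays.asList("min", "max", "mean"));
        System.out.println(exprs);

        // With Spark on the classpath (assumed, not compiled here):
        //   Row row = df.selectExpr(exprs.toArray(new String[0])).first();
        //   String[] names = row.schema().fieldNames();
        //   values: row.get(i) for i in 0 .. row.size() - 1
        // Stand-in values below are for illustration only.
        System.out.println(toMap(
                new String[] {"price_min", "price_max", "price_mean"},
                Arrays.asList(1.0, 9.0, 4.5)));
    }
}
```

Because every aggregate ends up in the same single-row result, this would need only one collect, and the aliases remove the need to guess the order of the values.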


Thanks in advance.
