Re: Collecting Multiple Aggregation query result on one Column as collectAsMap

Patrick Mon, 28 Aug 2017 11:14:10 -0700

Ah, does it work with the Dataset API, or do I need to convert it to an RDD first?
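A minimal PySpark sketch of the StatCounter route, assuming PySpark 2.x and a numeric column. The DataFrame/Dataset API does not hand back a StatCounter itself, so the column is first turned into an RDD of plain values with `.rdd`, and `RDD.stats()` then returns a `pyspark.statcounter.StatCounter`. The helper names `column_stats` and `stats_as_dict` are hypothetical.

```python
# Sketch only: per-column statistics via the RDD-side StatCounter.
# Assumes PySpark 2.x and a numeric column; column_stats / stats_as_dict
# are hypothetical helper names, not part of Spark.


def column_stats(df, column):
    """Return a pyspark.statcounter.StatCounter for one numeric column."""
    return (
        df.select(column)
        .rdd.map(lambda row: row[0])      # Row -> bare value
        .filter(lambda v: v is not None)  # skip nulls before merging
        .stats()                          # an action: runs a Spark job
    )


def stats_as_dict(counter, ops):
    """Map op names such as "min", "max", "mean" to StatCounter method results."""
    return {op: getattr(counter, op)() for op in ops}
```

Each `stats()` call is a separate action, so covering several columns this way runs at least one job per column. That does not meet the "only one collect operation" requirement in the original question.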

On Mon, Aug 28, 2017 at 10:40 PM, Georg Heiler <georg.kf.hei...@gmail.com>
wrote:


> What about the RDD StatCounter? https://spark.apache.org/docs/0.6.2/api/core/spark/util/StatCounter.html
>
> Patrick <titlibat...@gmail.com> wrote on Mon, 28 Aug 2017 at 16:47:
>
>> Hi
>>
>> I have two lists:
>>
>>
>>    - List one: contains the names of the columns on which I want to
>>    perform aggregate operations.
>>    - List two: contains the aggregate operations I want to perform on
>>    each column, e.g. (min, max, mean).
>>
>> I am trying to use the Spark 2.0 Dataset API to achieve this. Spark provides
>> an agg() method where you can pass a Map<String,String> (of column name and
>> respective aggregate operation) as input. However, I want to perform
>> different aggregation operations on the same column of the data and
>> collect the result in a Map<String,String> where the key is the aggregate
>> operation and the value is the result on that column. If I put different
>> aggregations for the same column into the map, the column key gets
>> overwritten with the latest one.
>>
>> Also, I don't find any collectAsMap() operation that returns a map with the
>> aggregated column name as key and the result as value. I get
>> collectAsList(), but I don't know the order in which those agg() operations
>> are run, so how do I match which list value corresponds to which agg
>> operation? I am able to see the result using .show(), but how can I collect
>> the result in this case?
>>
>> Is it possible to do different aggregations on the same column in one
>> job (i.e. only one collect operation) using the agg() operation?
>>
>>
>> Thanks in advance.
>>
>>
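A sketch of one way to get several aggregations on the same column in a single action with labeled results, assuming PySpark 2.x (the Java/Scala Dataset API has counterparts for `agg`, `functions.expr`, `alias` and `first`). The idea is to build one aliased expression per (column, operation) pair, pass them all to one `agg()` call, and read the single resulting row back as a dict, so nothing depends on positional order. The helper names and the `<op>_<column>` alias scheme are hypothetical, and column names are assumed to be plain identifiers that `expr()` can parse.

```python
# Sketch only: several aggregations on the same columns in ONE agg() call,
# read back as a dict keyed by alias instead of a positional list.
# Assumes PySpark 2.x and plain column names that expr() can parse.
# build_agg_specs / nest_by_column / aggregate_to_map and the "<op>_<column>"
# alias scheme are hypothetical helpers, not part of Spark.


def build_agg_specs(columns, ops):
    """Return (alias, SQL expression) pairs, one per (column, op) combination."""
    return [(f"{op}_{col}", f"{op}({col})") for col in columns for op in ops]


def nest_by_column(flat, columns, ops):
    """Regroup {"min_age": 1, ...} into {"age": {"min": 1, ...}, ...}."""
    return {col: {op: flat[f"{op}_{col}"] for op in ops} for col in columns}


def aggregate_to_map(df, columns, ops):
    """Run every (column, op) aggregation in a single action; return a nested dict."""
    # Imported here so the pure helpers above can be used without Spark installed.
    from pyspark.sql import functions as F

    exprs = [F.expr(sql).alias(alias) for alias, sql in build_agg_specs(columns, ops)]
    row = df.agg(*exprs).first()  # global aggregation: exactly one row
    return nest_by_column(row.asDict(), columns, ops)
```

For a hypothetical DataFrame `df` with numeric columns `age` and `salary`, `aggregate_to_map(df, ["age", "salary"], ["min", "max", "mean"])` would return a dict shaped like `{"age": {"min": ..., "max": ..., "mean": ...}, "salary": {...}}`. Because every result is looked up by its alias, the order in which the values come back no longer matters.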
