Ah, does it work with the Dataset API, or do I need to convert it to an RDD first?

On Mon, Aug 28, 2017 at 10:40 PM, Georg Heiler <georg.kf.hei...@gmail.com> wrote:
> What about the RDD StatCounter?
> https://spark.apache.org/docs/0.6.2/api/core/spark/util/StatCounter.html
>
> Patrick <titlibat...@gmail.com> wrote on Mon, Aug 28, 2017 at 16:47:
>
>> Hi,
>>
>> I have two lists:
>>
>> - List one: contains the names of the columns on which I want to run
>>   aggregate operations.
>> - List two: contains the aggregate operations I want to perform on
>>   each column, e.g. (min, max, mean).
>>
>> I am trying to use the Spark 2.0 Dataset API to achieve this. Spark
>> provides an agg() to which you can pass a Map<String,String> (column
>> name to aggregate operation) as input. However, I want to perform
>> different aggregate operations on the same column and collect the
>> result in a Map<String,String> where the key is the aggregate
>> operation and the value is the result for that column. If I add
>> different agg() entries for the same column, the key is overwritten
>> with the latest value.
>>
>> Also, I can't find any collectAsMap() operation that returns a map with
>> the aggregated column name as key and the result as value. I get
>> collectAsList(), but I don't know the order in which those agg()
>> operations are run, so how do I match each list value to its agg
>> operation? I am able to see the result using .show(), but how can I
>> collect the result in this case?
>>
>> Is it possible to do different aggregations on the same column in one
>> job (i.e. only one collect operation) using the agg() operation?
>>
>> Thanks in advance.
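
One possible way to do this without leaving the Dataset API is sketched below: build one aliased Column per (column, operation) pair, pass them all to a single agg() call, and read the single result row back by alias, so the order of the results no longer matters. The sample data, the column names `price` and `qty`, and the `<column>_<op>` alias scheme are assumptions made up for illustration; none of them come from the thread.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.expr

object MultiAggSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("multi-agg-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data; replace with the real Dataset.
    val df = Seq((10.0, 1), (20.0, 2), (30.0, 3)).toDF("price", "qty")

    // List one: columns to aggregate. List two: operations to apply to each.
    val columns = Seq("price", "qty")
    val ops = Seq("min", "max", "mean")

    // One aliased expression per (column, operation) pair,
    // e.g. min(`price`) AS price_min.
    val aggExprs: Seq[Column] =
      for (c <- columns; op <- ops) yield expr(s"$op(`$c`)").as(s"${c}_$op")

    // A single agg() call and a single action: the result is one Row.
    val row = df.agg(aggExprs.head, aggExprs.tail: _*).first()

    // Look values up by alias, so result ordering does not matter.
    // Outer key: column name. Inner key: operation. Value: the result.
    val result: Map[String, Map[String, Any]] =
      columns.map { c =>
        c -> ops.map(op => op -> row.getAs[Any](s"${c}_$op")).toMap
      }.toMap

    result.foreach { case (c, byOp) => println(s"$c -> $byOp") }

    spark.stop()
  }
}
```

Because every expression carries its own alias, the same column can appear under several operations without one overwriting another, which is the limitation of the Map<String,String> overload of agg(). If a flat map keyed by alias is enough, `row.getValuesMap[Any](row.schema.fieldNames)` returns that directly.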