Re: Collecting Multiple Aggregation query result on one Column as collectAsMap

2017-08-29 Thread Georg Heiler
What about a custom UDAF?
Patrick wrote on Mon, Aug 28, 2017 at 20:54:
> OK, I see there is a describe() function which does the stat calculation
> on a dataset, similar to StatCounter. However, I don't want to restrict my
> aggregations to the standard mean, stddev, etc. and

Re: Collecting Multiple Aggregation query result on one Column as collectAsMap

2017-08-28 Thread Patrick
OK, I see there is a describe() function which does the stat calculation on a dataset, similar to StatCounter. However, I don't want to restrict my aggregations to the standard mean, stddev, etc.; I also want to generate some custom stats, or I may want to run only a subset of the predefined stats rather than all of them on the

Re: Collecting Multiple Aggregation query result on one Column as collectAsMap

2017-08-28 Thread Vadim Semenov
I didn't tailor it to your needs, but this is what I can offer you; the idea should be pretty clear:
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{collect_list, struct}
    val spark: SparkSession
    import spark.implicits._
    case class Input(a: Int, b: Long, c:

Re: Collecting Multiple Aggregation query result on one Column as collectAsMap

2017-08-28 Thread Georg Heiler
RDD only.
Patrick wrote on Mon, Aug 28, 2017 at 20:13:
> Ah, does it work with the Dataset API, or do I need to convert it to an RDD first?
>
> On Mon, Aug 28, 2017 at 10:40 PM, Georg Heiler wrote:
>> What about the RDD StatCounter?

Re: Collecting Multiple Aggregation query result on one Column as collectAsMap

2017-08-28 Thread Patrick
Ah, does it work with the Dataset API, or do I need to convert it to an RDD first?
On Mon, Aug 28, 2017 at 10:40 PM, Georg Heiler wrote:
> What about the RDD StatCounter?
> https://spark.apache.org/docs/0.6.2/api/core/spark/util/StatCounter.html
>
> Patrick

Re: Collecting Multiple Aggregation query result on one Column as collectAsMap

2017-08-28 Thread Georg Heiler
What about the RDD StatCounter? https://spark.apache.org/docs/0.6.2/api/core/spark/util/StatCounter.html
Patrick wrote on Mon, Aug 28, 2017 at 16:47:
> Hi
>
> I have two lists:
>
> - List one: contains the names of the columns on which I want to do aggregate

Collecting Multiple Aggregation query result on one Column as collectAsMap

2017-08-28 Thread Patrick
Hi
I have two lists:
- List one: contains the names of the columns on which I want to perform aggregate operations.
- List two: contains the aggregate operations that I want to perform on each column, e.g. (min, max, mean).
I am trying to use the Spark 2.0 Dataset to achieve this. Spark provides