median of groups

Peter Figliozzi Mon, 26 Sep 2016 17:54:07 -0700

I'm trying to figure out a nice way to get the median of a DataFrame
column *once it is grouped*.


It's easy enough now to get the min, max, mean, and other things that are
part of spark.sql.functions:

df.groupBy("foo", "bar").agg(mean($"column1"))

And it's easy enough to get the median of a column before grouping, using
approxQuantile.
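
For reference, here is a minimal sketch of the ungrouped case described above. The sample data, the app name, and the local session setup are assumptions made only for illustration; the column names follow the post. It shows that approxQuantile returns one value per requested probability for the whole column, which is why it does not help once the data is grouped.

```scala
import org.apache.spark.sql.SparkSession

// Local session for illustration only; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("median-sketch") // hypothetical app name
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data; column names follow the post.
val df = Seq(
  ("a", "x", 1.0),
  ("a", "x", 2.0),
  ("a", "x", 9.0),
  ("b", "y", 4.0),
  ("b", "y", 6.0)
).toDF("foo", "bar", "column1")

// approxQuantile(col, probabilities, relativeError) returns an Array[Double]
// with one entry per requested probability. 0.5 asks for the median;
// relativeError = 0.0 requests an exact result, which can be expensive
// on large data.
val Array(overallMedian) = df.stat.approxQuantile("column1", Array(0.5), 0.0)
println(s"median over all rows: $overallMedian")
```

The result is a plain Scala array computed over the entire column, not a Column expression, so it cannot be passed to .agg.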

However, approxQuantile is part of DataFrame.stat, i.e. it is a method of
DataFrameStatFunctions rather than one of the spark.sql.functions.

Is there a way to use it inside the .agg?

Or do we need a user defined aggregation function?

Or some other way?
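
One candidate for the "some other way" route, sketched under assumptions rather than offered as a confirmed answer: calling the SQL function percentile_approx through expr inside .agg. Whether percentile_approx is available depends on the Spark build (I believe it is native from Spark 2.1 and comes from Hive support in earlier releases), so treat that as an assumption. The sample data and the output column name median_column1 are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr

// Local session for illustration only; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("grouped-median-sketch") // hypothetical app name
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data; column names follow the post.
val df = Seq(
  ("a", "x", 1.0),
  ("a", "x", 2.0),
  ("a", "x", 9.0),
  ("b", "y", 4.0),
  ("b", "y", 6.0)
).toDF("foo", "bar", "column1")

// Assumption: percentile_approx is resolvable as a SQL function in this build.
// It is used here as an aggregate expression, so it is evaluated per group.
val medians = df
  .groupBy("foo", "bar")
  .agg(expr("percentile_approx(column1, 0.5)").as("median_column1"))

medians.show()
```

The result is approximate by design, so it may differ from an exact median on large groups.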

Stack Overflow version of the question:
<http://stackoverflow.com/questions/39693730/median-of-groups-in-a-dataframe-spark-2-0>

Thanks,

Pete
