I'm trying to figure out a nice way to get the median of a DataFrame column *once it is grouped*.
It's easy enough now to get the min, max, mean, and the other aggregates that are part of org.apache.spark.sql.functions:

    df.groupBy("foo", "bar").agg(mean($"column1"))

It's also easy enough to get the median of a column before grouping, using approxQuantile. However, approxQuantile is reached through df.stat, i.e. it is a method on DataFrameStatFunctions rather than a column function.

Is there a way to use it inside the .agg? Or do we need a user-defined aggregation function? Or is there some other way?

Stack Overflow version of the question here:
<http://stackoverflow.com/questions/39693730/median-of-groups-in-a-dataframe-spark-2-0>

Thanks,
Pete
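
For reference, here is a minimal sketch of the ungrouped case described above, i.e. the part that already works. The data, the object name UngroupedMedian, and the helper medianOf are hypothetical and only there for illustration; the column names follow the snippet above. It assumes a local SparkSession and a numeric column.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object UngroupedMedian {
  // Hypothetical helper: median of one numeric column over ALL rows (no grouping).
  // approxQuantile(col, probabilities, relativeError) returns an Array[Double];
  // probability 0.5 asks for the median, and relativeError 0.0 asks for the
  // exact quantile (at a higher computation cost).
  def medianOf(df: DataFrame, column: String): Double =
    df.stat.approxQuantile(column, Array(0.5), 0.0).head

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("ungrouped-median")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data, purely for illustration.
    val df = Seq(
      ("a", "x", 1.0), ("a", "x", 2.0), ("a", "x", 3.0),
      ("b", "y", 10.0), ("b", "y", 20.0), ("b", "y", 30.0)
    ).toDF("foo", "bar", "column1")

    // Works on the whole DataFrame, but cannot be passed to .agg after groupBy,
    // because it is a method on df.stat and not a Column expression.
    println(s"median over all rows: ${medianOf(df, "column1")}")

    spark.stop()
  }
}
```

The point of the sketch is that approxQuantile takes a column name and returns plain doubles on the driver, which is why it does not slot into groupBy(...).agg(...) the way mean does.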