groupBy does not produce an actual result by itself; it is a construct on which you define aggregations.

So you can do:


    import org.apache.spark.sql.{functions => func}

    val resDF = df.groupBy("client_id").agg(func.collect_set(df("Date")))
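For reference, here is a rough self-contained sketch built from the sample data in your mail (the SparkSession setup and the resDF/dates names are just for illustration):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.{functions => func}

    val spark = SparkSession.builder().appName("unique-dates").getOrCreate()
    import spark.implicits._

    // the sample rows from the question
    val df = Seq(
      ("a", "2016-11-23"),
      ("b", "2016-11-18"),
      ("a", "2016-11-23"),
      ("a", "2016-11-23"),
      ("a", "2016-11-24")
    ).toDF("client_id", "Date")

    // one row per client_id, holding the set of its distinct dates
    val resDF = df.groupBy("client_id").agg(func.collect_set(df("Date")).as("dates"))
    resDF.show(false)
    // expected: a -> [2016-11-23, 2016-11-24], b -> [2016-11-18]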


Note that collect_set can be a little heavy in terms of performance, so if you
just want to count the distinct values, you should probably use approxCountDistinct instead.
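For example, a minimal sketch (reusing the df above; the distinct_dates alias is just illustrative) that only counts the distinct dates per client:

    // approximate distinct-count per group, cheaper than collecting the full set
    val countsDF = df.groupBy("client_id")
      .agg(func.approxCountDistinct(df("Date")).as("distinct_dates"))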
Assaf.

From: Devi P.V [mailto:devip2...@gmail.com]
Sent: Thursday, December 08, 2016 10:38 AM
To: user @spark
Subject: How to find unique values after groupBy() in spark dataframe ?

Hi all,

I have a dataframe like following,
+---------+----------+
|client_id|Date      |
+-------- +----------+
| a       |2016-11-23|
| b       |2016-11-18|
| a       |2016-11-23|
| a       |2016-11-23|
| a       |2016-11-24|
+---------+----------+
I want to find the unique dates for each client_id using a Spark DataFrame.
Expected output:

a  (2016-11-23, 2016-11-24)
b   2016-11-18
I tried df.groupBy("client_id"), but I don't know how to find the distinct
values after groupBy().
How can I do this?
Are there any other, more efficient methods for doing this?
I am using Scala 2.11.8 & Spark 2.0.

Thanks
