I may not be understanding your question - for a given date, you have many ID values - is that correct? Are there additional columns in this dataset that you aren't mentioning, or are we simply dealing with id and dt?
What structure do you need the return data to be in? If you're looking for a return DataFrame with columns id and dt, but you'd like it sorted so that for a given dt the ids are arranged in order, then I would suggest something like this (I speak Python, so the first example comes from the Python API doc):

http://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrame.orderBy

df.orderBy(["dt", "id"], ascending=[1, 1]).show()
# This orders the DataFrame df by the dt column in ascending order (dates
# increasing), with matching dates ordered in ascending order by the id column.

This may help from the Scala API:

http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.DataFrame

My apologies if I'm heading off in a direction you're not looking for. My tl;dr version is that you may only need sort; the groupBy is unnecessary.

Asoka

From: saif.a.ell...@wellsfargo.com [mailto:saif.a.ell...@wellsfargo.com]
Sent: Friday, October 02, 2015 8:32 AM
To: user@spark.apache.org
Subject: from groupBy return a DataFrame without aggregation?

Given ID, DATE, I need all sorted dates per ID. What is the easiest way? I got this, but I don't like it:

val l = zz.groupBy("id", "dt").agg($"dt".as("dummy")).sort($"dt".asc)

Saif
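For anyone following the thread without a Spark shell handy, the effect of the multi-column orderBy suggested above can be illustrated with plain Python sorting on (id, dt) tuples. This is just a sketch of the ordering semantics, not Spark itself, and the sample rows are invented:

```python
# Sketch of what df.orderBy(["dt", "id"], ascending=[1, 1]) produces,
# using plain Python on invented (id, dt) rows.
rows = [
    (3, "2015-10-02"),
    (1, "2015-10-01"),
    (2, "2015-10-01"),
    (1, "2015-10-02"),
]

# Sort by dt first, then id -- the equivalent of the multi-column orderBy:
# all rows for a given date come out together, with ids ascending within it.
ordered = sorted(rows, key=lambda r: (r[1], r[0]))

for ident, dt in ordered:
    print(dt, ident)
```

Note there is no grouping step at all: a single multi-key sort already places every id under its date in order, which is why the groupBy in the original snippet is unnecessary.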