I may not be understanding your question - for a given date, you have many ID 
values - is that correct?  Are there additional columns in this dataset that 
you aren't mentioning, or are we simply dealing with id and dt?

What structure do you need the return data to be in?


If you're looking for a return DataFrame with columns id and dt, but you'd 
like it sorted so that for a given dt the ids are arranged in order, then I 
would suggest something like this (I speak Python, so the first example comes 
from the Python API doc):

http://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrame.orderBy
df.orderBy(["dt", "id"], ascending=[1, 1]).show()
# This orders the DataFrame df by the dt column in ascending order (dates
# increasing), with matching dates ordered in ascending order by the id column.
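The ordering orderBy(["dt", "id"]) performs is just a lexicographic sort on the (dt, id) pair. A plain-Python sketch on toy rows (column names dt and id taken from the thread; the data is made up) shows the same semantics without needing a Spark session:

```python
# Toy rows standing in for the (id, dt) DataFrame in the question.
rows = [
    {"id": 3, "dt": "2015-10-02"},
    {"id": 1, "dt": "2015-10-01"},
    {"id": 2, "dt": "2015-10-02"},
    {"id": 1, "dt": "2015-10-02"},
]

# orderBy(["dt", "id"], ascending=[1, 1]) == sort on the (dt, id) key:
ordered = sorted(rows, key=lambda r: (r["dt"], r["id"]))

for r in ordered:
    print(r["dt"], r["id"])
# 2015-10-01 1
# 2015-10-02 1
# 2015-10-02 2
# 2015-10-02 3
```

Within each dt value the ids come out ascending, which is exactly what the two-column orderBy gives you.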

This may help from the Scala API:
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.DataFrame


My apologies if I'm heading off in a direction you're not looking for. My tl;dr 
version is that you may only need sort - the groupBy is unnecessary.

Asoka

From: saif.a.ell...@wellsfargo.com [mailto:saif.a.ell...@wellsfargo.com]
Sent: Friday, October 02, 2015 8:32 AM
To: user@spark.apache.org
Subject: from groupBy return a DataFrame without aggregation?

Given ID, DATE, I need all sorted dates per ID, what is the easiest way?

I got this but I don't like it:
val l = zz.groupBy("id", "dt").agg($"dt".as("dummy")).sort($"dt".asc)

Saif
