The following resources should be useful:

https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-sql-windows.html
The last link should have the exact solution.

2016-07-06 16:55 GMT+02:00 Tal Grynbaum <tal.grynb...@gmail.com>:

> You can use the rank window function to rank each row in the group, and
> then filter the rows with rank <= 50.
>
> On Wed, Jul 6, 2016, 14:07 <luohui20...@sina.com> wrote:
>
>> hi there
>> I have a DF with 3 columns: id, pv, location. (The rows are already
>> grouped by location and sorted by pv in descending order.) I want to get
>> the first 50 id values per location. I checked the APIs of DataFrame,
>> GroupedData, and PairRDD, and found no match.
>> Is there a way to do this naturally?
>> Any info will be appreciated.
>>
>> --------------------------------
>>
>> Thanks & best regards!
>> San.Luo
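Putting the suggestion above into code, a minimal sketch (assuming a DataFrame `df` with the `id`, `pv`, `location` columns from the question; `row_number` is used here instead of `rank`, since `rank` leaves gaps on ties and could return more or fewer than 50 rows per group):

```scala
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

// Number rows within each location, highest pv first.
val byLocation = Window.partitionBy("location").orderBy(col("pv").desc)

// Keep the first 50 rows of every location, then drop the helper column.
val top50 = df
  .withColumn("rn", row_number().over(byLocation))
  .filter(col("rn") <= 50)
  .drop("rn")
```

Since the window re-sorts each partition by pv, this works even if the input DF were not already sorted.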