The following resources should be useful:

https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-sql-windows.html
The last link should have the exact solution.

2016-07-06 16:55 GMT+02:00 Tal Grynbaum <tal.grynb...@gmail.com>:

> You can use the rank window function to rank each row in the group, and
> then filter the rows with rank <= 50.
>
> On Wed, Jul 6, 2016, 14:07 <luohui20...@sina.com> wrote:
>
>> hi there
>> I have a DF with 3 columns: id, pv, location. (The rows are already
>> grouped by location and sorted by pv in descending order.) I want to get
>> the first 50 id values per location. I checked the APIs of DataFrame,
>> GroupedData, and PairRDD, and found no match.
>> Is there a way to do this naturally?
>> Any info will be appreciated.
>>
>> --------------------------------
>>
>> Thanks & best regards!
>> San.Luo
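Putting the suggestion above into code, a minimal sketch (assuming a DataFrame `df` with the `id`, `pv`, `location` columns from the question; `row_number` is used here instead of `rank`, since `rank` leaves gaps on ties and could return more or fewer than 50 rows per group):

```scala
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

// Number rows within each location, highest pv first.
val byLocation = Window.partitionBy("location").orderBy(col("pv").desc)

// Keep the first 50 rows of every location, then drop the helper column.
val top50 = df
  .withColumn("rn", row_number().over(byLocation))
  .filter(col("rn") <= 50)
  .drop("rn")
```

Since the window re-sorts each partition by pv, this works even if the input DF were not already sorted.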