To: 罗辉 <luohui20...@sina.com>, user <user@spark.apache.org>
Subject: Re: Re: how to select first 50 value of each group after group by?
Date: 2016-07-07 20:38
Hi,
I can try to guess what is wrong, but I might be incorrect. You should be
careful with window frames (you define them via the …)

> rank is not exactly what I want, because some rows that have the same pv
> are ranked with the same value. The first 50 rows of each frame is what
> I'm expecting. The attached file shows what I got by using rank.
> Thank you anyway, I learnt what rank could provide from your advice.
>
>
> ----
>
> Thanks
> Best regards!
> San.Luo
>
> ----- Original Message -----
> From: Anton Okolnychyi <anton.okolnyc...@gmail.com>
> To: user <user@spark.apache.org>
> Cc: luohui20...@sina.com
> Subject: Re: how to select first 50 value of each group after group by?
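The behavior described above, where rows with equal pv share a rank, can be illustrated without Spark. The following is a plain-Python sketch (not Spark code; the pv values are made up) of how SQL's RANK() differs from ROW_NUMBER() on ties:

```python
def rank_like(values):
    """SQL RANK(): tied values share a rank, and the next rank skips
    ahead (1, 2, 2, 4, ...). Expects values already sorted."""
    ranks = []
    for i, v in enumerate(values):
        if i > 0 and v == values[i - 1]:
            ranks.append(ranks[-1])  # tie: reuse the previous rank
        else:
            ranks.append(i + 1)      # position-based rank, skipping ties
    return ranks

pv = [90, 80, 80, 70]  # already sorted descending, as in the thread
print(rank_like(pv))                 # RANK():       [1, 2, 2, 4]
print(list(range(1, len(pv) + 1)))   # ROW_NUMBER(): [1, 2, 3, 4]
```

This is why filtering on rank can return more (or fewer) than 50 rows per group when pv values repeat, while a row_number-style numbering always yields exactly the first 50.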
----- Original Message -----
From: Anton Okolnychyi <anton.okolnyc...@gmail.com>
To: user <user@spark.apache.org>
Cc: luohui20...@sina.com
Subject: Re: how to select first 50 value of each group after group by?
Date: 2016-07-06 23:…
The following resources should be useful:
https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-sql-windows.html
The last link should have the exact solution.
On 2016-07-06 16:55 (GMT+02:00), Tal wrote:
You can use the rank window function to rank each row in the group, and then
filter the rows with rank <= 50.
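The rank-and-filter idea can be sketched in plain Python (this is not Spark code; the rows are made up, and N is 2 rather than 50 to keep the example small):

```python
from itertools import groupby
from operator import itemgetter

# Toy data in the thread's schema: id, pv, location.
rows = [
    {"id": "a", "pv": 90, "location": "cn"},
    {"id": "b", "pv": 80, "location": "cn"},
    {"id": "c", "pv": 70, "location": "cn"},
    {"id": "d", "pv": 60, "location": "us"},
    {"id": "e", "pv": 50, "location": "us"},
]

N = 2
top = []
# Partition by location, order by pv descending -- the window spec.
rows.sort(key=lambda r: (r["location"], -r["pv"]))
for loc, grp in groupby(rows, key=itemgetter("location")):
    for rank, r in enumerate(grp, start=1):  # row_number-style rank
        if rank <= N:
            top.append(r["id"])

print(top)  # ['a', 'b', 'd', 'e']
```

Note the loop numbers rows like row_number, so it takes exactly N per group even when pv values tie.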
On Wed, Jul 6, 2016, 14:07 <luohui20...@sina.com> wrote:
> hi there
> I have a DF with 3 columns: id, pv, location. (The rows are already
> grouped by location and sorted by pv in descending order.) I want to get
> the first 50 id values of each location group. I checked the API of
> DataFrame, GroupedData, and pairRDD, and found no match. Is there a way
> to do this naturally?
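Outside of window functions, a top-N-per-group need like this is often met with a heap per group. A plain-Python sketch of that alternative (made-up data, N = 2 instead of 50):

```python
import heapq
from collections import defaultdict

# Toy rows in the thread's schema: (id, pv, location).
rows = [("a", 90, "cn"), ("b", 80, "cn"), ("c", 70, "cn"),
        ("d", 60, "us"), ("e", 50, "us")]

# Bucket (pv, id) pairs by location.
by_loc = defaultdict(list)
for id_, pv, loc in rows:
    by_loc[loc].append((pv, id_))

N = 2
# nlargest compares (pv, id) tuples, so it picks the highest-pv pairs.
top_ids = {loc: [id_ for pv, id_ in heapq.nlargest(N, pairs)]
           for loc, pairs in by_loc.items()}
print(top_ids)  # {'cn': ['a', 'b'], 'us': ['d', 'e']}
```

In Spark the equivalent grouping-and-heap step is what the window-function approach in the replies above expresses declaratively.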