Did you take a look at the excellent write up by Yin Huai and Michael
Armbrust? It appears that rank is supported in the 1.4.x release.
https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html
Here is a snippet from the article above, for your convenience:
To answer the first question “*What are the best-selling and the second
best-selling products in every category?*”, we need to rank products in a
category based on their revenue, and to pick the best-selling and the
second best-selling products based on the ranking. Below is the SQL query used
to answer this question by using the window function dense_rank (we will
explain the syntax of using window functions in the next section).
SELECT
  product,
  category,
  revenue
FROM (
  SELECT
    product,
    category,
    revenue,
    dense_rank() OVER (PARTITION BY category ORDER BY revenue DESC) as rank
  FROM productRevenue) tmp
WHERE
  rank <= 2
The result of this query is shown below. Without using window functions, it
is very hard to express the query in SQL, and even if a SQL query can be
expressed, it is hard for the underlying engine to efficiently evaluate the
query.
[image: 1-2]
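If you want to check the semantics of that query without a Spark cluster, here is a minimal sketch using Python's built-in sqlite3 module. It is not Spark: it assumes your SQLite build is 3.25 or newer (the first version with window functions), the sample rows are made up for illustration, and the alias is spelled rnk rather than rank purely to avoid clashing with the function name.

```python
import sqlite3

# Hypothetical sample data (product, category, revenue); not from the article.
SAMPLE_ROWS = [
    ("phone_a", "phone", 6000),
    ("phone_b", "phone", 5000),
    ("phone_c", "phone", 5000),   # ties with phone_b on revenue
    ("phone_d", "phone", 3000),
    ("tablet_a", "tablet", 5500),
    ("tablet_b", "tablet", 2500),
    ("tablet_c", "tablet", 1500),
]

QUERY = """
SELECT product, category, revenue
FROM (
  SELECT product, category, revenue,
         dense_rank() OVER (PARTITION BY category ORDER BY revenue DESC) AS rnk
  FROM productRevenue
) tmp
WHERE rnk <= 2
ORDER BY category, revenue DESC, product
"""

def top_two_per_category(rows):
    """Return the best and second-best selling products in each category."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE productRevenue (product TEXT, category TEXT, revenue INTEGER)"
        )
        conn.executemany("INSERT INTO productRevenue VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()

if __name__ == "__main__":
    for row in top_two_per_category(SAMPLE_ROWS):
        print(row)
```

Because dense_rank gives tied rows the same rank, a category can return more than two rows when revenues tie, as the two 5000-revenue phones do in this sample.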
                    SQL            DataFrame API
Ranking functions   rank           rank
                    dense_rank     denseRank
                    percent_rank   percentRank
                    ntile          ntile
                    row_number     rowNumber
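The ranking functions differ mainly in how they treat ties. As a rough illustration, the sketch below computes rank, dense_rank and row_number side by side over one ordering. Again this uses Python's sqlite3 (SQLite 3.25+ assumed) rather than Spark, and the table name t and the input values are hypothetical.

```python
import sqlite3

def ranking_columns(revenues):
    """Return (revenue, rank, dense_rank, row_number) tuples, highest revenue first."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (revenue INTEGER)")
        conn.executemany("INSERT INTO t VALUES (?)", [(r,) for r in revenues])
        query = """
            SELECT revenue,
                   rank()       OVER w AS rnk,        -- leaves gaps after ties
                   dense_rank() OVER w AS dense_rnk,  -- no gaps after ties
                   row_number() OVER w AS row_num     -- always unique
            FROM t
            WINDOW w AS (ORDER BY revenue DESC)
            ORDER BY revenue DESC, row_num
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()

if __name__ == "__main__":
    for row in ranking_columns([6000, 5000, 5000, 3000]):
        print(row)
```

With a tie in second place, rank skips to 4 for the next row, dense_rank continues with 3, and row_number simply numbers the rows 1 through 4.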
HTH.
-Todd
On Thu, Jul 16, 2015 at 8:10 AM, Lior Chaga lio...@taboola.com wrote:
Does Spark's HiveContext support the rank() ... distribute by syntax (as in
the following article:
http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/doing_rank_with_hive
)?
If not, how can it be achieved?
Thanks,
Lior