Did you take a look at the excellent write up by Yin Huai and Michael Armbrust? It appears that rank is supported in the 1.4.x release.
https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html Snippet from above article for your convenience: To answer the first question “*What are the best-selling and the second best-selling products in every category?*”, we need to rank products in a category based on their revenue, and to pick the best selling and the second best-selling products based the ranking. Below is the SQL query used to answer this question by using window function dense_rank (we will explain the syntax of using window functions in next section). SELECT product, category, revenueFROM ( SELECT product, category, revenue, dense_rank() OVER (PARTITION BY category ORDER BY revenue DESC) as rank FROM productRevenue) tmpWHERE rank <= 2 The result of this query is shown below. Without using window functions, it is very hard to express the query in SQL, and even if a SQL query can be expressed, it is hard for the underlying engine to efficiently evaluate the query. [image: 1-2] SQLDataFrame APIRanking functionsrankrankdense_rankdenseRankpercent_rank percentRankntilentilerow_numberrowNumber HTH. -Todd On Thu, Jul 16, 2015 at 8:10 AM, Lior Chaga <lio...@taboola.com> wrote: > Does spark HiveContext support the rank() ... distribute by syntax (as in > the following article- > http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/doing_rank_with_hive > )? > > If not, how can it be achieved? > > Thanks, > Lior >