Re: dataframes sql order by not total ordering

2015-07-21 Thread Carol McDonald
Thanks, that works a lot better ;)
scala> val results = sqlContext.sql(select movies.title, movierates.maxr, movierates.minr, movierates.cntu from (SELECT ratings.product, max(ratings.rating) as maxr, min(ratings.rating) as minr, count(distinct user) as cntu FROM ratings group by ratings.product )

dataframes sql order by not total ordering

2015-07-20 Thread Carol McDonald
The following query on the MovieLens dataset sorts by the count of ratings for a movie. It looks like the results are ordered by partition?
scala> val results = sqlContext.sql(select movies.title, movierates.maxr, movierates.minr, movierates.cntu from (SELECT ratings.product,

Re: dataframes sql order by not total ordering

2015-07-20 Thread Michael Armbrust
An ORDER BY needs to be on the outermost query; otherwise, subsequent operations (such as the join) could reorder the tuples.
On Mon, Jul 20, 2015 at 9:25 AM, Carol McDonald cmcdon...@maprtech.com wrote: the following query on the Movielens dataset , is sorting by the count of ratings for a
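
For readers following the thread, here is a minimal sketch of what "ORDER BY on the outermost query" might look like. It assumes a Spark 1.x spark-shell where `sqlContext` is predefined. The full query is truncated in the messages above, so the `Rating` and `Movie` case classes, the sample rows, the join column `movies.movieId`, and the descending sort direction are all assumptions made for illustration, not the poster's actual code.

```scala
// Sketch for a Spark 1.x spark-shell, where `sqlContext` is predefined.
// The case classes, sample rows, and join column (movies.movieId) are
// hypothetical; they are not taken from the thread.
import sqlContext.implicits._

case class Rating(user: Int, product: Int, rating: Double)
case class Movie(movieId: Int, title: String)

// Register two tiny temp tables so the query below has something to run on.
Seq(Rating(1, 10, 5.0), Rating(2, 10, 3.0), Rating(1, 20, 4.0))
  .toDF().registerTempTable("ratings")
Seq(Movie(10, "Movie A"), Movie(20, "Movie B"))
  .toDF().registerTempTable("movies")

// The aggregation stays in the subquery; the ORDER BY sits on the outermost
// query, after the join, so no later operator can reorder the rows.
val results = sqlContext.sql("""
  SELECT movies.title, movierates.maxr, movierates.minr, movierates.cntu
  FROM (SELECT ratings.product,
               max(ratings.rating) AS maxr,
               min(ratings.rating) AS minr,
               count(DISTINCT ratings.user) AS cntu
        FROM ratings
        GROUP BY ratings.product) movierates
  JOIN movies ON movierates.product = movies.movieId
  ORDER BY movierates.cntu DESC
""")

results.show()
```

If the ORDER BY were placed inside the subquery instead, the join that follows would be free to shuffle rows between partitions, which would be consistent with the "ordered by partition" symptom described in the original question.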