Hi,

Most join strategies do not preserve the ordering of their input DataFrames (sort-merge joins only preserve the ordering of the left input DataFrame). So, as said earlier, you need to sort explicitly if you want ordered output.
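A minimal sketch of "join, then sort explicitly", assuming Spark 2.x (`SparkSession`) rather than the Spark 1.6 `HiveContext` used in the quoted code below. The object name `OrderedJoinSketch`, the helper `orderedJoin`, and the toy rows standing in for the `sales` and `times` tables are hypothetical. The point is that only the final `orderBy` determines output order; the join itself makes no promise.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object OrderedJoinSketch {
  // Hypothetical helper: join on time_id, then sort explicitly.
  // Only the orderBy fixes the row order; the join alone does not.
  def orderedJoin(sales: DataFrame, times: DataFrame): DataFrame =
    sales.join(times, "time_id").orderBy("calendar_month_desc", "amount_sold")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ordered-join-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical toy data standing in for the sales and times tables
    val sales = Seq((3, 10.0), (1, 5.0), (2, 7.5), (1, 2.5)).toDF("time_id", "amount_sold")
    val times = Seq((1, "2016-01"), (2, "2016-02"), (3, "2016-03"))
      .toDF("time_id", "calendar_month_desc")

    orderedJoin(sales, times).show()
    spark.stop()
  }
}
```

If you drop the `orderBy`, the row order of the joined result may vary with the join strategy Spark picks and with partitioning, so you should not rely on it.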
// maropu

On Wed, Jun 29, 2016 at 3:38 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
> Hi,
>
> Well, I would not assume anything myself. If you want it ordered, do it explicitly.
>
> Let us take a simple case by creating three DFs based on existing tables:
>
> import org.apache.spark.sql.functions._
>
> val s = HiveContext.table("sales").select("AMOUNT_SOLD","TIME_ID","CHANNEL_ID")
> val c = HiveContext.table("channels").select("CHANNEL_ID","CHANNEL_DESC")
> val t = HiveContext.table("times").select("TIME_ID","CALENDAR_MONTH_DESC")
>
> Now let us join these tables:
>
> val rs = s.join(t,"time_id").join(c,"channel_id").groupBy("calendar_month_desc","channel_desc").agg(sum("amount_sold").as("TotalSales"))
>
> And order it explicitly:
>
> rs.orderBy("calendar_month_desc","channel_desc").take(5).foreach(println)
>
> HTH
>
> Dr Mich Talebzadeh
>
> http://talebzadehmich.wordpress.com
>
> On 29 June 2016 at 14:32, Jestin Ma <jestinwith.a...@gmail.com> wrote:
>> If it’s not too much trouble, could I get some pointers/help on this?
>> (see link)
>>
>> http://stackoverflow.com/questions/38085801/can-dataframe-joins-in-spark-preserve-order
>>
>> Also, as a side question, do DataFrames support easy reordering of
>> columns?
>>
>> Thank you!
>> Jestin

--
---
Takeshi Yamamuro
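On the side question about reordering columns: one common approach is simply to `select` the columns in the order you want. A minimal sketch, again assuming Spark 2.x; the object name `ReorderColumnsSketch`, the helper `reorder`, and the toy columns `x`, `y`, `z` are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object ReorderColumnsSketch {
  // Hypothetical helper: return a DataFrame whose columns follow `order`.
  // select() keeps all rows and only changes the column layout.
  def reorder(df: DataFrame, order: Seq[String]): DataFrame =
    df.select(order.map(col): _*)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("reorder-columns-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical toy data
    val df = Seq((1, "a", 2.0)).toDF("x", "y", "z")
    reorder(df, Seq("z", "x", "y")).show()
    spark.stop()
  }
}
```

Passing `df.columns.reverse.toSeq` as `order` would reverse the column order without listing names by hand.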