The distinct transformation does not preserve order; you need to apply distinct first, then orderBy.
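Below is a minimal sketch of the reordered chain. It assumes a DataFrame with the transactiondate, moneyin and moneyout columns from your question. The object name MonthlyTotals, the helper names and the sample rows are hypothetical. The chain applies distinct before orderBy and sorts on the Year and Month aliases, because transactiondate is no longer among the selected columns after the select and distinct. The second helper swaps in a plain groupBy/agg, which is not what this thread proposes. It yields one row per month directly, so neither the window nor distinct is needed.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

object MonthlyTotals {

  // Same window as in the question: one partition per (year, month) of transactiondate.
  private val wSpec =
    Window.partitionBy(year(col("transactiondate")), month(col("transactiondate")))

  // Window sums, then distinct, THEN orderBy, so the sort is the last step.
  // Sorts on the Year/Month aliases, since transactiondate is gone after select + distinct.
  def monthlyTotals(accounts: DataFrame): DataFrame =
    accounts
      .select(
        year(col("transactiondate")).as("Year"),
        month(col("transactiondate")).as("Month"),
        sum("moneyin").over(wSpec).cast("DECIMAL(10,2)").as("incoming Per Month"),
        sum("moneyout").over(wSpec).cast("DECIMAL(10,2)").as("outgoing Per Month"))
      .distinct()
      .orderBy(col("Year"), col("Month"))

  // Alternative (not from the thread): groupBy/agg produces one row per month,
  // so no window function and no distinct are required.
  def monthlyTotalsGrouped(accounts: DataFrame): DataFrame =
    accounts
      .groupBy(year(col("transactiondate")).as("Year"), month(col("transactiondate")).as("Month"))
      .agg(
        sum("moneyin").cast("DECIMAL(10,2)").as("incoming Per Month"),
        sum("moneyout").cast("DECIMAL(10,2)").as("outgoing Per Month"))
      .orderBy(col("Year"), col("Month"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("monthly-totals").getOrCreate()
    import spark.implicits._

    // Hypothetical sample rows standing in for the joint_accounts table.
    val accounts = Seq(
      ("2019-09-05", 100.00, 40.00),
      ("2019-09-20", 50.00, 10.00),
      ("2020-01-03", 30.00, 25.00),
      ("2019-11-14", 20.00, 60.00),
      ("2019-10-01", 70.00, 15.00)
    ).toDF("transactiondate", "moneyin", "moneyout")
      .withColumn("transactiondate", to_date(col("transactiondate")))

    monthlyTotals(accounts).show(1000, false)
    monthlyTotalsGrouped(accounts).show(1000, false)
    spark.stop()
  }
}
```

Both helpers should list the months in year/month order, with 2019 rows before the 2020 row.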

Enrico


On 06.01.20 at 00:39, Mich Talebzadeh wrote:
Hi,

I am working out monthly incoming and outgoing totals for an account, using the following code:

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._
val wSpec = Window.partitionBy(year(col("transactiondate")), month(col("transactiondate")))
joint_accounts.
    select(year(col("transactiondate")).as("Year")
    , month(col("transactiondate")).as("Month")
    , sum("moneyin").over(wSpec).cast("DECIMAL(10,2)").as("incoming Per Month")
    , sum("moneyout").over(wSpec).cast("DECIMAL(10,2)").as("outgoing Per Month")).
    orderBy(year(col("transactiondate")), month(col("transactiondate"))).
    distinct.
    show(1000, false)

This produces the following output:


+----+-----+------------------+------------------+
|Year|Month|incoming Per Month|outgoing Per Month|
+----+-----+------------------+------------------+
|2019|9    |13958.58          |17920.31          |
|2019|11   |4032.30           |4225.30           |
|2020|1    |1530.00           |1426.91           |
|2019|10   |10029.00          |10067.52          |
|2019|12   |742.00            |814.49            |
+----+-----+------------------+------------------+

However, the ordering is not correct: I expect to see the 2019 and 2020 records sorted by year and then month.

Any suggestions?

Thanks,

Dr Mich Talebzadeh

LinkedIn https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com



