The distinct transformation does not preserve order, so you need to
call distinct first and then orderBy.
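A minimal sketch of the reordered query, assuming joint_accounts has the columns transactiondate, moneyin and moneyout as in the code quoted below. The object name, the method name and the sample rows are made up for illustration. Note that the sort refers to the output columns Year and Month, since transactiondate is no longer part of the projection once distinct has been applied.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, month, sum, year}

object MonthlyTotals {

  // Same window sums as in the original query, but deduplicate first and
  // sort last, so that no later transformation can disturb the ordering.
  def monthlyTotals(jointAccounts: DataFrame): DataFrame = {
    val wSpec = Window.partitionBy(
      year(col("transactiondate")), month(col("transactiondate")))

    jointAccounts
      .select(
        year(col("transactiondate")).as("Year"),
        month(col("transactiondate")).as("Month"),
        sum("moneyin").over(wSpec).cast("DECIMAL(10,2)").as("incoming Per Month"),
        sum("moneyout").over(wSpec).cast("DECIMAL(10,2)").as("outgoing Per Month"))
      .distinct()
      // Sort on the projected columns; transactiondate is gone at this point.
      .orderBy(col("Year"), col("Month"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("monthly-totals")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample rows, only here to make the sketch runnable.
    val jointAccounts = Seq(
      ("2020-01-04", 50.0, 10.0),
      ("2019-10-03", 100.0, 20.0),
      ("2019-12-24", 70.0, 35.0),
      ("2019-10-17", 30.0, 5.0)
    ).toDF("transactiondate", "moneyin", "moneyout")
      .withColumn("transactiondate", col("transactiondate").cast("date"))

    monthlyTotals(jointAccounts).show(1000, false)
    spark.stop()
  }
}
```

If the window sums are only there to get one row per month, a groupBy on Year and Month with sum aggregates, followed by the same orderBy, should give the same result without needing distinct at all.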
Enrico
On 06.01.20 at 00:39, Mich Talebzadeh wrote:
Hi,
I am working out the monthly incoming and outgoing totals for an
account, and I am using the following code:
import org.apache.spark.sql.expressions.Window
val wSpec = Window.partitionBy(year(col("transactiondate")), month(col("transactiondate")))
joint_accounts.
  select(year(col("transactiondate")).as("Year")
    , month(col("transactiondate")).as("Month")
    , sum("moneyin").over(wSpec).cast("DECIMAL(10,2)").as("incoming Per Month")
    , sum("moneyout").over(wSpec).cast("DECIMAL(10,2)").as("outgoing Per Month")).
  orderBy(year(col("transactiondate")), month(col("transactiondate"))).
  distinct.
  show(1000, false)
This shows as follows:
+----+-----+------------------+------------------+
|Year|Month|incoming Per Month|outgoing Per Month|
+----+-----+------------------+------------------+
|2019|9    |13958.58          |17920.31          |
|2019|11   |4032.30           |4225.30           |
|2020|1    |1530.00           |1426.91           |
|2019|10   |10029.00          |10067.52          |
|2019|12   |742.00            |814.49            |
+----+-----+------------------+------------------+
However, the ordering is not correct: I expect to see the 2019 and
2020 records in order of year and month.
Any suggestions?
Thanks,
Dr Mich Talebzadeh
LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com
*Disclaimer:* Use it at your own risk. Any and all responsibility for
any loss, damage or destruction of data or any other property which
may arise from relying on this email's technical content is explicitly
disclaimed. The author will in no case be liable for any monetary
damages arising from such loss, damage or destruction.