How to select the entire row that has max timestamp for every key in Spark Structured Streaming 2.1.1?

kant kodali Tue, 29 Aug 2017 13:00:46 -0700

Hi All,

I am wondering what the easiest and most concise way is to express the
computation below in Spark Structured Streaming, given that it supports both
imperative and declarative styles.
I am just trying to select the rows that have the max timestamp for each train.
Instead of writing nested queries, as we normally would in a relational
database, I would like to see whether I can combine the imperative and
declarative styles. If nested queries or joins are not required, how can
this be done? I am using Spark 2.1.1.


Dataset

Train    Dest    Time
1        HK      10:00
1        SH      12:00
1        SZ      14:00
2        HK      13:00
2        SH      09:00
2        SZ      07:00

The desired result should be:

Train    Dest    Time
1        SZ      14:00
2        HK      13:00
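
To make the intended semantics concrete, here is a minimal sketch in plain Python (no Spark involved) of the per-key reduction described above: one pass over the rows, keeping for each train the whole row with the largest time, without a nested query or join. The names Row and latest_per_train are hypothetical and exist only for this illustration, and the sketch assumes times are zero-padded "HH:MM" strings as in the sample data. In Spark the same idea is commonly written as a group-by on Train with a max over a struct whose first field is Time; whether that runs as a streaming aggregation on 2.1.1 is not verified here.

```python
from typing import Dict, Iterable, NamedTuple


class Row(NamedTuple):
    """Hypothetical row type mirroring the Train / Dest / Time columns."""
    train: int
    dest: str
    time: str  # assumed zero-padded "HH:MM", so string order matches time order


def latest_per_train(rows: Iterable[Row]) -> Dict[int, Row]:
    """Return, for each train, the entire row that has the largest time."""
    latest: Dict[int, Row] = {}
    for row in rows:
        current = latest.get(row.train)
        # Replace the stored row only when this one is strictly later.
        if current is None or row.time > current.time:
            latest[row.train] = row
    return latest


# Sample data taken from the Dataset table in the question.
rows = [
    Row(1, "HK", "10:00"),
    Row(1, "SH", "12:00"),
    Row(1, "SZ", "14:00"),
    Row(2, "HK", "13:00"),
    Row(2, "SH", "09:00"),
    Row(2, "SZ", "07:00"),
]

result = latest_per_train(rows)
for train in sorted(result):
    print(result[train].train, result[train].dest, result[train].time)
```

The state kept per key is a single row, which is why this shape of computation does not need a self-join: each incoming row is compared only against the current best row for its train.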

