Hi Arun, I want to select the entire row with the max timestamp for each group. I have modified my data set below to avoid any confusion.

*Input:*

id | amount | my_timestamp
-------------------------------------------
1  | 5      | 2018-04-01T01:00:00.000Z
1  | 10     | 2018-04-01T01:10:00.000Z
1  | 6      | 2018-04-01T01:20:00.000Z
2  | 30     | 2018-04-01T01:25:00.000Z
2  | 40     | 2018-04-01T01:30:00.000Z

*Expected Output:*

id | amount | my_timestamp
-------------------------------------------
1  | 10     | 2018-04-01T01:10:00.000Z
2  | 40     | 2018-04-01T01:30:00.000Z

Looking for a streaming solution using either raw SQL like
sparkSession.sql("sql query") or something similar to raw SQL, but not
something like mapGroupsWithState.

On Wed, Apr 18, 2018 at 9:36 AM, Arun Mahadevan <ar...@apache.org> wrote:

> Can't the "max" function be used here? Something like:
>
> stream.groupBy($"id").max("amount").writeStream.
> outputMode("complete"/"update")….
>
> Unless the "stream" is already a grouped stream, in which case the above
> would not work since the support for multiple aggregate operations is not
> there yet.
>
> Thanks,
> Arun
>
> From: kant kodali <kanth...@gmail.com>
> Date: Tuesday, April 17, 2018 at 11:41 AM
> To: Tathagata Das <tathagata.das1...@gmail.com>
> Cc: "user @spark" <user@spark.apache.org>
> Subject: Re: can we use mapGroupsWithState in raw sql?
>
> Hi TD,
>
> Thanks for that. The only reason I ask is I don't see any alternative
> solution to solve the problem below using raw sql.
>
>
> How to select the max row for every group in spark structured streaming
> 2.3.0 without using order by since it requires complete mode or
> mapGroupsWithState?
>
> *Input:*
>
> id | amount | my_timestamp
> -------------------------------------------
> 1  | 5      | 2018-04-01T01:00:00.000Z
> 1  | 10     | 2018-04-01T01:10:00.000Z
> 2  | 20     | 2018-04-01T01:20:00.000Z
> 2  | 30     | 2018-04-01T01:25:00.000Z
> 2  | 40     | 2018-04-01T01:30:00.000Z
>
> *Expected Output:*
>
> id | amount | my_timestamp
> -------------------------------------------
> 1  | 10     | 2018-04-01T01:10:00.000Z
> 2  | 40     | 2018-04-01T01:30:00.000Z
>
> Looking for a streaming solution using either raw sql like
> sparkSession.sql("sql query") or similar to raw sql but not something
> like mapGroupsWithState
>
> On Mon, Apr 16, 2018 at 8:32 PM, Tathagata Das <
> tathagata.das1...@gmail.com> wrote:
>
>> Unfortunately no. Honestly it does not make sense as for type-aware
>> operations like map, mapGroups, etc., you have to provide an actual JVM
>> function. That does not fit in with the SQL language structure.
>>
>> On Mon, Apr 16, 2018 at 7:34 PM, kant kodali <kanth...@gmail.com> wrote:
>>
>>> Hi All,
>>>
>>> can we use mapGroupsWithState in raw SQL? or is it in the roadmap?
>>>
>>> Thanks!
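
For reference, one way to express "the entire row with the max per group" in raw SQL is to take the max of a struct whose first field is the ordering column, then unpack the struct. The following is only a minimal sketch run as a batch query on the sample data above. The view name `events`, the object `MaxRowPerGroup`, and the helper `maxRowPerGroup` are hypothetical names introduced for illustration. The ordering column is also an assumption: the message asks for the max timestamp, while the expected output for id 1 (amount 10) corresponds to the largest amount, so the sketch orders by amount to reproduce that output; putting my_timestamp first in the struct would order by timestamp instead. On a streaming DataFrame the same SQL should be usable with writeStream in "update" or "complete" output mode, as in Arun's suggestion, but that has not been verified here against 2.3.0.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object MaxRowPerGroup {

  // Hypothetical helper: for each id, keep the row with the largest amount.
  // Structs compare field by field, so max() picks the struct with the
  // largest first field (amount) and uses my_timestamp only to break ties.
  def maxRowPerGroup(spark: SparkSession, events: DataFrame): DataFrame = {
    events.createOrReplaceTempView("events") // assumed view name
    spark.sql(
      """SELECT id, top.amount AS amount, top.my_timestamp AS my_timestamp
        |FROM (
        |  SELECT id,
        |         max(named_struct('amount', amount,
        |                          'my_timestamp', my_timestamp)) AS top
        |  FROM events
        |  GROUP BY id
        |) t""".stripMargin)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("max-row-per-group")
      .getOrCreate()
    import spark.implicits._

    // Sample rows copied from the *Input:* table; timestamps kept as strings.
    val events = Seq(
      (1, 5, "2018-04-01T01:00:00.000Z"),
      (1, 10, "2018-04-01T01:10:00.000Z"),
      (1, 6, "2018-04-01T01:20:00.000Z"),
      (2, 30, "2018-04-01T01:25:00.000Z"),
      (2, 40, "2018-04-01T01:30:00.000Z")
    ).toDF("id", "amount", "my_timestamp")

    // Rows returned: (1, 10, 2018-04-01T01:10:00.000Z) and
    //                (2, 40, 2018-04-01T01:30:00.000Z)
    maxRowPerGroup(spark, events).orderBy("id").show(false)

    spark.stop()
  }
}
```

The struct carries the whole row through a single aggregate, so no ORDER BY and no second aggregation over an already grouped stream is needed.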