Hi,

This is more a question for the user list.
LEAD and LAG imply an ordering of the whole dataset, so using them outside a window specification is not supported. Use LEAD/LAG in an ordered window function and you'll be fine:

    SELECT lead(max(expenses)) OVER (ORDER BY customerId) FROM tbl GROUP BY customerId

HTH

Kind regards,
Herman van Hövell tot Westerflier
QuestTec B.V.
Torenwacht 98
2353 DC Leiderdorp
hvanhov...@questtec.nl
+31 6 420 590 27

2015-11-02 11:33 GMT+01:00 Shagun Sodhani <sshagunsodh...@gmail.com>:
> Hi! I was trying out window functions in Spark SQL (using HiveContext)
> and I noticed that while this Apache Tajo JIRA issue
> <https://issues.apache.org/jira/browse/TAJO-919?jql=text%20~%20%22lag%20window%22>
> mentions that *lead* is implemented as an aggregate operator, that does
> not seem to be the case here.
>
> I am using the following configuration:
>
> Query: SELECT lead(max(`expenses`)) FROM `table` GROUP BY `customerId`
> Spark Version: 10.4
> SparkSql Version: 1.5.1
>
> I am using the standard example of a (`customerId`, `expenses`) schema,
> where each customer has multiple values for expenses (though I am setting
> age as Double and not Int as I am trying out maths functions).
>
> java.lang.NullPointerException
>     at org.apache.hadoop.hive.ql.udf.generic.GenericUDFLeadLag.evaluate(GenericUDFLeadLag.java:57)
>
> The entire error stack can be found here <http://pastebin.com/jTRR4Ubx>.
>
> Can someone confirm whether this is an actual issue or an oversight on my
> part?
>
> Thanks!
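To see what the ordered-window form of the query computes, here is a minimal sketch that runs the same query shape without a cluster. Several assumptions apply. It uses Python's built-in `sqlite3` module, and SQLite 3.25 or newer is required for window functions, instead of Spark SQL 1.5.1. The table `tbl` and its rows are hypothetical sample data. It does not reproduce the Hive `NullPointerException`. It only illustrates the semantics: `max(expenses)` is aggregated per `customerId` first, and LEAD then reads the next customer's value in `customerId` order. It also shows that calling LEAD without an `OVER` clause is rejected, at least by SQLite.

```python
import sqlite3

# Assumption: SQLite stands in for Spark/Hive purely to illustrate the SQL
# semantics; it says nothing about how Spark 1.5.1 itself behaves.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (customerId TEXT, expenses REAL)")

# Hypothetical sample data: several expense rows per customer.
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [("c1", 10.0), ("c1", 30.0), ("c2", 5.0), ("c2", 20.0), ("c3", 7.0)],
)

# GROUP BY runs first (one row per customer); LEAD then looks one row ahead
# in customerId order. The last customer has no next row, so it gets NULL.
query = """
    SELECT customerId,
           max(expenses) AS max_expenses,
           lead(max(expenses)) OVER (ORDER BY customerId) AS next_max
    FROM tbl
    GROUP BY customerId
    ORDER BY customerId
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)

# The original query shape, with no OVER clause, is rejected by SQLite
# because lead() is only defined as a window function there.
error_message = None
try:
    conn.execute("SELECT lead(max(expenses)) FROM tbl GROUP BY customerId")
except sqlite3.Error as exc:
    error_message = str(exc)
print("without OVER:", error_message)
```

With the sample rows above, `c1` and `c2` get the next customer's maximum (20.0 and 7.0), and `c3` gets `None` because there is no following row.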