Hi Ross,

What version of Spark are you using? There were two issues that affected
the results of window functions in the Spark 1.5 branch. Both issues have
been fixed and will be released with Spark 1.5.2 (this release will happen
soon). For more details on these two issues, take a look at
https://issues.apache.org/jira/browse/SPARK-11135 and
https://issues.apache.org/jira/browse/SPARK-11009.
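
In the meantime, it is worth confirming which version you are actually
running. A quick check from spark-shell (a minimal sketch, assuming the
shell's built-in SparkContext, sc):

    // Prints the version string of the running Spark build; any 1.5.x
    // release before 1.5.2 is affected by the two issues above.
    println(sc.version)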

Thanks,

Yin

On Mon, Nov 2, 2015 at 12:07 PM, <ross.cramb...@thomsonreuters.com> wrote:

> Hello Spark community -
> I am running a Spark SQL query to calculate the time difference between
> consecutive events, using lag() over a window -
>
> SELECT device_id,
>        unix_time,
>        event_id,
>        unix_time - lag(unix_time)
>          OVER (PARTITION BY device_id ORDER BY unix_time, event_id)
>          AS seconds_since_last_event
> FROM ios_d_events;
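>
> For reference, I believe this is roughly equivalent to the DataFrame code
> below (a sketch on my end; events here is a hypothetical name for the
> DataFrame registered as ios_d_events):
>
> import org.apache.spark.sql.expressions.Window
> import org.apache.spark.sql.functions.{col, lag}
>
> // Window over each device's events, ordered by time and then event id.
> val w = Window.partitionBy("device_id").orderBy("unix_time", "event_id")
>
> val withDiff = events.select(
>   col("device_id"),
>   col("unix_time"),
>   col("event_id"),
>   (col("unix_time") - lag("unix_time", 1).over(w)).as("seconds_since_last_event"))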
>
> This gives me some strange results when the first two events for a
> particular device_id have the same timestamp.
> I used the following query to take a look at what value lag() was
> returning:
>
> SELECT device_id,
>        event_time,
>        unix_time,
>        event_id,
>        lag(event_time)
>          OVER (PARTITION BY device_id ORDER BY unix_time, event_id)
>          AS lag_time
> FROM ios_d_events;
>
> I’m seeing that in these cases I get something like 1970-01-03 …
> instead of a null value (which looks like the Unix epoch plus a small
> offset, as if lag() were defaulting to 0 rather than NULL), and the
> subsequent lag times in the partition follow the same pattern.
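>
> As a stopgap on my end I am nulling out the obviously bogus values (a
> rough sketch; withLag is a hypothetical name for the result of the query
> above, and the 2000-01-01 cutoff is just an arbitrary sanity bound for
> this data set):
>
> import org.apache.spark.sql.functions.{col, lit, when}
>
> // Any lag_time before the cutoff cannot be a real event time here, so
> // treat it as missing rather than epoch-era garbage.
> val cleaned = withLag.withColumn("lag_time",
>   when(col("lag_time") < lit("2000-01-01"), lit(null)).otherwise(col("lag_time")))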
>
> I posted a section of this output in this SO question:
> http://stackoverflow.com/questions/33482167/spark-sql-window-function-lag-giving-unexpected-resutls
>
> The errant results are labeled with device_id 99999999999999999999999.
>
> Any idea why this is occurring?
>
> - Ross
>
