Hi all,
Before I go the route of rolling my own UDAF:

I'm calculating a rolling mean over the last 5 rows, so I have the following
window defined:

from pyspark.sql import Window
w = Window.partitionBy("person").orderBy("timestamp").rowsBetween(-4, Window.currentRow)

Then I calculate the mean over that window.
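In case it helps, the mean itself is computed like this (a minimal sketch;
the DataFrame df and the numeric column "value" are stand-ins for my actual
data):

from pyspark.sql import functions as F

# avg over the 5-row frame; for the first 4 rows per person this averages
# however many rows exist, which is the problem described below
df = df.withColumn("last5_mean", F.avg("value").over(w))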

Within each partition, I'd like the first 4 rows to return null / NaN,
because there aren't enough preceding rows to make a true "last 5." That's
the behavior I get in pandas with a rolling mean. Instead, Spark appears to
calculate the mean of whatever rows happen to fall inside the frame, even
if there is only 1 row.
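
For comparison, the pandas behavior I'm after looks roughly like this
(again a sketch, assuming a pandas DataFrame pdf with the same
person / timestamp / value columns):

import pandas as pd

# rolling(5) defaults min_periods to the window size, so the first
# 4 rows of each group come back as NaN rather than a partial mean
last5 = (pdf.sort_values("timestamp")
            .groupby("person")["value"]
            .rolling(5)
            .mean())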

Is there a simple way already in Spark to do this? It seems like a normal
thing, so I wonder if I am missing something.
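
One workaround that comes to mind is guarding the average with a row count
over the same window (a sketch, not sure it's the idiomatic way; "value" is
still a stand-in):

from pyspark.sql import functions as F

# when() without otherwise() yields null, so frames with fewer than
# 5 rows come back as null instead of a partial mean
df = df.withColumn(
    "last5_mean",
    F.when(F.count("value").over(w) >= 5, F.avg("value").over(w)),
)

but that feels clunky for such a common operation.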

Thanks!
Sumona
