Hi all,

Before I go the route of rolling my own UDAF: I'm calculating a mean over the last 5 rows, so I have the following window defined:
    Window.partitionBy(person).orderBy(timestamp).rowsBetween(-4, Window.currentRow)

Then I calculate the mean over that window. Within each partition, I'd like the first 4 rows to return null / NaN, because there aren't enough rows yet to form a true "last 5". That is what I get in pandas with a rolling mean. Instead, Spark appears to average whatever rows happen to fall in the frame, even when the frame holds only 1 row.

Is there already a simple way to do this in Spark? It seems like a common need, so I wonder if I'm missing something.

Thanks!
Sumona