Hi,

Can I ask for some clarifications regarding intended behavior of window
/ TimeWindow?

PySpark documentation states that "Windows in the order of months are
not supported". This is further confirmed by the checks in
TimeWindow.getIntervalInMicroseconds (https://git.io/vMP5l).

Surprisingly enough we can pass interval much larger than a month by
expressing interval in days or another unit of a higher precision. So
this fails:

Seq("2017-01-01").toDF("date").groupBy(window($"date", "1 month"))

while following is accepted:

Seq("2017-01-01").toDF("date").groupBy(window($"date", "999 days"))

with results which look sensible at first glance.

Is it a matter of a faulty validation logic (months will be assigned
only if there is a match against years or months https://git.io/vMPdi)
or expected behavior and I simply misunderstood the intentions?

-- 
Best,
Maciej

Reply via email to