Seems like a bug.

On Tue, Apr 3, 2018 at 1:26 PM, Li Jin wrote:

> Hi Devs,
> I am seeing some behavior with window functions that is a bit unintuitive
> and would like to get some clarification.
> When using an aggregation function over a window, the frame boundaries
> seem to change depending on whether the window is ordered.
> Example:
> (1)
> from pyspark.sql import Window
> from pyspark.sql.functions import mean
> df = spark.createDataFrame([[0, 1], [0, 2], [0, 3]]).toDF('id', 'v')
> w1 = Window.partitionBy('id')
> df.withColumn('v2', mean(df.v).over(w1)).show()
> +---+---+---+
> | id|  v| v2|
> +---+---+---+
> |  0|  1|2.0|
> |  0|  2|2.0|
> |  0|  3|2.0|
> +---+---+---+
> (2)
> df = spark.createDataFrame([[0, 1], [0, 2], [0, 3]]).toDF('id', 'v')
> w2 = Window.partitionBy('id').orderBy('v')
> df.withColumn('v2', mean(df.v).over(w2)).show()
> +---+---+---+
> | id|  v| v2|
> +---+---+---+
> |  0|  1|1.0|
> |  0|  2|1.5|
> |  0|  3|2.0|
> +---+---+---+
> It seems that orderBy('v') in example (2) also changes the frame
> boundaries from (unboundedPreceding, unboundedFollowing) to
> (unboundedPreceding, currentRow).
> I found this behavior a bit unintuitive. Is it by design, and if so,
> what is the specific rule for how orderBy() interacts with frame
> boundaries?
> Thanks,
> Li
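
For reference, the two outputs above match what you get from the two
frames Li names. Here is a minimal plain-Python sketch of that, not Spark
code: windowed_mean and its 'full' / 'running' labels are hypothetical
names made up for illustration, the frame is row-based, and ties in the
ordering column are ignored.

```python
from statistics import mean


def windowed_mean(values, frame):
    """Per-row mean over a window frame, for one partition already sorted.

    frame is a hypothetical label (an assumption of this sketch):
      'full'    -> (unboundedPreceding, unboundedFollowing)
      'running' -> (unboundedPreceding, currentRow)
    """
    out = []
    for i in range(len(values)):
        # 'full' sees every row of the partition; 'running' stops at row i.
        rows = values if frame == "full" else values[: i + 1]
        out.append(float(mean(rows)))
    return out


v = [1, 2, 3]  # the 'v' column for id = 0
full_frame = windowed_mean(v, "full")
running_frame = windowed_mean(v, "running")
print(full_frame)     # [2.0, 2.0, 2.0], same as example (1)
print(running_frame)  # [1.0, 1.5, 2.0], same as example (2)
```

If the goal is to keep the ordering but still aggregate over the whole
partition, the frame can be spelled out explicitly instead of relying on
whatever the default turns out to be, e.g.
Window.partitionBy('id').orderBy('v').rowsBetween(Window.unboundedPreceding,
Window.unboundedFollowing).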
