Do other (non-Hive) SQL systems do the same thing? On Tue, Apr 3, 2018 at 3:16 PM, Herman van Hövell tot Westerflier < her...@databricks.com> wrote:
> This is something we inherited from Hive: https://cwiki.apache. > org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics > > When ORDER BY is specified with missing WINDOW clause, the WINDOW >> specification defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT >> ROW. > > When both ORDER BY and WINDOW clauses are missing, the WINDOW >> specification defaults to ROW BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED >> FOLLOWING. > > > It sort of makes sense if you think about it. If there is no ordering > there is no way to have a bound frame. If there is ordering we default to > the most commonly used deterministic frame. > > > On Tue, Apr 3, 2018 at 11:09 PM, Reynold Xin <r...@databricks.com> wrote: > >> Seems like a bug. >> >> >> >> On Tue, Apr 3, 2018 at 1:26 PM, Li Jin <ice.xell...@gmail.com> wrote: >> >>> Hi Devs, >>> >>> I am seeing some behavior with window functions that is a bit >>> unintuitive and would like to get some clarification. >>> >>> When using aggregation function with window, the frame boundary seems to >>> change depending on the order of the window. >>> >>> Example: >>> (1) >>> >>> df = spark.createDataFrame([[0, 1], [0, 2], [0, 3]]).toDF('id', 'v') >>> >>> w1 = Window.partitionBy('id') >>> >>> df.withColumn('v2', mean(df.v).over(w1)).show() >>> >>> +---+---+---+ >>> >>> | id| v| v2| >>> >>> +---+---+---+ >>> >>> | 0| 1|2.0| >>> >>> | 0| 2|2.0| >>> >>> | 0| 3|2.0| >>> >>> +---+---+---+ >>> >>> (2) >>> df = spark.createDataFrame([[0, 1], [0, 2], [0, 3]]).toDF('id', 'v') >>> >>> w2 = Window.partitionBy('id').orderBy('v') >>> >>> df.withColumn('v2', mean(df.v).over(w2)).show() >>> >>> +---+---+---+ >>> >>> | id| v| v2| >>> >>> +---+---+---+ >>> >>> | 0| 1|1.0| >>> >>> | 0| 2|1.5| >>> >>> | 0| 3|2.0| >>> >>> +---+---+---+ >>> >>> Seems like orderBy('v') in the example (2) also changes the frame >>> boundaries from ( >>> >>> unboundedPreceding, unboundedFollowing) to (unboundedPreceding, >>> currentRow). >>> >>> >>> I found this behavior a bit unintuitive. I wonder if this behavior is by >>> design and if so, what's the specific rule that orderBy() interacts with >>> frame boundaries? >>> >>> >>> Thanks, >>> >>> Li >>> >>> >> >