[ https://issues.apache.org/jira/browse/SPARK-41793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17685384#comment-17685384 ]
Gera Shegalov edited comment on SPARK-41793 at 2/9/23 6:26 PM:
---------------------------------------------------------------
Another interpretation under which the pre-3.4 count of 1 is actually correct: regardless of whether the window frame bound values overflow, the current row is always part of the window it defines. Whether or not that should be the case could be clarified in the docs.

UPDATE: consider the range from the query plan above, {{RangeFrame, -10.23, 6.79}}:
{code}
spark-sql> select b - 10.23 as lower, b, b + 6.79 as upper from test_table;
11342371013783243717493546650944533.24	11342371013783243717493546650944543.47	11342371013783243717493546650944550.26
999999999999999999999999999999999989.76	999999999999999999999999999999999999.99	NULL
{code}
Viewing each frame as the union of [lower; b), {b}, and (b; upper], only (b; upper] is undefined here; [lower; b) and {b} are well defined, and {b} contains at least one row. Either way, not counting the current row is a counterintuitive result.
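The NULL upper bound above comes from decimal overflow: a Spark DECIMAL(38,2) holds at most 38 significant digits, and adding 6.79 to the second row's value needs a 39th digit. This can be reproduced with Python's exact decimal arithmetic (a standalone sketch, not Spark code; `fits_decimal_38_2` is a hypothetical helper approximating DECIMAL(38,2) representability):

```python
from decimal import Decimal, getcontext

getcontext().prec = 60  # enough headroom so Python itself never rounds


def fits_decimal_38_2(x: Decimal) -> bool:
    # Representable as Spark DECIMAL(38,2): at most 38 significant digits
    # once quantized to two fractional places.
    digits = x.quantize(Decimal("0.01")).as_tuple().digits
    return len(digits) <= 38


for b in (
    Decimal("11342371013783243717493546650944543.47"),
    Decimal("999999999999999999999999999999999999.99"),
):
    lower, upper = b - Decimal("10.23"), b + Decimal("6.79")
    # For the second row, lower still fits but upper needs 39 digits.
    print(b, fits_decimal_38_2(lower), fits_decimal_38_2(upper))
```

For both rows the lower bound fits in 38 digits; only the second row's upper bound overflows, matching the NULL in the spark-sql output.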
> Incorrect result for window frames defined by a range clause on large decimals
> -------------------------------------------------------------------------------
>
>                 Key: SPARK-41793
>                 URL: https://issues.apache.org/jira/browse/SPARK-41793
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Gera Shegalov
>            Priority: Blocker
>              Labels: correctness
>
> Context: https://github.com/NVIDIA/spark-rapids/issues/7429#issuecomment-1368040686
> The following windowing query on a simple two-row input should produce two non-empty windows as a result:
> {code}
> from pprint import pprint
> data = [
>     ('9223372036854775807', '11342371013783243717493546650944543.47'),
>     ('9223372036854775807', '999999999999999999999999999999999999.99')
> ]
> df1 = spark.createDataFrame(data, 'a STRING, b STRING')
> df2 = df1.select(df1.a.cast('LONG'), df1.b.cast('DECIMAL(38,2)'))
> df2.createOrReplaceTempView('test_table')
> df = sql('''
> SELECT
>   COUNT(1) OVER (
>     PARTITION BY a
>     ORDER BY b ASC
>     RANGE BETWEEN 10.2345 PRECEDING AND 6.7890 FOLLOWING
>   ) AS CNT_1
> FROM
>   test_table
> ''')
> res = df.collect()
> df.explain(True)
> pprint(res)
> {code}
> Spark 3.4.0-SNAPSHOT output:
> {code}
> [Row(CNT_1=1), Row(CNT_1=0)]
> {code}
> Spark 3.3.1 output as expected:
> {code}
> [Row(CNT_1=1), Row(CNT_1=1)]
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
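The expected pre-3.4 result can be checked without Spark: with exact (non-overflowing) arithmetic, every row's frame [b - 10.2345, b + 6.7890] contains at least the row's own value of b, so COUNT(1) is at least 1 per row. A plain-Python illustration of that counting (not Spark's implementation) over the two values from the repro:

```python
from decimal import Decimal, getcontext

getcontext().prec = 60  # exact arithmetic, no 38-digit cap as in Spark

# The two DECIMAL(38,2) values of column b from the bug report.
values = [
    Decimal("11342371013783243717493546650944543.47"),
    Decimal("999999999999999999999999999999999999.99"),
]

preceding, following = Decimal("10.2345"), Decimal("6.7890")

# For each row, count peers whose b falls inside its RANGE frame.
counts = [
    sum(1 for other in values if b - preceding <= other <= b + following)
    for b in values
]
print(counts)  # -> [1, 1], matching the Spark 3.3.1 output
```

Each frame contains exactly the current row here, which is why the Spark 3.4.0-SNAPSHOT result of [1, 0] is a correctness regression.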