[ 
https://issues.apache.org/jira/browse/SPARK-41793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17685384#comment-17685384
 ] 

Gera Shegalov commented on SPARK-41793:
---------------------------------------

Another interpretation of why the pre-3.4 count of 1 may be actually correct 
could be that regardless of whether the window frame bound values overflow or 
notĀ  the current row is always part of the window it defines. Whether or not it 
should be the case can be clarified in the doc.

> Incorrect result for window frames defined by a range clause on large 
> decimals 
> -------------------------------------------------------------------------------
>
>                 Key: SPARK-41793
>                 URL: https://issues.apache.org/jira/browse/SPARK-41793
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Gera Shegalov
>            Priority: Blocker
>              Labels: correctness
>
> Context 
> https://github.com/NVIDIA/spark-rapids/issues/7429#issuecomment-1368040686
> The following windowing query on a simple two-row input should produce two 
> non-empty windows as a result
> {code}
> from pprint import pprint
> data = [
>   ('9223372036854775807', '11342371013783243717493546650944543.47'),
>   ('9223372036854775807', '999999999999999999999999999999999999.99')
> ]
> df1 = spark.createDataFrame(data, 'a STRING, b STRING')
> df2 = df1.select(df1.a.cast('LONG'), df1.b.cast('DECIMAL(38,2)'))
> df2.createOrReplaceTempView('test_table')
> df = sql('''
>   SELECT 
>     COUNT(1) OVER (
>       PARTITION BY a 
>       ORDER BY b ASC 
>       RANGE BETWEEN 10.2345 PRECEDING AND 6.7890 FOLLOWING
>     ) AS CNT_1 
>   FROM 
>     test_table
>   ''')
> res = df.collect()
> df.explain(True)
> pprint(res)
> {code}
> SparkĀ 3.4.0-SNAPSHOT output:
> {code}
> [Row(CNT_1=1), Row(CNT_1=0)]
> {code}
> Spark 3.3.1 output as expected:
> {code}
> Row(CNT_1=1), Row(CNT_1=1)]
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to