petern48 commented on code in PR #19278:
URL: https://github.com/apache/datafusion/pull/19278#discussion_r2616516125


##########
datafusion/core/tests/dataframe/mod.rs:
##########
@@ -1102,26 +1102,26 @@ async fn window_using_aggregates() -> Result<()> {
     | first_value | last_val | approx_distinct | approx_median | median | max | min  | c2 | c3   |
     +-------------+----------+-----------------+---------------+--------+-----+------+----+------+
     |             |          |                 |               |        |     |      | 1  | -85  |
-    | -85         | -101     | 14              | -12           | -101   | 83  | -101 | 4  | -54  |
-    | -85         | -101     | 17              | -25           | -101   | 83  | -101 | 5  | -31  |
-    | -85         | -12      | 10              | -32           | -12    | 83  | -85  | 3  | 13   |
-    | -85         | -25      | 3               | -56           | -25    | -25 | -85  | 1  | -5   |
-    | -85         | -31      | 18              | -29           | -31    | 83  | -101 | 5  | 36   |
-    | -85         | -38      | 16              | -25           | -38    | 83  | -101 | 4  | 65   |
+    | -85         | -101     | 14              | -12           | -12    | 83  | -101 | 4  | -54  |
+    | -85         | -101     | 17              | -25           | -25    | 83  | -101 | 5  | -31  |

Review Comment:
   This test was returning incorrect results instead of raising an error, 
because of the bug I described in another comment. The results here were 
fixed by updating `evaluate()` to pass the state as a `&mut` reference 
instead of consuming it with `std::mem::take()`.
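
A minimal Rust sketch of the difference described above, assuming a toy accumulator that is evaluated once per row while its frame keeps growing. `ToyMedian`, `evaluate_taking`, `evaluate_borrowing`, and `running_medians` are hypothetical names for illustration only; this is not DataFusion's accumulator code.

```rust
/// Hypothetical toy accumulator that buffers every value it has seen.
#[derive(Default)]
struct ToyMedian {
    values: Vec<f64>,
}

impl ToyMedian {
    fn update(&mut self, v: f64) {
        self.values.push(v);
    }

    /// Moves the buffer out with `std::mem::take`, leaving the state empty.
    fn evaluate_taking(&mut self) -> Option<f64> {
        let mut values = std::mem::take(&mut self.values);
        median_in_place(&mut values)
    }

    /// Sorts the buffer through a `&mut` borrow, so the state survives the call.
    fn evaluate_borrowing(&mut self) -> Option<f64> {
        median_in_place(&mut self.values)
    }
}

/// Sorts the slice and returns its median, or `None` if it is empty.
fn median_in_place(values: &mut [f64]) -> Option<f64> {
    if values.is_empty() {
        return None;
    }
    values.sort_by(|a, b| a.total_cmp(b));
    let mid = values.len() / 2;
    if values.len() % 2 == 0 {
        Some((values[mid - 1] + values[mid]) / 2.0)
    } else {
        Some(values[mid])
    }
}

/// Evaluates after every new row, as a growing frame would.
fn running_medians(input: &[f64], take_state: bool) -> Vec<Option<f64>> {
    let mut acc = ToyMedian::default();
    input
        .iter()
        .map(|&v| {
            acc.update(v);
            if take_state {
                acc.evaluate_taking()
            } else {
                acc.evaluate_borrowing()
            }
        })
        .collect()
}
```

With the made-up input `[3.0, 1.0, 2.0]`, the borrowing variant yields 3, 2, 2, while the taking variant yields 3, 1, 2: each call empties the buffer, so the next result only sees the newest row.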



##########
datafusion/sqllogictest/test_files/aggregate.slt:
##########
@@ -991,6 +991,54 @@ SELECT approx_median(col_f64_nan) FROM median_table
 ----
 NaN
 
+# median_sliding_window
+statement ok
+CREATE TABLE median_window_test (
+    timestamp INT,
+    tags VARCHAR,
+    value DOUBLE
+);
+
+statement ok
+INSERT INTO median_window_test (timestamp, tags, value) VALUES
+(1, 'tag1', 10.0),
+(2, 'tag1', 20.0),
+(3, 'tag1', 30.0),
+(4, 'tag1', 40.0),
+(5, 'tag1', 50.0),
+(1, 'tag2', 60.0),
+(2, 'tag2', 70.0),
+(3, 'tag2', 80.0),
+(4, 'tag2', 90.0),
+(5, 'tag2', 100.0);
+
+query ITRR
+SELECT
+    timestamp,
+    tags,
+    value,
+    median(value) OVER (
+        PARTITION BY tags
+        ORDER BY timestamp
+        ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING

Review Comment:
   I wasn't familiar with these before, but this was a great idea! It helped me 
find and understand a bug.
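
For anyone else new to sliding frames, here is a rough Rust sketch of how a frame such as `ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING` can be evaluated incrementally: rows entering the frame are added and rows leaving it are dropped. `ToySlidingMedian` and `sliding_medians` are hypothetical names; the sketch does not use DataFusion's `Accumulator` trait and only assumes one partition, already ordered.

```rust
use std::collections::VecDeque;

/// Hypothetical toy accumulator holding the rows currently inside the frame.
struct ToySlidingMedian {
    window: VecDeque<f64>,
}

impl ToySlidingMedian {
    /// A row enters the frame at its trailing edge.
    fn update(&mut self, v: f64) {
        self.window.push_back(v);
    }

    /// The oldest row leaves the frame.
    fn retract(&mut self) {
        self.window.pop_front();
    }

    /// Median of the current frame; takes `&self`, so the state is kept.
    fn evaluate(&self) -> Option<f64> {
        if self.window.is_empty() {
            return None;
        }
        let mut sorted: Vec<f64> = self.window.iter().copied().collect();
        sorted.sort_by(|a, b| a.total_cmp(b));
        let mid = sorted.len() / 2;
        if sorted.len() % 2 == 0 {
            Some((sorted[mid - 1] + sorted[mid]) / 2.0)
        } else {
            Some(sorted[mid])
        }
    }
}

/// Median per row for a frame of `preceding` rows before and `following` rows after.
fn sliding_medians(values: &[f64], preceding: usize, following: usize) -> Vec<Option<f64>> {
    let mut acc = ToySlidingMedian { window: VecDeque::new() };
    let mut next_to_add = 0; // first row not yet added to the frame
    let mut frame_start = 0; // first row still inside the frame
    let mut out = Vec::with_capacity(values.len());
    for i in 0..values.len() {
        let end = (i + following + 1).min(values.len()); // exclusive upper bound
        while next_to_add < end {
            acc.update(values[next_to_add]);
            next_to_add += 1;
        }
        let start = i.saturating_sub(preceding);
        while frame_start < start {
            acc.retract();
            frame_start += 1;
        }
        out.push(acc.evaluate());
    }
    out
}
```

Calling `sliding_medians(&[10.0, 20.0, 30.0, 40.0, 50.0], 1, 1)` on the `tag1` values gives 15, 20, 30, 40, 45; the first and last rows have only two rows in their frame, so the median is the mean of the pair.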



##########
datafusion/sqllogictest/test_files/aggregate.slt:
##########
@@ -991,6 +991,89 @@ SELECT approx_median(col_f64_nan) FROM median_table
 ----
 NaN
 
+# median_sliding_window
+statement ok
+CREATE TABLE median_window_test (
+    timestamp INT,
+    tags VARCHAR,
+    value DOUBLE
+);
+
+statement ok
+INSERT INTO median_window_test (timestamp, tags, value) VALUES
+(1, 'tag1', 10.0),
+(2, 'tag1', 20.0),
+(3, 'tag1', 30.0),
+(4, 'tag1', 40.0),
+(5, 'tag1', 50.0),
+(1, 'tag2', 60.0),
+(2, 'tag2', 70.0),
+(3, 'tag2', 80.0),
+(4, 'tag2', 90.0),
+(5, 'tag2', 100.0);
+
+query ITRR
+SELECT
+    timestamp,
+    tags,
+    value,
+    median(value) OVER (
+        PARTITION BY tags
+        ORDER BY timestamp
+        ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
+    ) AS value_median_3
+FROM median_window_test
+ORDER BY tags, timestamp;
+----
+1 tag1 10 15
+2 tag1 20 20
+3 tag1 30 30
+4 tag1 40 40
+5 tag1 50 45
+1 tag2 60 65
+2 tag2 70 70
+3 tag2 80 80
+4 tag2 90 90
+5 tag2 100 95
+
+# median_non_sliding_window
+query ITRRRR
+SELECT
+    timestamp,
+    tags,
+    value,
+    median(value) OVER (
+        PARTITION BY tags
+        ORDER BY timestamp
+        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
+    ) AS value_median_unbounded_preceding,
+    median(value) OVER (
+        PARTITION BY tags
+        ORDER BY timestamp
+        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
+    ) AS value_median_unbounded_both,

Review Comment:
   For `UNBOUNDED FOLLOWING`, an error is raised when `retract_batch()` isn't 
implemented. Queries with `UNBOUNDED PRECEDING`, however, do not trigger this 
error and instead return incorrect results. I assume this is a bug, right? If 
so, I can file a ticket.
   
   For example, if I remove the `UNBOUNDED FOLLOWING` case right below my 
comment here and run the query on main, I get this diff instead of an error.
   
   <details>
   <summary>Results Diff</summary>
   ```
   [Diff] (-expected|+actual)
       1 tag1 10 10 30
   -   2 tag1 20 15 30
   -   3 tag1 30 20 30
   -   4 tag1 40 25 30
   -   5 tag1 50 30 30
   +   2 tag1 20 20 30
   +   3 tag1 30 30 30
   +   4 tag1 40 40 30
   +   5 tag1 50 50 30
       1 tag2 60 60 80
   -   2 tag2 70 65 80
   -   3 tag2 80 70 80
   -   4 tag2 90 75 80
   -   5 tag2 100 80 80
   +   2 tag2 70 70 80
   +   3 tag2 80 80 80
   +   4 tag2 90 90 80
   +   5 tag2 100 100 80
   ```
   </details>
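
As a cross-check of the `-expected` rows in the diff, here is a brute-force Rust sketch that recomputes the median for each row's frame from scratch. `frame_medians` and its `Option` bounds (`None` standing for `UNBOUNDED`) are hypothetical helpers for illustration, not part of DataFusion.

```rust
/// Median of a slice, computed on a sorted copy; `None` for an empty slice.
fn median(values: &[f64]) -> Option<f64> {
    if values.is_empty() {
        return None;
    }
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    if sorted.len() % 2 == 0 {
        Some((sorted[mid - 1] + sorted[mid]) / 2.0)
    } else {
        Some(sorted[mid])
    }
}

/// Median per row over a ROWS frame within one ordered partition.
/// `None` means UNBOUNDED; `Some(n)` means n rows (`Some(0)` is CURRENT ROW).
fn frame_medians(
    values: &[f64],
    preceding: Option<usize>,
    following: Option<usize>,
) -> Vec<Option<f64>> {
    (0..values.len())
        .map(|i| {
            let start = preceding.map_or(0, |p| i.saturating_sub(p));
            let end = following.map_or(values.len(), |f| (i + f + 1).min(values.len()));
            median(&values[start..end])
        })
        .collect()
}
```

For the `tag1` values, `frame_medians(&values, None, Some(0))` (`UNBOUNDED PRECEDING AND CURRENT ROW`) gives 10, 15, 20, 25, 30, which matches the `-expected` rows. The `+actual` rows instead repeat each row's own `value`, which is what a frame containing only the current row would produce.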



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

