petern48 commented on code in PR #19278:
URL: https://github.com/apache/datafusion/pull/19278#discussion_r2617308599


##########
datafusion/core/tests/dataframe/mod.rs:
##########
@@ -1102,26 +1102,26 @@ async fn window_using_aggregates() -> Result<()> {
     | first_value | last_val | approx_distinct | approx_median | median | max 
| min  | c2 | c3   |
     
+-------------+----------+-----------------+---------------+--------+-----+------+----+------+
     |             |          |                 |               |        |     
|      | 1  | -85  |
-    | -85         | -101     | 14              | -12           | -101   | 83  
| -101 | 4  | -54  |
-    | -85         | -101     | 17              | -25           | -101   | 83  
| -101 | 5  | -31  |
-    | -85         | -12      | 10              | -32           | -12    | 83  
| -85  | 3  | 13   |
-    | -85         | -25      | 3               | -56           | -25    | -25 
| -85  | 1  | -5   |
-    | -85         | -31      | 18              | -29           | -31    | 83  
| -101 | 5  | 36   |
-    | -85         | -38      | 16              | -25           | -38    | 83  
| -101 | 4  | 65   |
+    | -85         | -101     | 14              | -12           | -12    | 83  
| -101 | 4  | -54  |
+    | -85         | -101     | 17              | -25           | -25    | 83  
| -101 | 5  | -31  |
+    | -85         | -12      | 10              | -32           | -34    | 83  
| -85  | 3  | 13   |
+    | -85         | -25      | 3               | -56           | -56    | -25 
| -85  | 1  | -5   |
+    | -85         | -31      | 18              | -29           | -28    | 83  
| -101 | 5  | 36   |
+    | -85         | -38      | 16              | -25           | -25    | 83  
| -101 | 4  | 65   |
     | -85         | -43      | 7               | -43           | -43    | 83  
| -85  | 2  | 45   |
-    | -85         | -48      | 6               | -35           | -48    | 83  
| -85  | 2  | -43  |
-    | -85         | -5       | 4               | -37           | -5     | -5  
| -85  | 1  | 83   |
-    | -85         | -54      | 15              | -17           | -54    | 83  
| -101 | 4  | -38  |
-    | -85         | -56      | 2               | -70           | -56    | -56 
| -85  | 1  | -25  |
-    | -85         | -72      | 9               | -43           | -72    | 83  
| -85  | 3  | -12  |
+    | -85         | -48      | 6               | -35           | -36    | 83  
| -85  | 2  | -43  |
+    | -85         | -5       | 4               | -37           | -40    | -5  
| -85  | 1  | 83   |
+    | -85         | -54      | 15              | -17           | -18    | 83  
| -101 | 4  | -38  |
+    | -85         | -56      | 2               | -70           | 57     | -56 
| -85  | 1  | -25  |

Review Comment:
   Great catch. I looked into it, and it seems like it's wrapping around due to 
integer overflow while taking the average of the middle two values (since the 
count is even).
   
   ```
   low: [-85], high: -56, median: 57 datatype: Int8
   ```
   
   -85 + -56 = -141 -> wraparound to 115
   Then 115 / 2 -> 57.5 -> 57 (truncated due to integer type)
   
   What's our desired behavior in this case? We could promote to a larger 
datatype to perform the calculation. Also is it intentional to return the value 
as a truncated integer instead of a float?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to