petern48 commented on code in PR #19278:
URL: https://github.com/apache/datafusion/pull/19278#discussion_r2617308599
##########
datafusion/core/tests/dataframe/mod.rs:
##########
@@ -1102,26 +1102,26 @@ async fn window_using_aggregates() -> Result<()> {
| first_value | last_val | approx_distinct | approx_median | median | max
| min | c2 | c3 |
+-------------+----------+-----------------+---------------+--------+-----+------+----+------+
| | | | | |
| | 1 | -85 |
- | -85 | -101 | 14 | -12 | -101 | 83
| -101 | 4 | -54 |
- | -85 | -101 | 17 | -25 | -101 | 83
| -101 | 5 | -31 |
- | -85 | -12 | 10 | -32 | -12 | 83
| -85 | 3 | 13 |
- | -85 | -25 | 3 | -56 | -25 | -25
| -85 | 1 | -5 |
- | -85 | -31 | 18 | -29 | -31 | 83
| -101 | 5 | 36 |
- | -85 | -38 | 16 | -25 | -38 | 83
| -101 | 4 | 65 |
+ | -85 | -101 | 14 | -12 | -12 | 83
| -101 | 4 | -54 |
+ | -85 | -101 | 17 | -25 | -25 | 83
| -101 | 5 | -31 |
+ | -85 | -12 | 10 | -32 | -34 | 83
| -85 | 3 | 13 |
+ | -85 | -25 | 3 | -56 | -56 | -25
| -85 | 1 | -5 |
+ | -85 | -31 | 18 | -29 | -28 | 83
| -101 | 5 | 36 |
+ | -85 | -38 | 16 | -25 | -25 | 83
| -101 | 4 | 65 |
| -85 | -43 | 7 | -43 | -43 | 83
| -85 | 2 | 45 |
- | -85 | -48 | 6 | -35 | -48 | 83
| -85 | 2 | -43 |
- | -85 | -5 | 4 | -37 | -5 | -5
| -85 | 1 | 83 |
- | -85 | -54 | 15 | -17 | -54 | 83
| -101 | 4 | -38 |
- | -85 | -56 | 2 | -70 | -56 | -56
| -85 | 1 | -25 |
- | -85 | -72 | 9 | -43 | -72 | 83
| -85 | 3 | -12 |
+ | -85 | -48 | 6 | -35 | -36 | 83
| -85 | 2 | -43 |
+ | -85 | -5 | 4 | -37 | -40 | -5
| -85 | 1 | 83 |
+ | -85 | -54 | 15 | -17 | -18 | 83
| -101 | 4 | -38 |
+ | -85 | -56 | 2 | -70 | 57 | -56
| -85 | 1 | -25 |
Review Comment:
Great catch. I looked into it, and it seems like it's wrapping around due to
integer overflow while taking the average of the middle two values (since the
count is even).
```
low: [-85], high: -56, median: 57 datatype: Int8
```
-85 + -56 = -141 -> wraparound to 115
Then 115 / 2 -> 57.5 -> 57 (truncated due to integer type)
What's our desired behavior in this case? We could promote to a larger
datatype to perform the calculation. Also is it intentional to return the value
as a truncated integer instead of a float?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]