sesteves commented on code in PR #19941:
URL: https://github.com/apache/datafusion/pull/19941#discussion_r2736087440
##########
datafusion/functions-aggregate-common/src/tdigest.rs:
##########
@@ -110,7 +99,7 @@ pub struct TDigest {
centroids: Vec<Centroid>,
max_size: usize,
sum: f64,
- count: u64,
+ count: f64,
Review Comment:
When using approx_percentile_cont_with_weight, weights can be:
1. Fractional values (e.g., 0.5, 1.5, 2.7)
2. Results of merging operations that produce non-integer weights
The TDigest algorithm computes percentiles using the total weight/count to
determine rank:
let rank = q * self.count;
If count were a u64, it couldn't accurately represent the sum of fractional
weights, leading to inaccurate percentile calculations.
This is also consistent with ClickHouse's TDigest implementation:
https://github.com/ClickHouse/ClickHouse/blob/927af1255adb37ace1b95cc3ec4316553b4cb4b4/src/AggregateFunctions/QuantileTDigest.h#L47
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]