Copilot commented on code in PR #88:
URL: https://github.com/apache/datasketches-rust/pull/88#discussion_r2787935938
##########
datasketches/src/tdigest/sketch.rs:
##########
@@ -1251,15 +1251,24 @@ impl Centroid {
fn add(&mut self, other: Centroid) {
let (self_weight, other_weight) = (self.weight(), other.weight());
let total_weight = self_weight + other_weight;
- self.weight = self.weight.saturating_add(other.weight.get());
+ self.weight = self
+ .weight
+ .checked_add(other.weight.get())
+ .expect("weight overflow");
let (self_mean, other_mean) = (self.mean, other.mean);
- let ratio_self = self_weight / total_weight;
let ratio_other = other_weight / total_weight;
- self.mean = self_mean.mul_add(ratio_self, other_mean * ratio_other);
+ let delta = other_mean - self_mean;
+ self.mean = if delta.is_finite() {
+ delta.mul_add(ratio_other, self_mean)
+ } else {
+ let ratio_self = self_weight / total_weight;
+ self_mean.mul_add(ratio_self, other_mean * ratio_other)
+ };
Review Comment:
The new `delta.is_finite()` fallback path is meant to handle subtraction
overflow when `other_mean - self_mean` becomes infinite. There isn't any
regression test exercising this scenario, so it's easy to accidentally break in
the future. Please add a unit test (likely in `sketch.rs` under `#[cfg(test)]`)
that merges two centroids with very large opposite-signed means (so `delta`
becomes infinite) and asserts the resulting centroid mean remains finite and
matches the expected weighted average.
##########
datasketches/src/tdigest/sketch.rs:
##########
@@ -1251,15 +1251,24 @@ impl Centroid {
fn add(&mut self, other: Centroid) {
let (self_weight, other_weight) = (self.weight(), other.weight());
let total_weight = self_weight + other_weight;
- self.weight = self.weight.saturating_add(other.weight.get());
+ self.weight = self
+ .weight
+ .checked_add(other.weight.get())
+ .expect("weight overflow");
Review Comment:
`Centroid::add` now panics on weight overflow via
`checked_add(...).expect(...)`. This is a behavior change from the previous
saturating add and can turn malformed/untrusted serialized data (or extremely
large merges) into a hard process crash (DoS) when `compress()`/`merge()`
triggers centroid merging. Consider restoring saturating behavior here, or
handling overflow in a non-panicking way (e.g., saturate to the max
representable weight and keep invariants), or validating/limiting weights
earlier so `add` cannot panic on user-controlled input.
```suggestion
let combined_weight =
self.weight.get().saturating_add(other.weight.get());
// `combined_weight` is guaranteed non-zero because both operands
are non-zero
// and `saturating_add` on positive integers cannot produce zero.
self.weight = NonZeroU64::new(combined_weight).unwrap();
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]