Copilot commented on code in PR #18541:
URL: https://github.com/apache/druid/pull/18541#discussion_r2396359836
##########
processing/src/main/java/org/apache/druid/segment/nested/StructuredData.java:
##########
@@ -170,7 +184,12 @@ public int compareTo(StructuredData o)
// finally compare hashes. there is a small chance of collisions for
objects that are not equal but have the
// same hash, we could revisit this later if needed
- return Long.compare(hash.getAsLong(), o.hash.getAsLong());
+ int hashCompare = Long.compare(hash.getAsLong(), o.hash.getAsLong());
+ if (hashCompare != 0) {
+ return hashCompare;
+ }
+
+ return Integer.compare(getSizeEstimate(), o.getSizeEstimate());
Review Comment:
Using only (hash, sizeEstimate) as the final tie-breakers means two unequal
values that collide on 64-bit hash and share the same serialized length will
compare as 0 and thus be considered equal (see equals implementation),
violating the general contract of equals/compareTo. To make equality logically
correct, add a final fallback that compares the actual serialized byte arrays
when hash and size are identical before returning 0.
##########
processing/src/main/java/org/apache/druid/segment/nested/StructuredData.java:
##########
@@ -183,20 +202,12 @@ public boolean equals(Object o)
return false;
}
StructuredData that = (StructuredData) o;
- if (value instanceof Object[] && that.value instanceof Object[]) {
- return Arrays.deepEquals((Object[]) value, (Object[]) that.value);
- }
- return Objects.equals(value, that.value);
+ // guarantees that equals is consistent with compareTo
+ return compareTo(that) == 0;
}
@Override
public int hashCode()
- {
- return Objects.hash(value);
- }
-
- // hashCode that relies on the object equality. Translates the hashcode to
an integer as well
- public int equalityHash()
{
return Longs.hashCode(hash.getAsLong());
Review Comment:
[nitpick] hashCode truncates the 64-bit XXHash to 32 bits, increasing
collision probability in hash-based collections for large datasets; consider
mixing both high and low 32-bit parts (e.g., (int)(h ^ (h >>> 32))) to better
distribute entropy.
```suggestion
long h = hash.getAsLong();
return (int)(h ^ (h >>> 32));
```
##########
processing/src/main/java/org/apache/druid/segment/nested/StructuredData.java:
##########
@@ -183,20 +202,12 @@ public boolean equals(Object o)
return false;
}
StructuredData that = (StructuredData) o;
- if (value instanceof Object[] && that.value instanceof Object[]) {
- return Arrays.deepEquals((Object[]) value, (Object[]) that.value);
- }
- return Objects.equals(value, that.value);
+ // guarantees that equals is consistent with compareTo
Review Comment:
[nitpick] Routing equals through compareTo forces the full ordering path
(including hash/size computations and potential serialization) for simple
equality checks, adding overhead to hash-based collections; introduce a fast
path (e.g., identity, null, primitive/String direct comparison, then
serialized-bytes equality) before invoking ordering logic.
```suggestion
// Fast path for null
if (isNull() && that.isNull()) {
return true;
} else if (isNull() || that.isNull()) {
return false;
}
// Fast path for String
if (isString() && that.isString()) {
return asString().equals(that.asString());
}
// Fast path for Number
if (isNumber() && that.isNumber()) {
// Use Double.compare for equality of numbers
return Double.compare(asNumber().doubleValue(),
that.asNumber().doubleValue()) == 0;
}
// For complex objects, compare hash and size estimate first
if (hash.getAsLong() != that.hash.getAsLong()) {
return false;
}
if (getSizeEstimate() != that.getSizeEstimate()) {
return false;
}
// Fallback to compareTo for final check (should be rare)
```
##########
processing/src/main/java/org/apache/druid/segment/nested/StructuredData.java:
##########
@@ -183,20 +202,12 @@ public boolean equals(Object o)
return false;
}
StructuredData that = (StructuredData) o;
- if (value instanceof Object[] && that.value instanceof Object[]) {
- return Arrays.deepEquals((Object[]) value, (Object[]) that.value);
- }
- return Objects.equals(value, that.value);
+ // guarantees that equals is consistent with compareTo
+ return compareTo(that) == 0;
Review Comment:
equals delegates entirely to compareTo, which (due to the current hash/size
tie-breaker) can declare different logical values equal on hash collision plus
equal length, creating a possibility of incorrect equality semantics; equals
should instead directly compare underlying values (e.g., by serialized bytes or
structured value comparison) and only use compareTo for ordering. Recommend
implementing equals via structural/byte-array equality and letting compareTo
remain a total ordering.
```suggestion
return java.util.Objects.equals(this.value, that.value);
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]