Copilot commented on code in PR #18541:
URL: https://github.com/apache/druid/pull/18541#discussion_r2396359836


##########
processing/src/main/java/org/apache/druid/segment/nested/StructuredData.java:
##########
@@ -170,7 +184,12 @@ public int compareTo(StructuredData o)
 
     // finally compare hashes. there is a small chance of collisions for 
objects that are not equal but have the
     // same hash, we could revisit this later if needed
-    return Long.compare(hash.getAsLong(), o.hash.getAsLong());
+    int hashCompare = Long.compare(hash.getAsLong(), o.hash.getAsLong());
+    if (hashCompare != 0) {
+      return hashCompare;
+    }
+
+    return Integer.compare(getSizeEstimate(), o.getSizeEstimate());

Review Comment:
   Using only (hash, sizeEstimate) as the final tie-breakers means two unequal 
values that collide on 64-bit hash and share the same serialized length will 
compare as 0 and thus be considered equal (see equals implementation), 
violating the general contract of equals/compareTo. To make equality logically 
correct, add a final fallback that compares the actual serialized byte arrays 
when hash and size are identical before returning 0.



##########
processing/src/main/java/org/apache/druid/segment/nested/StructuredData.java:
##########
@@ -183,20 +202,12 @@ public boolean equals(Object o)
       return false;
     }
     StructuredData that = (StructuredData) o;
-    if (value instanceof Object[] && that.value instanceof Object[]) {
-      return Arrays.deepEquals((Object[]) value, (Object[]) that.value);
-    }
-    return Objects.equals(value, that.value);
+    // guarantees that equals is consistent with compareTo
+    return compareTo(that) == 0;
   }
 
   @Override
   public int hashCode()
-  {
-    return Objects.hash(value);
-  }
-
-  // hashCode that relies on the object equality. Translates the hashcode to 
an integer as well
-  public int equalityHash()
   {
     return Longs.hashCode(hash.getAsLong());

Review Comment:
   [nitpick] hashCode truncates the 64-bit XXHash to 32 bits, increasing 
collision probability in hash-based collections for large datasets; consider 
mixing both high and low 32-bit parts (e.g., (int)(h ^ (h >>> 32))) to better 
distribute entropy.
   ```suggestion
       long h = hash.getAsLong();
       return (int)(h ^ (h >>> 32));
   ```



##########
processing/src/main/java/org/apache/druid/segment/nested/StructuredData.java:
##########
@@ -183,20 +202,12 @@ public boolean equals(Object o)
       return false;
     }
     StructuredData that = (StructuredData) o;
-    if (value instanceof Object[] && that.value instanceof Object[]) {
-      return Arrays.deepEquals((Object[]) value, (Object[]) that.value);
-    }
-    return Objects.equals(value, that.value);
+    // guarantees that equals is consistent with compareTo

Review Comment:
   [nitpick] Routing equals through compareTo forces the full ordering path 
(including hash/size computations and potential serialization) for simple 
equality checks, adding overhead to hash-based collections; introduce a fast 
path (e.g., identity, null, primitive/String direct comparison, then 
serialized-bytes equality) before invoking ordering logic.
   ```suggestion
   
       // Fast path for null
       if (isNull() && that.isNull()) {
         return true;
       } else if (isNull() || that.isNull()) {
         return false;
       }
   
       // Fast path for String
       if (isString() && that.isString()) {
         return asString().equals(that.asString());
       }
   
       // Fast path for Number
       if (isNumber() && that.isNumber()) {
         // Use Double.compare for equality of numbers
         return Double.compare(asNumber().doubleValue(), 
that.asNumber().doubleValue()) == 0;
       }
   
       // For complex objects, compare hash and size estimate first
       if (hash.getAsLong() != that.hash.getAsLong()) {
         return false;
       }
       if (getSizeEstimate() != that.getSizeEstimate()) {
         return false;
       }
   
       // Fallback to compareTo for final check (should be rare)
   ```



##########
processing/src/main/java/org/apache/druid/segment/nested/StructuredData.java:
##########
@@ -183,20 +202,12 @@ public boolean equals(Object o)
       return false;
     }
     StructuredData that = (StructuredData) o;
-    if (value instanceof Object[] && that.value instanceof Object[]) {
-      return Arrays.deepEquals((Object[]) value, (Object[]) that.value);
-    }
-    return Objects.equals(value, that.value);
+    // guarantees that equals is consistent with compareTo
+    return compareTo(that) == 0;

Review Comment:
   equals delegates entirely to compareTo, which (due to the current hash/size 
tie-breaker) can declare different logical values equal on hash collision plus 
equal length, creating a possibility of incorrect equality semantics; equals 
should instead directly compare underlying values (e.g., by serialized bytes or 
structured value comparison) and only use compareTo for ordering. Recommend 
implementing equals via structural/byte-array equality and letting compareTo 
remain a total ordering.
   ```suggestion
       return java.util.Objects.equals(this.value, that.value);
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to