dweiss commented on code in PR #875:
URL: https://github.com/apache/lucene/pull/875#discussion_r869363854


##########
lucene/core/src/java/org/apache/lucene/index/OrdinalMap.java:
##########
@@ -48,10 +49,69 @@ public class OrdinalMap implements Accountable {
   // need it
   // TODO: use more efficient packed ints structures?
 
+  /**
+   * Copy the first 8 bytes of the given term as a comparable unsigned long. In case the term has
+   * less than 8 bytes, missing bytes will be replaced with zeroes. Note that two terms that produce
+   * the same long could still be different due to the fact that missing bytes are replaced with
+   * zeroes, e.g. {@code [1, 0]} and {@code [1]} get mapped to the same long.
+   */
+  static long prefix8ToComparableUnsignedLong(BytesRef term) {
+    // Use Big Endian so that longs are comparable
+    if (term.length >= Long.BYTES) {
+      return (long) BitUtil.VH_BE_LONG.get(term.bytes, term.offset);
+    } else {
+      long l;
+      int offset;
+      if (term.length >= Integer.BYTES) {
+        l = (int) BitUtil.VH_BE_INT.get(term.bytes, term.offset);
+        offset = Integer.BYTES;
+      } else {
+        l = 0;
+        offset = 0;
+      }
+      while (offset < term.length) {
+        l = (l << 8) | Byte.toUnsignedLong(term.bytes[term.offset + offset]);
+        offset++;
+      }
+      l <<= (Long.BYTES - term.length) << 3;
+      return l;
+    }
+  }
+
+  private static int compare(BytesRef termA, long prefix8A, BytesRef termB, long prefix8B) {
+    assert prefix8A == prefix8ToComparableUnsignedLong(termA);

Review Comment:
   I think part of the gain is that Adrien's patch zero-pads these byte slices
to a fixed 8 bytes, so it can then compare them efficiently as unsigned longs
without further length checks later on. I guess these small things matter in
super hot loops. I haven't dug deep enough to state authoritatively what the
JDK is doing slower (an asm dump would help here).
   
   That said, I do intuitively understand the code because I've done such
tricks in the past (for sorting)... I'm not sure the gain is worth it,
considering other factors.
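
   To make the padding trick concrete, here is a minimal standalone sketch of
the idea, not the code from the patch. It uses plain `byte[]` instead of
`BytesRef`, and the class name `PrefixCompareSketch` and its helper names are
made up for illustration. It assumes Java 9+ for `Arrays.compareUnsigned`, and
it assumes the fallback on equal prefixes is a full byte-wise comparison, since
that part of `compare` is cut off in the hunk above.

```java
import java.util.Arrays;

// Hypothetical illustration class; not part of Lucene.
final class PrefixCompareSketch {

  /** Packs up to the first 8 bytes of a term into a big-endian long, zero-padded on the right. */
  static long prefix8(byte[] term) {
    long l = 0;
    int n = Math.min(term.length, Long.BYTES);
    for (int i = 0; i < n; i++) {
      // Mask to treat each byte as unsigned before merging it in.
      l = (l << 8) | (term[i] & 0xFFL);
    }
    // Move the first byte into the most significant position. When n == 0
    // the value is already 0, so the (masked) shift by 64 is harmless.
    return l << ((Long.BYTES - n) << 3);
  }

  /** Compares two terms, using the precomputed prefixes as a fast path. */
  static int compare(byte[] a, long prefixA, byte[] b, long prefixB) {
    int cmp = Long.compareUnsigned(prefixA, prefixB);
    if (cmp != 0) {
      return cmp; // prefixes differ: no need to touch the bytes at all
    }
    // Equal prefixes do not imply equal terms (e.g. [1, 0] vs [1]),
    // so fall back to a full unsigned lexicographic comparison.
    return Arrays.compareUnsigned(a, b);
  }

  public static void main(String[] args) {
    byte[] a = {1, 0};
    byte[] b = {1};
    System.out.println(prefix8(a) == prefix8(b));
    System.out.println(compare(a, prefix8(a), b, prefix8(b)) > 0);
  }
}
```

   The fast path is a single unsigned long comparison with no length checks,
which is the part I would expect to matter in a hot loop; the slow path only
runs when the first 8 (padded) bytes tie.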



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

