tcondie edited a comment on issue #25491: [SPARK-28699][SQL] Disable using radix sort for ShuffleExchangeExec in repartition case URL: https://github.com/apache/spark/pull/25491#issuecomment-551957225 I'm looking at the RecordBinaryComparator class and have some questions regarding its compare method. The following cases raised a possible concern: ``` // check if stars align and we can get both offsets to be aligned if ((leftOff % 8) == (rightOff % 8)) { while ((leftOff + i) % 8 != 0 && i < leftLen) { final int v1 = Platform.getByte(leftObj, leftOff + i) & 0xff; final int v2 = Platform.getByte(rightObj, rightOff + i) & 0xff; if (v1 != v2) { return v1 > v2 ? 1 : -1; } i += 1; } } // for architectures that support unaligned accesses, chew it up 8 bytes at a time if (Platform.unaligned() || (((leftOff + i) % 8 == 0) && ((rightOff + i) % 8 == 0))) { while (i <= leftLen - 8) { final long v1 = Platform.getLong(leftObj, leftOff + i); final long v2 = Platform.getLong(rightObj, rightOff + i); if (v1 != v2) { return v1 > v2 ? 1 : -1; } i += 8; } } ``` Let's say during the first shuffle instance, the offsets happen to be aligned, and the byte comparison occurs. Further assume that v1 byte is all ones (11111111) and v2 is (01111111), then the integer comparison v1 > v2 is true. In the second instance of the shuffle (i.e., after failure / stage rerun) the offsets are not aligned (because the records entered the system in a different order) and we instead drop down to the Long comparison, which will compare signed longs making v1 negative (sign bit set) making v2 > v1 true. This is just a theory but I thought I'd share for feedback; is this a possible code path?; should we be using Long.compareUnsigned(v1, v2) instead of v1 > v2?
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org