tcondie edited a comment on issue #25491: [SPARK-28699][SQL] Disable using 
radix sort for ShuffleExchangeExec in repartition case
URL: https://github.com/apache/spark/pull/25491#issuecomment-551957225
 
 
   I'm looking at the RecordBinaryComparator class and have some questions 
regarding its compare method. The following cases raised a possible concern:
   ```
       // check if stars align and we can get both offsets to be aligned
       if ((leftOff % 8) == (rightOff % 8)) {
         while ((leftOff + i) % 8 != 0 && i < leftLen) {
           final int v1 = Platform.getByte(leftObj, leftOff + i) & 0xff;
           final int v2 = Platform.getByte(rightObj, rightOff + i) & 0xff;
           if (v1 != v2) {
             return v1 > v2 ? 1 : -1;
           }
           i += 1;
         }
       }
       // for architectures that support unaligned accesses, chew it up 8 bytes 
at a time
       if (Platform.unaligned() || (((leftOff + i) % 8 == 0) && ((rightOff + i) 
% 8 == 0))) {
         while (i <= leftLen - 8) {
           final long v1 = Platform.getLong(leftObj, leftOff + i);
           final long v2 = Platform.getLong(rightObj, rightOff + i);
           if (v1 != v2) {
             return v1 > v2 ? 1 : -1;
           }
           i += 8;
         }
       }
   ```
   Let's say during the first shuffle instance, the offsets happen to be 
aligned, and the byte comparison occurs. Further assume that v1 byte is all 
ones (11111111) and v2 is (01111111), then the integer comparison v1 > v2 is 
true. In the second instance of the shuffle (i.e., after failure / stage rerun) 
the offsets are not aligned (because the records entered the system in a 
different order) and we instead drop down to the Long comparison, which will 
compare signed longs making v1 negative (sign bit set) making v2 > v1 true. 
   
   This is just a theory but I thought I'd share for feedback; is this a 
possible code path?; should we be using Long.compareUnsigned(v1, v2) instead of 
v1 > v2?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to