[ https://issues.apache.org/jira/browse/HADOOP-8648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13442756#comment-13442756 ]
Andy Isaacson commented on HADOOP-8648: --------------------------------------- I didn't understand why this code was wrong before, so I looked into it in more depth and I agree with Colin's analysis and patch. In the interest of making this easier for others to understand, here are a few references. http://www.ibiblio.org/gferg/ldp/GCC-Inline-Assembly-HOWTO.html explains the GCC inline assembly syntax, and in particular how the {{asm("some assembly" : inputconstraints : outputconstraints : clobbers)}} syntax is parsed, and how the constraints map to the {{%n}} in the assembly string. http://asm.sourceforge.net/articles/rmiyagi-inline-asm.txt describes the x86 indexed addressing modes, in particular explaining how {{(%5,%4,1)}} is interpreted as "the word of memory at {{%5 + 1 * %4}}". http://softwarecommunity.intel.com/userfiles/en-us/d9156103.pdf describes the details of the SSE4 CRC32 instruction in mind-numbing detail, but that's not especially relevant to this bug. All we need to know is that {{crc32}}_size_ operates on 8, 32, or 64 bits depending on _size_, and its first argument is read-only while its second argument is used as an accumulator (read, modify, write). Finally, the comments in bulk_crc32.c are very helpful. Critically, the {{pipelined_crc32c}} routine optimizes by computing the CRC of up to 3 blocks in parallel. The block size is passed in to {{pipelined_crc32c}} as {{block_size}}. As we can see by looking at one of the other asm blocks in pipelined_crc32c, the core idea is that we maintain {{bdata}} as a pointer to the word being CRCed in the first block, and then use indexed addressing to compute the appropriate address for the word being CRCed in the second (and possibly third) blocks. With all that under our belt, the bug in this code becomes clear: {code} "crc32b (%5), %0;\n\t" "crc32b (%5,%4,1), %1;\n\t" : "=r"(c1), "=r"(c2) : "r"(c1), "r"(c2), "r"(c3), "r"(block_size), "r"(bdata) {code} The first crc32b instruction dereferences %5 which is {{block_size}}, but comparing to any other example of the similar asm block such as: {code} "crc32q (%7), %0;\n\t" "crc32q (%7,%6,1), %1;\n\t" "crc32q (%7,%6,2), %2;\n\t" : "=r"(c1), "=r"(c2), "=r"(c3) : "r"(c1), "r"(c2), "r"(c3), "r"(block_size), "r"(data) {code} it should be dereferencing {{bdata}}. And this is caused because the output constraints list includes {{c3}} even though the input constraints list does not, also different from all other examples of the asm block. Therefore, Colin's fix to remove c3 from the list causes the %4 and %5 references to refer to their intended operands {{block_size}} and {{bdata}} respectively. > libhadoop: native CRC32 validation crashes when io.bytes.per.checksum=1 > ------------------------------------------------------------------------ > > Key: HADOOP-8648 > URL: https://issues.apache.org/jira/browse/HADOOP-8648 > Project: Hadoop Common > Issue Type: Bug > Affects Versions: 2.0.0-alpha > Reporter: Colin Patrick McCabe > Assignee: Colin Patrick McCabe > Attachments: HADOOP-8648.001.patch, HADOOP-8648.002.patch, > HADOOP-8648.003.patch > > > The native CRC32 code, found in {{pipelined_crc32c}}, crashes when blocksize > is set to 1. > {code} > 12:27:14,886 INFO NativeCodeLoader:50 - Loaded the native-hadoop library > # > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGSEGV (0xb) at pc=0x00007fa00ee5a340, pid=24100, tid=140326058854144 > # > # JRE version: 6.0_29-b11 > # Java VM: Java HotSpot(TM) 64-Bit Server VM (20.4-b02 mixed mode linux-amd64 > compressed oops) > # Problematic frame: > # C [libhadoop.so.1.0.0+0x8340] pipelined_crc32c+0xa0 > # > # An error report file with more information is saved as: > # /h/hs_err_pid24100.log > # > # If you would like to submit a bug report, please visit: > # http://java.sun.com/webapps/bugreport/crash.jsp > # > Aborted > {code} > The Java CRC code works fine in this case. > Choosing blocksize=1 is a __very__ odd choice. It means that we're storing a > 4-byte checksum for every byte. > {code} > -rw-r--r-- 1 cmccabe users 49398 Aug 3 11:33 blk_4702510289566780538 > -rw-r--r-- 1 cmccabe users 197599 Aug 3 11:33 > blk_4702510289566780538_1199.meta > {code} > However, obviously crashing is never the right thing to do. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira