[ 
https://issues.apache.org/jira/browse/HADOOP-8648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13442756#comment-13442756
 ] 

Andy Isaacson commented on HADOOP-8648:
---------------------------------------

I didn't understand why this code was wrong before, so I looked into it in more 
depth and I agree with Colin's analysis and patch.  In the interest of making 
this easier for others to understand, here are a few references.

http://www.ibiblio.org/gferg/ldp/GCC-Inline-Assembly-HOWTO.html explains the 
GCC inline assembly syntax, and in particular how the {{asm("some assembly" : 
inputconstraints : outputconstraints : clobbers)}} syntax is parsed, and how 
the constraints map to the {{%n}} in the assembly string.

http://asm.sourceforge.net/articles/rmiyagi-inline-asm.txt describes the x86 
indexed addressing modes, in particular explaining how {{(%5,%4,1)}} is 
interpreted as "the word of memory at {{%5 + 1 * %4}}".

http://softwarecommunity.intel.com/userfiles/en-us/d9156103.pdf describes the 
details of the SSE4 CRC32 instruction in mind-numbing detail, but that's not 
especially relevant to this bug.  All we need to know is that {{crc32}}_size_ 
operates on 8, 32, or 64 bits depending on _size_, and its first argument is 
read-only while its second argument is used as an accumulator (read, modify, 
write).

Finally, the comments in bulk_crc32.c are very helpful.  Critically, the 
{{pipelined_crc32c}} routine optimizes by computing the CRC of up to 3 blocks 
in parallel.  The block size is passed in to {{pipelined_crc32c}} as 
{{block_size}}.  As we can see by looking at one of the other asm blocks in 
pipelined_crc32c, the core idea is that we maintain {{bdata}} as a pointer to 
the word being CRCed in the first block, and then use indexed addressing to 
compute the appropriate address for the word being CRCed in the second (and 
possibly third) blocks.

With all that under our belt, the bug in this code becomes clear:
{code}
        "crc32b (%5), %0;\n\t"
        "crc32b (%5,%4,1), %1;\n\t"
         : "=r"(c1), "=r"(c2)
         : "r"(c1), "r"(c2), "r"(c3), "r"(block_size), "r"(bdata)
{code}
The first crc32b instruction dereferences %5 which is {{block_size}}, but 
comparing to any other example of the similar asm block such as:
{code}
        "crc32q (%7), %0;\n\t"
        "crc32q (%7,%6,1), %1;\n\t"
        "crc32q (%7,%6,2), %2;\n\t"
         : "=r"(c1), "=r"(c2), "=r"(c3)
         : "r"(c1), "r"(c2), "r"(c3), "r"(block_size), "r"(data)
{code}
it should be dereferencing {{bdata}}.  And this is caused because the output 
constraints list includes {{c3}} even though the input constraints list does 
not, also different from all other examples of the asm block.

Therefore, Colin's fix to remove c3 from the list causes the %4 and %5 
references to refer to their intended operands {{block_size}} and {{bdata}} 
respectively.
                
> libhadoop:  native CRC32 validation crashes when io.bytes.per.checksum=1
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-8648
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8648
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 2.0.0-alpha
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HADOOP-8648.001.patch, HADOOP-8648.002.patch, 
> HADOOP-8648.003.patch
>
>
> The native CRC32 code, found in {{pipelined_crc32c}}, crashes when blocksize 
> is set to 1.
> {code}
> 12:27:14,886  INFO NativeCodeLoader:50 - Loaded the native-hadoop library
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x00007fa00ee5a340, pid=24100, tid=140326058854144
> #
> # JRE version: 6.0_29-b11
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (20.4-b02 mixed mode linux-amd64 
> compressed oops)
> # Problematic frame:
> # C  [libhadoop.so.1.0.0+0x8340]  pipelined_crc32c+0xa0
> #
> # An error report file with more information is saved as:
> # /h/hs_err_pid24100.log
> #
> # If you would like to submit a bug report, please visit:
> #   http://java.sun.com/webapps/bugreport/crash.jsp
> #
> Aborted
> {code}
> The Java CRC code works fine in this case.
> Choosing blocksize=1 is a __very__ odd choice.  It means that we're storing a 
> 4-byte checksum for every byte. 
> {code}
> -rw-r--r--  1 cmccabe users  49398 Aug  3 11:33 blk_4702510289566780538
> -rw-r--r--  1 cmccabe users 197599 Aug  3 11:33 
> blk_4702510289566780538_1199.meta
> {code}
> However, obviously crashing is never the right thing to do.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to