[
https://issues.apache.org/jira/browse/HADOOP-8648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13442761#comment-13442761
]
Andy Isaacson commented on HADOOP-8648:
---------------------------------------
I've reviewed the patch closely and agree that it's right. The only tiny
improvement I'd make is to fix this misleading comment in the 64-bit version of
{{bulk_crc32.c}}:
{code}
374 int remainder = block_size % sizeof(uint64_t);
...
401 /* Take care of the remainder. They are only up to three bytes,
402 * so performing byte-level crc32 won't take much time.
403 */
404 bdata = (uint8_t*)data;
405 while (likely(remainder)) {
{code}
The comment says "up to three bytes" but since this is a uint64_t at a time, it
should say "up to seven bytes". This came from a copy-and-paste from the
32-bit version.
Ideally we could refactor the 32-bit and 64-bit versions to one using a
{{#define WORD_T uint32_t}} or similar, but let's do that in a followup jira.
I've also reviewed the original patch in HADOOP-7446 and confirmed that there
weren't any other similar bugs added.
Note that the erroneous asm is only hit if the HDFS blocksize is not a multiple
of the wordsize, which AFAICS can only happen for blocksize<7. And, the bug
lurked because the remainder codepath didn't have any tests, so thanks for
adding those.
Overall, LGTM. Fix the comment if you choose, else ship it.
> libhadoop: native CRC32 validation crashes when io.bytes.per.checksum=1
> ------------------------------------------------------------------------
>
> Key: HADOOP-8648
> URL: https://issues.apache.org/jira/browse/HADOOP-8648
> Project: Hadoop Common
> Issue Type: Bug
> Affects Versions: 2.0.0-alpha
> Reporter: Colin Patrick McCabe
> Assignee: Colin Patrick McCabe
> Attachments: HADOOP-8648.001.patch, HADOOP-8648.002.patch,
> HADOOP-8648.003.patch
>
>
> The native CRC32 code, found in {{pipelined_crc32c}}, crashes when blocksize
> is set to 1.
> {code}
> 12:27:14,886 INFO NativeCodeLoader:50 - Loaded the native-hadoop library
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> # SIGSEGV (0xb) at pc=0x00007fa00ee5a340, pid=24100, tid=140326058854144
> #
> # JRE version: 6.0_29-b11
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (20.4-b02 mixed mode linux-amd64
> compressed oops)
> # Problematic frame:
> # C [libhadoop.so.1.0.0+0x8340] pipelined_crc32c+0xa0
> #
> # An error report file with more information is saved as:
> # /h/hs_err_pid24100.log
> #
> # If you would like to submit a bug report, please visit:
> # http://java.sun.com/webapps/bugreport/crash.jsp
> #
> Aborted
> {code}
> The Java CRC code works fine in this case.
> Choosing blocksize=1 is a __very__ odd choice. It means that we're storing a
> 4-byte checksum for every byte.
> {code}
> -rw-r--r-- 1 cmccabe users 49398 Aug 3 11:33 blk_4702510289566780538
> -rw-r--r-- 1 cmccabe users 197599 Aug 3 11:33
> blk_4702510289566780538_1199.meta
> {code}
> However, obviously crashing is never the right thing to do.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira