[jira] [Updated] (HDFS-1057) Concurrent readers hit ChecksumExceptions if following a writer to very end of file
[ https://issues.apache.org/jira/browse/HDFS-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jitendra Nath Pandey updated HDFS-1057:
---
    Attachment: HDFS-1057.20-security.1.patch

Patch for the 20-security branch uploaded.

                 Key: HDFS-1057
                 URL: https://issues.apache.org/jira/browse/HDFS-1057
             Project: Hadoop HDFS
          Issue Type: Sub-task
          Components: data-node
    Affects Versions: 0.20-append, 0.21.0, 0.22.0
            Reporter: Todd Lipcon
            Assignee: sam rash
            Priority: Blocker
             Fix For: 0.20-append, 0.21.0, 0.22.0
         Attachments: HDFS-1057-0.20-append.patch, HDFS-1057.20-security.1.patch, conurrent-reader-patch-1.txt, conurrent-reader-patch-2.txt, conurrent-reader-patch-3.txt, hdfs-1057-trunk-1.txt, hdfs-1057-trunk-2.txt, hdfs-1057-trunk-3.txt, hdfs-1057-trunk-4.txt, hdfs-1057-trunk-5.txt, hdfs-1057-trunk-6.txt

BlockReceiver.receivePacket calls replicaInfo.setBytesOnDisk before calling flush(). If there is a concurrent reader, it is therefore possible to race: the reader will see the new length while those bytes are still in the buffers of BlockReceiver, so the client will potentially see checksum errors or EOFs. Additionally, the last checksum chunk of the file is made accessible to readers even though it is not stable.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
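The ordering bug in the description can be sketched with a toy model (FakeOut and ReplicaModel are invented classes, not Hadoop's BlockReceiver or ReplicaInPipeline): the buggy path publishes the new length before the bytes are flushed, so a reader polling getBytesOnDisk() can observe a length the "disk" does not yet hold; the fixed path flushes first and then publishes the length and the last-chunk checksum together under one lock.

```java
import java.util.ArrayList;
import java.util.List;

class FakeOut {
    private final List<Byte> buffer = new ArrayList<>(); // writer-side buffer
    private final List<Byte> disk = new ArrayList<>();   // bytes readers can see

    void write(byte[] data) { for (byte b : data) buffer.add(b); }
    void flush() { disk.addAll(buffer); buffer.clear(); }
    int bytesFlushed() { return disk.size(); }
}

class ReplicaModel {
    private long bytesOnDisk;
    private byte[] lastChunkChecksum;

    // Buggy order: the length is visible while data still sits in the buffer.
    void receivePacketBuggy(FakeOut out, byte[] data, byte[] checksum) {
        synchronized (this) { bytesOnDisk += data.length; } // published too early
        out.write(data);
        out.flush();
        synchronized (this) { lastChunkChecksum = checksum; }
    }

    // Fixed order: flush, then publish length + checksum in one critical section.
    void receivePacketFixed(FakeOut out, byte[] data, byte[] checksum) {
        out.write(data);
        out.flush();
        synchronized (this) {
            bytesOnDisk += data.length;
            lastChunkChecksum = checksum;
        }
    }

    synchronized long getBytesOnDisk() { return bytesOnDisk; }
    synchronized byte[] getLastChunkChecksum() { return lastChunkChecksum; }
}
```

With receivePacketFixed, any length a reader observes is backed by already-flushed bytes and a matching checksum, which is the invariant the patches on this issue establish.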
Suresh Srinivas updated HDFS-1057:
---
    Fix Version/s: 0.20.205.0

+1 for the patch. I committed it to 0.20-security.
dhruba borthakur updated HDFS-1057:
---
    Status: Resolved (was: Patch Available)
    Resolution: Fixed

I have committed it to the 0.20-append branch as well. Thanks, Sam!
Hairong Kuang updated HDFS-1057:
---
    Fix Version/s: 0.21.0, 0.22.0

I've committed the trunk change to 0.21 and trunk. Thanks, Sam!
sam rash updated HDFS-1057:
---
    Attachment: hdfs-1057-trunk-5.txt

- Returns 0 length only if all DNs are missing the replica (any other IOException will cause the client to get an exception, and it can retry).
- My diff viewer does not show any whitespace or indentation changes, but please advise if you see any.
Hairong Kuang updated HDFS-1057:
---
    Status: Patch Available (was: Open)
    Hadoop Flags: [Reviewed]
Nicolas Spiegelberg updated HDFS-1057:
---
    Attachment: HDFS-1057-0.20-append.patch

Fix for HDFS-1057 on the 0.20-append branch (courtesy of Todd).
sam rash updated HDFS-1057:
---
    Attachment: hdfs-1057-trunk-4.txt

Includes the changes requested by Hairong. Also handles immediate reading of new files by translating a ReplicaNotFoundException into a 0-length block within DFSInputStream for under-construction files.
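The ReplicaNotFoundException-to-0-length translation described in this comment can be sketched as follows (LengthFetcher and Datanode are invented names, not the real DFSInputStream internals): a ReplicaNotFoundException from a datanode means "replica not created yet" and is tolerated, the visible length is 0 only if every datanode is missing the replica, and any other IOException is surfaced so the client can retry.

```java
import java.io.IOException;
import java.util.List;

class ReplicaNotFoundException extends IOException {
    ReplicaNotFoundException(String msg) { super(msg); }
}

class LengthFetcher {
    interface Datanode {
        long visibleLength() throws IOException;
    }

    static long fetchVisibleLength(List<Datanode> nodes) throws IOException {
        IOException firstRealFailure = null;
        for (Datanode dn : nodes) {
            try {
                return dn.visibleLength();        // first healthy answer wins
            } catch (ReplicaNotFoundException e) {
                // replica simply not there yet on this node; keep looking
            } catch (IOException e) {
                if (firstRealFailure == null) firstRealFailure = e;
            }
        }
        if (firstRealFailure != null) throw firstRealFailure; // caller can retry
        return 0; // all datanodes are missing the replica: file was just created
    }
}
```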
sam rash updated HDFS-1057:
---
    Attachment: hdfs-1057-trunk-3.txt

1. endOffset is either bytesOnDisk or chunkChecksum.getDataLength().
2. If tmpLen == endOffset, this is a write in progress, so use the in-memory checksum (otherwise this is a finalized block not ending on a chunk boundary).
3. Fixed up whitespace.
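The two rules in this comment can be sketched in simplified form (ChunkChecksum and EndOffsetChooser are illustrative names, not the actual BlockSender fields): the send window ends either at bytesOnDisk or at the length covered by the in-memory last-chunk checksum, and when the requested end runs right up to that in-flux offset the sender uses the in-memory checksum instead of re-reading a meta file the writer may still be rewriting.

```java
class ChunkChecksum {
    final long dataLength;   // number of block bytes this checksum covers
    final byte[] checksum;   // checksum of the last, partial chunk
    ChunkChecksum(long dataLength, byte[] checksum) {
        this.dataLength = dataLength;
        this.checksum = checksum;
    }
}

class EndOffsetChooser {
    // A write is in progress when the in-memory checksum covers bytes beyond
    // bytesOnDisk: prefer its length as the end offset.
    static long endOffset(long bytesOnDisk, ChunkChecksum last) {
        return (last != null && last.dataLength > bytesOnDisk)
                ? last.dataLength : bytesOnDisk;
    }

    // Use the in-memory checksum only when the read ends at the unstable tail;
    // shorter reads still take their checksums from the meta file.
    static boolean useInMemoryChecksum(long tmpLen, long endOffset,
            ChunkChecksum last) {
        return last != null && tmpLen == endOffset;
    }
}
```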
sam rash updated HDFS-1057:
---
    Attachment: hdfs-1057-trunk-2.txt

1. The new endOffset calculation includes determining whether the in-memory checksum is needed.
2. Added methods to RBW only, to set/get the last checksum and data length. This dataLength is tracked separately because setBytesOnDisk may be called independently, which would make the length and byte[] not match (in theory, bytes on disk *could* be set to more, and we still want a checksum plus the corresponding length kept).
3. Made the appropriate changes around waiting for start + length.

I did not remove all replicaVisibleLength uses yet; I want to clarify what to replace them with in pre-existing code.
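The RBW bookkeeping this comment describes can be sketched like so (RbwSketch and its members are illustrative, not the real ReplicaBeingWritten): the last-chunk checksum and the data length it covers are set and read together under one lock, so a reader can never pair a length with a checksum from a different packet, while bytesOnDisk is tracked separately because setBytesOnDisk may be called independently.

```java
class RbwSketch {
    static final class ChecksumAndDataLen {
        final long dataLen;      // bytes covered by the checksum below
        final byte[] checksum;   // checksum of the final, partial chunk
        ChecksumAndDataLen(long dataLen, byte[] checksum) {
            this.dataLen = dataLen;
            this.checksum = checksum;
        }
    }

    private long bytesOnDisk;
    private long checksumDataLen;
    private byte[] lastChecksum;

    // Length and checksum change together, in one critical section.
    synchronized void setLastChecksumAndDataLen(long dataLen, byte[] checksum) {
        this.checksumDataLen = dataLen;
        this.lastChecksum = (checksum == null) ? null : checksum.clone();
    }

    // Readers get a mutually consistent (length, checksum) pair.
    synchronized ChecksumAndDataLen getLastChecksumAndDataLen() {
        return new ChecksumAndDataLen(checksumDataLen,
                (lastChecksum == null) ? null : lastChecksum.clone());
    }

    synchronized void setBytesOnDisk(long n) { bytesOnDisk = n; }
    synchronized long getBytesOnDisk() { return bytesOnDisk; }
}
```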
Nicolas Spiegelberg updated HDFS-1057:
---
    Affects Version/s: 0.20-append

This should be pulled into the branch-0.20-append branch.
sam rash updated HDFS-1057:
---
    Attachment: hdfs-1057-trunk-1.txt

Ported the patch to trunk (Hairong's idea of storing the last checksum).
sam rash updated HDFS-1057:
---
    Attachment: conurrent-reader-patch-3.txt

Todd pointed out I was missing an essential hunk in FSNamesystem; added it back in.
sam rash updated HDFS-1057:
---
    Attachment: conurrent-reader-patch-2.txt

- Includes all patches around the concurrent-reader CRC problems for the 0.20 port.
- Fixed so the patch includes the full unit tests (previously it depended on another commit not included here).
sam rash updated HDFS-1057:
---
    Attachment: (was: conurrent-reader-patch-1.txt)
sam rash updated HDFS-1057:
---
    Attachment: conurrent-reader-patch-1.txt

The patch is based on the Hadoop root dir.
Todd Lipcon updated HDFS-1057:
---
    Summary: Concurrent readers hit ChecksumExceptions if following a writer to very end of file (was: BlockReceiver records block length in replicaInfo before flushing)
    Description: appended "Additionally, the last checksum chunk of the file is made accessible to readers even though it is not stable."

This problem is worse than originally reported. Switching the order of flush and setBytesOnDisk doesn't solve the problem, because the last checksum in the meta file is still changing. So, since we don't access the data synchronously with the checksum, a client trying to read the last several bytes of a file under construction will get checksum errors. Solving this is likely to be very tricky...
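Why reordering flush and setBytesOnDisk is not enough can be shown with a toy interleaving (invented names, with java.util.zip.CRC32 standing in for HDFS's chunk checksums): the last checksum slot in the meta file is rewritten in place on every packet, so a reader that fetches the tail data bytes and the checksum in two separate reads can pair stale data with a fresh checksum.

```java
import java.util.zip.CRC32;

class MetaFileRace {
    static long crcOf(byte[] data, int len) {
        CRC32 c = new CRC32();
        c.update(data, 0, len);
        return c.getValue();
    }

    // Simulates: writer flushes 2 bytes, reader copies them, writer extends
    // the same partial chunk and rewrites the single checksum slot, and the
    // reader only then reads the slot. Returns true iff verification passes.
    static boolean readerVerifies() {
        byte[] chunk = new byte[]{10, 20, 0, 0};
        long metaSlot = crcOf(chunk, 2);          // checksum covering 2 bytes

        byte[] readerData = {chunk[0], chunk[1]}; // reader grabs the data...

        chunk[2] = 30;
        metaSlot = crcOf(chunk, 3);               // ...writer rewrites the slot

        return metaSlot == crcOf(readerData, 2);  // stale data vs fresh checksum
    }
}
```

readerVerifies() comes back false for this interleaving, modeling the ChecksumException a trailing reader hits even after the flush order is fixed, which is why the eventual patches pair the length with an in-memory copy of the last checksum.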