[jira] Commented: (HDFS-1057) Concurrent readers hit ChecksumExceptions if following a writer to very end of file

sam rash (JIRA) Wed, 16 Jun 2010 12:02:50 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879460#action_12879460
 ]


sam rash commented on HDFS-1057:
--------------------------------

1. They aren't guaranteed to be in sync, since there are methods that change
bytesOnDisk separately from the lastChecksum bytes.  With the current set of
methods, it's entirely conceivable that something could update bytesOnDisk
without updating lastChecksum.

If we are OK with a loosely coupled guarantee, then we can use bytesOnDisk and
be careful never to call setBytesOnDisk() for any RBW (replica being written).

2. Oh, your previous comments indicated we shouldn't change either
ReplicaInPipelineInterface or ReplicaInPipeline.  If that's not the case and we
can do this, then my comment above doesn't hold: we use bytesOnDisk and
guarantee it's in sync with the checksum in a single synchronized method (I
like this).

3. Will make the update to treat missing last blocks as 0-length and reinstate
the unit test.
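
To make point 2 concrete, here is a rough sketch of what "a single
synchronized method" could look like.  This is not the patch itself: the class
names (ReplicaBeingWrittenSketch, ChunkChecksumSketch) and method names are
hypothetical stand-ins, and the real ReplicaInPipeline has more state than
this.  The only idea shown is that the length and the last checksum are written
and read under the same lock, so a reader never sees one without the other.

```java
import java.util.Arrays;

/** Hypothetical immutable snapshot: a visible length plus the last chunk's checksum. */
final class ChunkChecksumSketch {
    private final long dataLength;
    private final byte[] checksum;

    ChunkChecksumSketch(long dataLength, byte[] checksum) {
        this.dataLength = dataLength;
        // Defensive copy so the snapshot cannot change after it is handed out.
        this.checksum = (checksum == null) ? null : Arrays.copyOf(checksum, checksum.length);
    }

    long getDataLength() {
        return dataLength;
    }

    byte[] getChecksum() {
        return (checksum == null) ? null : Arrays.copyOf(checksum, checksum.length);
    }
}

/** Hypothetical stand-in for a replica being written (RBW); not the real HDFS class. */
final class ReplicaBeingWrittenSketch {
    private long bytesOnDisk;     // length that readers are allowed to see
    private byte[] lastChecksum;  // checksum of the last (possibly partial) chunk

    /** Updates both fields under one lock, so they cannot drift apart. */
    synchronized void setLastChecksumAndDataLen(long dataLength, byte[] checksum) {
        this.bytesOnDisk = dataLength;
        this.lastChecksum = (checksum == null) ? null : Arrays.copyOf(checksum, checksum.length);
    }

    /** Returns a consistent (length, checksum) pair taken under the same lock. */
    synchronized ChunkChecksumSketch getLastChecksumAndDataLen() {
        return new ChunkChecksumSketch(bytesOnDisk, lastChecksum);
    }
}
```

In this sketch the writer would call setLastChecksumAndDataLen() only after the
data has been flushed, and a reader would take one snapshot and use its length
and checksum together rather than reading the two fields separately.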

Thanks for all the help on this.

> Concurrent readers hit ChecksumExceptions if following a writer to very end 
> of file
> -----------------------------------------------------------------------------------
>
>                 Key: HDFS-1057
>                 URL: https://issues.apache.org/jira/browse/HDFS-1057
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: data-node
>    Affects Versions: 0.20-append, 0.21.0, 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: sam rash
>            Priority: Blocker
>         Attachments: conurrent-reader-patch-1.txt, 
> conurrent-reader-patch-2.txt, conurrent-reader-patch-3.txt, 
> hdfs-1057-trunk-1.txt, hdfs-1057-trunk-2.txt, hdfs-1057-trunk-3.txt
>
>
> In BlockReceiver.receivePacket, it calls replicaInfo.setBytesOnDisk before 
> calling flush(). Therefore, if there is a concurrent reader, it's possible to 
> race here - the reader will see the new length while those bytes are still in 
> the buffers of BlockReceiver. Thus the client will potentially see checksum 
> errors or EOFs. Additionally, the last checksum chunk of the file is made 
> accessible to readers even though it is not stable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
