[ 
https://issues.apache.org/jira/browse/HDFS-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13249939#comment-13249939
 ] 

Todd Lipcon commented on HDFS-3222:
-----------------------------------

bq. I think our proposal won't work here, because by the time of hsync, DN will 
not report to NN anyway.

On the first hflush() for a block, the client calls NN.fsync(), which 
internally calls persistBlocks(). Currently, the fsync call doesn't carry a 
length, but perhaps it could?
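
As a rough sketch of that first idea (not the actual NameNode code), the snippet below assumes a hypothetical fsync variant that takes the last block's synced length and remembers it per file. The class SketchNameNode, the extra lastBlockLength parameter, and persistedLastBlockLength() are all made up for illustration. Since the call only happens on the first hflush() for a block, a length recorded this way would presumably be a lower bound rather than the exact length.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for NameNode-side state; not a real HDFS class. */
class SketchNameNode {
    // Largest synced length reported so far for each file's last block.
    private final Map<String, Long> syncedLength = new HashMap<>();

    /** Hypothetical fsync variant that also carries the synced length. */
    void fsync(String src, long lastBlockLength) {
        // Keep the maximum so a late or reordered call cannot shrink the length.
        syncedLength.merge(src, lastBlockLength, Long::max);
    }

    /** Returns the recorded length, or -1 if no fsync with a length was seen. */
    long persistedLastBlockLength(String src) {
        return syncedLength.getOrDefault(src, -1L);
    }
}
```

With something like this, a reader that sees no locations after a restart could at least learn that the last block was not empty.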

The other thought is that, after a restart, a block that was previously being 
written would be in the under construction state, but with no expectedTargets. 
This differs from the case where a block has been allocated but not yet written 
to replicas. We could use that to set a new flag in the LocatedBlock response 
indicating that the block is not 0-length but corrupt.
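
A minimal sketch of how such a flag might be derived and used, assuming only the two inputs mentioned above (whether the block is under construction, and how many expected targets it has). LastBlockSketch, its State enum, and visibleLength() are hypothetical names, not part of LocatedBlock or DFSInputStream.

```java
import java.io.IOException;

/** Illustrative only: these names are hypothetical, not HDFS classes. */
class LastBlockSketch {
    /** States a client could tell apart if the response carried an extra flag. */
    enum State { COMPLETE, ALLOCATED, LENGTH_UNKNOWN }

    static State classify(boolean underConstruction, int expectedTargets) {
        if (!underConstruction) {
            return State.COMPLETE;
        }
        // Under construction with no expected targets: the post-restart case
        // for a block that was previously being written.
        if (expectedTargets == 0) {
            return State.LENGTH_UNKNOWN;
        }
        // Under construction with targets assigned: allocated, and possibly
        // not yet written to any replica, so a 0 length can be legitimate.
        return State.ALLOCATED;
    }

    /** Fails loudly instead of silently treating an unknown length as 0. */
    static long visibleLength(State state, long reportedLength) throws IOException {
        if (state == State.LENGTH_UNKNOWN) {
            throw new IOException("last block length unknown: no replicas reported yet");
        }
        return reportedLength;
    }
}
```

The point of the sketch is only that the reader gets an error for the post-restart case rather than a length of 0.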

                
> DFSInputStream#openInfo should not silently get the length as 0 when 
> locations length is zero for last partial block.
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-3222
>                 URL: https://issues.apache.org/jira/browse/HDFS-3222
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs client
>    Affects Versions: 1.0.3, 2.0.0, 3.0.0
>            Reporter: Uma Maheswara Rao G
>            Assignee: Uma Maheswara Rao G
>         Attachments: HDFS-3222-Test.patch
>
>
> I have seen this situation on an HBase cluster.
> The scenario is as follows:
> 1) 1.5 blocks have been written and synced.
> 2) The cluster is suddenly restarted.
> A reader opens the file and tries to get the length. At this point, the DNs 
> holding the partial block have not yet reported to the NN, so the locations 
> for this partial block are empty. In this case, DFSInputStream assumes that 
> 1 block size is the final size.
> The reader then also takes 1 block size as the final length and sets its 
> end marker there, so it ends up reading only partial data. Because of this, 
> HMaster could not replay the complete edits. 
> This actually happened with the 20 version. Looking at the code, the same 
> problem should be present in trunk as well.
> {code}
>     int replicaNotFoundCount = locatedblock.getLocations().length;
>     
>     for(DatanodeInfo datanode : locatedblock.getLocations()) {
> ..........
> ..........
>     // Namenode told us about these locations, but none know about the replica
>     // means that we hit the race between pipeline creation start and end.
>     // we require all 3 because some other exception could have happened
>     // on a DN that has it.  we want to report that error
>     if (replicaNotFoundCount == 0) {
>       return 0;
>     }
> {code}
