[ https://issues.apache.org/jira/browse/HDFS-10989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Wang updated HDFS-10989:
-------------------------------
    Affects Version/s: 2.4.1

> Cannot get last block length after namenode failover
> ----------------------------------------------------
>
>                 Key: HDFS-10989
>                 URL: https://issues.apache.org/jira/browse/HDFS-10989
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.4.1
>            Reporter: zhouyingchao
>
> On a 2.4 cluster, access to a file failed because the length of its last
> block could not be obtained. The fsck output for the file at the moment of
> failure looked like this:
>
> /user/XXXXXXXXX 483600487 bytes, 2 block(s), OPENFORWRITE:  MISSING 1 blocks of total size 215165031 B
> 0. BP-219149063-10.108.84.25-1446859315800:blk_2102504098_1035525341 len=268435456 repl=3
>    [10.112.17.43:11402, 10.118.22.46:11402, 10.118.22.49:11402]
> 1. BP-219149063-10.108.84.25-1446859315800:blk_2103114087_1036219054{blockUCState=UNDER_RECOVERY,
>    primaryNodeIndex=2,
>    replicas=[ReplicaUnderConstruction[[DISK]DS-60be75ad-e4a7-4b1e-b3aa-327c85331d42:NORMAL|RBW],
>    ReplicaUnderConstruction[[DISK]DS-184a1ce9-655a-4e67-b0cc-29ab9984bd0a:NORMAL|RBW],
>    ReplicaUnderConstruction[[DISK]DS-6d037ac8-4bcc-4cdc-a803-55b1817e0200:NORMAL|RBW]]}
>    len=215165031 MISSING! Recorded locations [10.114.10.14:11402, 10.118.29.3:11402, 10.118.22.42:11402]
>
> On those three data nodes, we found IOExceptions related to the block as
> well as pipeline re-creation events.
> We found that a namenode failover had occurred before the issue appeared,
> and that there had been some updatePipeline calls to the earlier active
> namenode:
>
> 2016-09-27,15:04:36,437 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
>   updatePipeline(block=BP-219149063-10.108.84.25-1446859315800:blk_2103114087_1036137092,
>   newGenerationStamp=1036170430, newLength=2624000,
>   newNodes=[10.118.22.42:11402, 10.118.22.49:11402, 10.118.24.3:11402],
>   clientName=DFSClient_NONMAPREDUCE_-442153643_1)
> 2016-09-27,15:04:36,438 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
>   updatePipeline(BP-219149063-10.108.84.25-1446859315800:blk_2103114087_1036137092)
>   successfully to BP-219149063-10.108.84.25-1446859315800:blk_2103114087_1036170430
> 2016-09-27,15:10:10,596 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
>   updatePipeline(block=BP-219149063-10.108.84.25-1446859315800:blk_2103114087_1036170430,
>   newGenerationStamp=1036219054, newLength=17138265,
>   newNodes=[10.118.22.49:11402, 10.118.24.3:11402, 10.114.6.45:11402],
>   clientName=DFSClient_NONMAPREDUCE_-442153643_1)
> 2016-09-27,15:10:10,601 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
>   updatePipeline(BP-219149063-10.108.84.25-1446859315800:blk_2103114087_1036170430)
>   successfully to BP-219149063-10.108.84.25-1446859315800:blk_2103114087_1036219054
>
> However, these new data nodes did not show up in the fsck output. It looks
> like when a pipeline is recovered (PIPELINE_SETUP_STREAMING_RECOVERY), the
> new data nodes do not call notifyNamenodeReceivingBlock for the transferred
> block.
> From code review, the issue also exists in more recent branches.
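
The mismatch described above, between the pipeline in the last updatePipeline call and the locations fsck reported, can be checked mechanically. The following is a minimal sketch in Python that uses only the addresses quoted in this report. The helper names `parse_nodes` and `pipeline_mismatch` and the regular expression are illustrative assumptions, not part of Hadoop or its tooling.

```python
import re

# Last updatePipeline call from the NameNode log quoted in the report.
UPDATE_PIPELINE_LOG = (
    "updatePipeline(block=BP-219149063-10.108.84.25-1446859315800:"
    "blk_2103114087_1036170430, newGenerationStamp=1036219054, "
    "newLength=17138265, newNodes=[10.118.22.49:11402, 10.118.24.3:11402, "
    "10.114.6.45:11402], clientName=DFSClient_NONMAPREDUCE_-442153643_1)"
)

# Tail of the fsck entry for the same block, as quoted in the report.
FSCK_LINE = (
    "len=215165031 MISSING! Recorded locations [10.114.10.14:11402, "
    "10.118.29.3:11402, 10.118.22.42:11402]"
)


def parse_nodes(text, label):
    """Return the set of host:port entries in the bracketed list after `label`.

    Hypothetical helper: it assumes the list directly follows the label and
    contains no nested brackets, which holds for the two lines above.
    """
    match = re.search(re.escape(label) + r"\[([^\]]*)\]", text)
    if match is None:
        return set()
    return {node.strip() for node in match.group(1).split(",") if node.strip()}


def pipeline_mismatch(log_line, fsck_line):
    """Compare the pipeline nodes in a log line with fsck's recorded locations."""
    pipeline = parse_nodes(log_line, "newNodes=")
    recorded = parse_nodes(fsck_line, "Recorded locations ")
    return {
        "in_pipeline_not_recorded": sorted(pipeline - recorded),
        "recorded_not_in_pipeline": sorted(recorded - pipeline),
    }


if __name__ == "__main__":
    for key, nodes in pipeline_mismatch(UPDATE_PIPELINE_LOG, FSCK_LINE).items():
        print(key, nodes)
```

For the two lines quoted here the sets share no address: all three nodes of the final pipeline are absent from the recorded locations, which is consistent with the observation that the new data nodes never showed up in the fsck output.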