[ https://issues.apache.org/jira/browse/HDFS-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravi Prakash updated HDFS-8763:
-------------------------------
    Description: 
- On our cluster the NameNode is always very busy, so every incremental block report (IBR) contends heavily for the lock.
- The incremental block report logic is as follows: the client sends the block to dn1, dn1 mirrors it to dn2, and dn2 mirrors it to dn3. When the block is finished, every datanode reports the newly received block to the NameNode, and on the NameNode side all of these reports go through the method processIncrementalBlockReport in the BlockManager class. The status of the block reported from dn2 and dn3 is RECEIVING_BLOCK, while for dn1 it is RECEIVED_BLOCK. It is fine if dn2 and dn3 report before dn1 (the common case), but on a busy cluster it is easy for dn1 to report before dn2 or dn3. Assume dn2 reports first, dn1 second, and dn3 third.
- In that case dn1's report triggers addStoredBlock, which finds that the number of replicas of this block has not yet reached the expected replication factor (3). The block is added to the neededReplications structure, and soon some node in the pipeline (dn1 or dn2) is asked to replicate it to dn4. Some time later dn4 and dn3 both report this block, and one replica is then chosen to be invalidated. (A toy sketch of this ordering follows below.)
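To make the ordering concrete, here is a minimal, self-contained toy model of the race. It is not the real BlockManager code; the class and method names (IbrRaceSketch, processReport) are made up for illustration. The point it shows is that only RECEIVED_BLOCK reports count a replica as live, so when dn1's report arrives before dn2's and dn3's, the block briefly looks under-replicated and re-replication gets scheduled:
{code:java}
import java.util.HashSet;
import java.util.Set;

// Toy model only: real HDFS tracks replicas per DatanodeStorageInfo inside
// BlockManager under the namesystem lock.
public class IbrRaceSketch {
  enum Status { RECEIVING_BLOCK, RECEIVED_BLOCK }

  static final int EXPECTED_REPLICATION = 3;
  static final Set<String> liveReplicas = new HashSet<>();

  // Simplified stand-in for processing one incremental block report:
  // only RECEIVED_BLOCK reports count the replica as live.
  static void processReport(String dn, Status status) {
    if (status == Status.RECEIVED_BLOCK) {
      liveReplicas.add(dn); // rough equivalent of addStoredBlock in this sketch
      if (liveReplicas.size() < EXPECTED_REPLICATION) {
        // The block looks under-replicated, so it would be queued in
        // neededReplications and a copy to a fourth datanode scheduled.
        System.out.println("after " + dn + ": " + liveReplicas.size() + "/"
            + EXPECTED_REPLICATION + " live -> schedule re-replication");
      }
    } else {
      System.out.println(dn + " reported RECEIVING_BLOCK, not counted as live");
    }
  }

  public static void main(String[] args) {
    // Problematic ordering seen on a busy cluster: dn1's RECEIVED_BLOCK
    // arrives while dn2/dn3 have only reported RECEIVING_BLOCK.
    processReport("dn2", Status.RECEIVING_BLOCK);
    processReport("dn1", Status.RECEIVED_BLOCK); // 1/3 live -> re-replication
    processReport("dn3", Status.RECEIVED_BLOCK); // 2/3 live -> still "needed"
    // dn4 later reports the extra copy as well; the excess-replica logic then
    // invalidates one of the four replicas, wasting network and NameNode work.
  }
}
{code}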
Here is one log sequence I found in our cluster:
{noformat}
2015-07-08 01:05:34,675 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
allocateBlock: 
/logs/***_bigdata_spam/logs/application_1435099124107_470749/xx.xx.4.62_45454.tmp.
 BP-1386326728-xx.xx.2.131-1382089338395 
blk_3194502674_2121080184{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, 
replicas=[ReplicaUnderConstruction[[DISK]DS-a7c0f8f6-2399-4980-9479-efa08487b7b3:NORMAL|RBW],
 
ReplicaUnderConstruction[[DISK]DS-c75145a0-ed63-4180-87ee-d48ccaa647c5:NORMAL|RBW],
 
ReplicaUnderConstruction[[DISK]DS-15a4dc8e-5b7d-449f-a941-6dced45e6f07:NORMAL|RBW]]}
2015-07-08 01:05:34,689 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap 
updated: xx.xx.7.75:50010 is added to 
blk_3194502674_2121080184{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, 
replicas=[ReplicaUnderConstruction[[DISK]DS-15a4dc8e-5b7d-449f-a941-6dced45e6f07:NORMAL|RBW],
 
ReplicaUnderConstruction[[DISK]DS-74ed264f-da43-4cc3-9fa9-164ba99f752a:NORMAL|RBW],
 
ReplicaUnderConstruction[[DISK]DS-56121ce1-8991-45b3-95bc-2a5357991512:NORMAL|RBW]]}
 size 0
2015-07-08 01:05:34,689 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap 
updated: xx.xx.4.62:50010 is added to 
blk_3194502674_2121080184{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, 
replicas=[ReplicaUnderConstruction[[DISK]DS-15a4dc8e-5b7d-449f-a941-6dced45e6f07:NORMAL|RBW],
 
ReplicaUnderConstruction[[DISK]DS-74ed264f-da43-4cc3-9fa9-164ba99f752a:NORMAL|RBW],
 
ReplicaUnderConstruction[[DISK]DS-56121ce1-8991-45b3-95bc-2a5357991512:NORMAL|RBW]]}
 size 0
2015-07-08 01:05:35,003 INFO BlockStateChange: BLOCK* ask xx.xx.4.62:50010 to 
replicate blk_3194502674_2121080184 to datanode(s) xx.xx.4.65:50010
2015-07-08 01:05:35,403 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap 
updated: xx.xx.7.73:50010 is added to blk_3194502674_2121080184 size 67750
2015-07-08 01:05:35,833 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap 
updated: xx.xx.4.65:50010 is added to blk_3194502674_2121080184 size 67750
2015-07-08 01:05:35,833 INFO BlockStateChange: BLOCK* InvalidateBlocks: add 
blk_3194502674_2121080184 to xx.xx.7.75:50010
2015-07-08 01:05:35,833 INFO BlockStateChange: BLOCK* chooseExcessReplicates: 
(xx.xx.7.75:50010, blk_3194502674_2121080184) is added to invalidated blocks set
2015-07-08 01:05:35,852 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
InvalidateBlocks: ask xx.xx.7.75:50010 to delete [blk_3194502674_2121080184, 
blk_3194497594_2121075104]
{noformat}
On some days this situation can occur 400,000 times, which is bad for performance and wastes network bandwidth.
Our base version is Hadoop 2.4, and I have checked Hadoop 2.7.1 and found no difference in this code path.


> After file closed, a race condition between IBR of 3rd replica of lastBlock 
> and ReplicationMonitor
> --------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-8763
>                 URL: https://issues.apache.org/jira/browse/HDFS-8763
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.4.0
>            Reporter: jiangyu
>            Assignee: Walter Su
>            Priority: Minor
>         Attachments: HDFS-8763.01.patch, HDFS-8763.02.patch
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
