[ 
https://issues.apache.org/jira/browse/HDFS-5809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14063360#comment-14063360
 ] 

Hudson commented on HDFS-5809:
------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk #614 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/614/])
HDFS-5809. BlockPoolSliceScanner and high speed hdfs appending make datanode to 
drop into infinite loop (cmccabe) (cmccabe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1610790)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockPoolSliceScanner.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/DataNodeTestUtils.java


> BlockPoolSliceScanner and high speed hdfs appending make datanode to drop 
> into infinite loop
> --------------------------------------------------------------------------------------------
>
>                 Key: HDFS-5809
>                 URL: https://issues.apache.org/jira/browse/HDFS-5809
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 2.0.0-alpha
>         Environment: jdk1.6, centos6.4, 2.0.0-cdh4.5.0
>            Reporter: ikweesung
>            Assignee: Colin Patrick McCabe
>            Priority: Critical
>              Labels: blockpoolslicescanner, datanode, infinite-loop
>             Fix For: 2.6.0
>
>         Attachments: HDFS-5809.001.patch
>
>
> {{BlockPoolSliceScanner#scan}} contains a "while" loop that continues to 
> verify (i.e. scan) blocks until the {{blockInfoSet}} is empty (or until some 
> other condition, such as a timeout, occurs).  In order to do this, it calls 
> {{BlockPoolSliceScanner#verifyFirstBlock}}.  This is intended to grab the 
> first block in the {{blockInfoSet}}, verify it, and remove it from that set.  
> ({{blockInfoSet}} is sorted by last scan time.) Unfortunately, if we hit a 
> certain bug in {{updateScanStatus}}, the block may never be removed from 
> {{blockInfoSet}}.  When this happens, we keep rescanning the exact same block 
> until the timeout hits.
> The bug is triggered when a block winds up in {{blockInfoSet}} but not in 
> {{blockMap}}.  You can see it clearly in this code:
> {code}
>   private synchronized void updateScanStatus(Block block,
>                                              ScanType type,
>                                              boolean scanOk) {
>     BlockScanInfo info = blockMap.get(block);
>
>     if ( info != null ) {
>       delBlockInfo(info);
>     } else {
>       // It might already be removed. Thats ok, it will be caught next time.
>       info = new BlockScanInfo(block);
>     }
> {code}
> If {{info == null}}, we never call {{delBlockInfo}}, the function which is 
> intended to remove the {{blockInfoSet}} entry.
> Luckily, there is a simple fix here: the variable that {{updateScanStatus}} 
> is being passed is actually a {{BlockScanInfo}} object, so we can simply call 
> {{delBlockInfo}} on it directly, without doing a lookup in the {{blockMap}}.  
> This is both faster and more robust.
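
The failure mode described above can be reproduced with a small standalone model. This is only a sketch, not the Hadoop code: {{ScanLoopSketch}}, {{ScanInfo}}, {{updateBuggy}}, {{updateFixed}} and {{withOrphanedBlock}} are hypothetical names, an iteration cap stands in for the scan timeout, and everything {{updateScanStatus}} does after the lookup is left out.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

// Simplified model of the scanner state; NOT the real BlockPoolSliceScanner.
class ScanLoopSketch {
    /** Hypothetical stand-in for BlockScanInfo: a block id and its last scan time. */
    static final class ScanInfo {
        final long blockId;
        final long lastScanTime;

        ScanInfo(long blockId, long lastScanTime) {
            this.blockId = blockId;
            this.lastScanTime = lastScanTime;
        }
    }

    // Ordered by last scan time (oldest first), with block id as a tie-breaker.
    final TreeSet<ScanInfo> blockInfoSet = new TreeSet<>(
        Comparator.comparingLong((ScanInfo i) -> i.lastScanTime)
                  .thenComparingLong(i -> i.blockId));
    final Map<Long, ScanInfo> blockMap = new HashMap<>();

    /** Removes an entry from both structures. */
    void delBlockInfo(ScanInfo info) {
        blockInfoSet.remove(info);
        blockMap.remove(info.blockId);
    }

    /** Mirrors the quoted logic: the set entry is removed only if the map lookup succeeds. */
    void updateBuggy(ScanInfo scanned) {
        ScanInfo info = blockMap.get(scanned.blockId);
        if (info != null) {
            delBlockInfo(info);
        }
        // else: nothing removes 'scanned' from blockInfoSet.
    }

    /** The proposed fix: delete the entry we were handed, with no map lookup. */
    void updateFixed(ScanInfo scanned) {
        delBlockInfo(scanned);
    }

    /**
     * Processes the first block until the set is empty or maxIterations
     * (the stand-in for the timeout) is reached. Returns the iterations used.
     */
    int scan(boolean fixed, int maxIterations) {
        int iterations = 0;
        while (!blockInfoSet.isEmpty() && iterations < maxIterations) {
            ScanInfo first = blockInfoSet.first();
            // Block verification would happen here.
            if (fixed) {
                updateFixed(first);
            } else {
                updateBuggy(first);
            }
            iterations++;
        }
        return iterations;
    }

    /** Builds the triggering state: one block in blockInfoSet but not in blockMap. */
    static ScanLoopSketch withOrphanedBlock() {
        ScanLoopSketch s = new ScanLoopSketch();
        s.blockInfoSet.add(new ScanInfo(1L, 0L));
        return s;
    }

    public static void main(String[] args) {
        System.out.println("buggy iterations: " + withOrphanedBlock().scan(false, 1000));
        System.out.println("fixed iterations: " + withOrphanedBlock().scan(true, 1000));
    }
}
```

In this model the buggy variant spends all 1000 iterations on the same block, while the fixed variant finishes after one, because it never depends on the map and the set agreeing with each other.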



--
This message was sent by Atlassian JIRA
(v6.2#6252)
