Yicong Cai created HDFS-14311:
---------------------------------

             Summary: multi-threading conflict at layoutVersion when loading 
block pool storage
                 Key: HDFS-14311
                 URL: https://issues.apache.org/jira/browse/HDFS-14311
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: rolling upgrades
    Affects Versions: 2.9.2
            Reporter: Yicong Cai


When a DataNode is upgraded from 2.7.3 to 2.9.2, threads conflict on 
StorageInfo.layoutVersion while loading block pool storage.

This causes the following exception:

 
{panel:title=exceptions}
2019-02-15 10:18:01,357 [13783] - INFO [Thread-33:BlockPoolSliceStorage@395] - Restored 36974 block files from trash before the layout upgrade. These blocks will be moved to the previous directory during the upgrade
2019-02-15 10:18:01,358 [13784] - WARN [Thread-33:BlockPoolSliceStorage@226] - Failed to analyze storage directories for block pool BP-1216718839-10.120.232.23-1548736842023
java.io.IOException: Datanode state: LV = -57 CTime = 0 is newer than the namespace state: LV = -63 CTime = 0
 at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.doTransition(BlockPoolSliceStorage.java:406)
 at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadStorageDirectory(BlockPoolSliceStorage.java:177)
 at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadBpStorageDirectories(BlockPoolSliceStorage.java:221)
 at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:250)
 at org.apache.hadoop.hdfs.server.datanode.DataStorage.loadBlockPoolSliceStorage(DataStorage.java:460)
 at org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:390)
 at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:556)
 at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1649)
 at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1610)
 at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:388)
 at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:280)
 at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:816)
 at java.lang.Thread.run(Thread.java:748)
2019-02-15 10:18:01,358 [13784] - WARN [Thread-33:DataStorage@472] - Failed to add storage directory [DISK]file:/mnt/dfs/2/hadoop/hdfs/data/ for block pool BP-1216718839-10.120.232.23-1548736842023
java.io.IOException: Datanode state: LV = -57 CTime = 0 is newer than the namespace state: LV = -63 CTime = 0
 at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.doTransition(BlockPoolSliceStorage.java:406)
 at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadStorageDirectory(BlockPoolSliceStorage.java:177)
 at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadBpStorageDirectories(BlockPoolSliceStorage.java:221)
 at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:250)
 at org.apache.hadoop.hdfs.server.datanode.DataStorage.loadBlockPoolSliceStorage(DataStorage.java:460)
 at org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:390)
 at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:556)
 at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1649)
 at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1610)
 at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:388)
 at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:280)
 at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:816)
 at java.lang.Thread.run(Thread.java:748)
{panel}
 

Root cause:

A single BlockPoolSliceStorage instance is shared by the recover-transition 
processing of all storage locations. BlockPoolSliceStorage.doTransition reads 
the old layoutVersion from local storage, compares it with the current DataNode 
version, and then performs the upgrade. doUpgrade hands the transition work to 
a sub-thread, and that work sets the BlockPoolSliceStorage's layoutVersion to 
the current DN version. The transition check for the next storage directory 
therefore runs concurrently with the actual transition work of the previous 
storage directory, so the layoutVersion held by the shared 
BlockPoolSliceStorage instance becomes inconsistent.
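
The interleaving described above can be reproduced in a small standalone 
sketch. This is not the Hadoop code: the class LayoutVersionRaceSketch, the 
SharedStorage type, and the method names are hypothetical, and the on-disk 
value -56 is an assumption (any value greater than -57 behaves the same). 
Only the value -57 is taken from the log. The upgrade task of the previous 
directory is forced to run between the two version checks so that the 
outcome is deterministic.

```java
import java.util.concurrent.atomic.AtomicInteger;

class LayoutVersionRaceSketch {
    static final int SOFTWARE_LV = -57; // DataNode layout version reported in the log
    static final int ON_DISK_LV = -56;  // ASSUMED pre-upgrade version on disk

    /** Hypothetical stand-in for the storage object shared by every directory. */
    static final class SharedStorage {
        final AtomicInteger layoutVersion = new AtomicInteger();
    }

    /** Runs the previous directory's upgrade task to completion on its own thread. */
    static void runUpgradeTask(SharedStorage shared) {
        Thread task = new Thread(() -> shared.layoutVersion.set(SOFTWARE_LV));
        task.start();
        try {
            task.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** Unsafe: both checks read the shared field, which the task rewrites in between. */
    static String transitionUsingSharedField(SharedStorage shared) {
        shared.layoutVersion.set(ON_DISK_LV);            // load this directory's version
        boolean sawOldLayout = shared.layoutVersion.get() > SOFTWARE_LV; // first check
        runUpgradeTask(shared);                          // previous directory's task fires
        if (shared.layoutVersion.get() > SOFTWARE_LV) {  // second check
            return "upgrade";
        }
        return sawOldLayout ? "newer-than-namespace error" : "regular startup";
    }

    /** Safer: the decision uses a value private to this directory's check. */
    static String transitionUsingLocalCopy(SharedStorage shared) {
        final int lv = ON_DISK_LV; // per-directory copy, never written by other threads
        runUpgradeTask(shared);
        return lv > SOFTWARE_LV ? "upgrade" : "regular startup";
    }

    public static void main(String[] args) {
        System.out.println(transitionUsingSharedField(new SharedStorage()));
        System.out.println(transitionUsingLocalCopy(new SharedStorage()));
    }
}
```

In the shared-field variant the first check sees the old layout, but the 
second check sees the value written by the other directory's task and falls 
through to the error branch. This matches the order of the log above, where 
the "Restored ... block files" message is followed by the IOException. 
Keeping the version in a per-directory copy is one possible way to avoid the 
conflict; whether the actual fix takes this form is not stated here.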

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

