[jira] [Commented] (HDFS-14311) Multi-threading conflict at layoutVersion when loading block pool storage

2019-08-20 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16911692#comment-16911692
 ] 

Hudson commented on HDFS-14311:
---

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #17152 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17152/])
HDFS-14311. Multi-threading conflict at layoutVersion when loading block 
(weichiu: rev 4cb22cd867a9295efc815dc95525b5c3e5960ea6)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataStorage.java


> Multi-threading conflict at layoutVersion when loading block pool storage
> -
>
> Key: HDFS-14311
> URL: https://issues.apache.org/jira/browse/HDFS-14311
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rolling upgrades
>Affects Versions: 2.9.2
>Reporter: Yicong Cai
>Assignee: Yicong Cai
>Priority: Major
> Fix For: 2.10.0, 3.3.0, 2.8.6, 3.2.1, 2.9.3, 3.1.3
>
> Attachments: HDFS-14311.1.patch, HDFS-14311.2.patch, 
> HDFS-14311.branch-2.1.patch
>
>
> When DataNode upgrade from 2.7.3 to 2.9.2, there is a conflict at 
> StorageInfo.layoutVersion in loading block pool storage process.
> It will cause this exception:
>  
> {panel:title=exceptions}
> 2019-02-15 10:18:01,357 [13783] - INFO [Thread-33:BlockPoolSliceStorage@395] 
> - Restored 36974 block files from trash before the layout upgrade. These 
> blocks will be moved to the previous directory during the upgrade
> 2019-02-15 10:18:01,358 [13784] - WARN [Thread-33:BlockPoolSliceStorage@226] 
> - Failed to analyze storage directories for block pool 
> BP-1216718839-10.120.232.23-1548736842023
> java.io.IOException: Datanode state: LV = -57 CTime = 0 is newer than the 
> namespace state: LV = -63 CTime = 0
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.doTransition(BlockPoolSliceStorage.java:406)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadStorageDirectory(BlockPoolSliceStorage.java:177)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadBpStorageDirectories(BlockPoolSliceStorage.java:221)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:250)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.loadBlockPoolSliceStorage(DataStorage.java:460)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:390)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:556)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1649)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1610)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:388)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:280)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:816)
>  at java.lang.Thread.run(Thread.java:748)
> 2019-02-15 10:18:01,358 [13784] - WARN [Thread-33:DataStorage@472] - Failed 
> to add storage directory [DISK]file:/mnt/dfs/2/hadoop/hdfs/data/ for block 
> pool BP-1216718839-10.120.232.23-1548736842023
> java.io.IOException: Datanode state: LV = -57 CTime = 0 is newer than the 
> namespace state: LV = -63 CTime = 0
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.doTransition(BlockPoolSliceStorage.java:406)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadStorageDirectory(BlockPoolSliceStorage.java:177)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadBpStorageDirectories(BlockPoolSliceStorage.java:221)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:250)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.loadBlockPoolSliceStorage(DataStorage.java:460)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:390)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:556)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1649)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1610)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:388)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:280)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:816)
>  at java.lang.Thread.run(Thread.java:748) 
> {panel}
>  
> 

[jira] [Commented] (HDFS-14311) Multi-threading conflict at layoutVersion when loading block pool storage

2019-08-20 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16911582#comment-16911582
 ] 

Wei-Chiu Chuang commented on HDFS-14311:


+1 failed tests doesn't reproduce for me locally. Pushed rev2 patch to trunk 
branch-3.2 branch-3.1

> Multi-threading conflict at layoutVersion when loading block pool storage
> -
>
> Key: HDFS-14311
> URL: https://issues.apache.org/jira/browse/HDFS-14311
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rolling upgrades
>Affects Versions: 2.9.2
>Reporter: Yicong Cai
>Assignee: Yicong Cai
>Priority: Major
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: HDFS-14311.1.patch, HDFS-14311.2.patch, 
> HDFS-14311.branch-2.1.patch
>
>
> When DataNode upgrade from 2.7.3 to 2.9.2, there is a conflict at 
> StorageInfo.layoutVersion in loading block pool storage process.
> It will cause this exception:
>  
> {panel:title=exceptions}
> 2019-02-15 10:18:01,357 [13783] - INFO [Thread-33:BlockPoolSliceStorage@395] 
> - Restored 36974 block files from trash before the layout upgrade. These 
> blocks will be moved to the previous directory during the upgrade
> 2019-02-15 10:18:01,358 [13784] - WARN [Thread-33:BlockPoolSliceStorage@226] 
> - Failed to analyze storage directories for block pool 
> BP-1216718839-10.120.232.23-1548736842023
> java.io.IOException: Datanode state: LV = -57 CTime = 0 is newer than the 
> namespace state: LV = -63 CTime = 0
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.doTransition(BlockPoolSliceStorage.java:406)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadStorageDirectory(BlockPoolSliceStorage.java:177)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadBpStorageDirectories(BlockPoolSliceStorage.java:221)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:250)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.loadBlockPoolSliceStorage(DataStorage.java:460)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:390)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:556)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1649)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1610)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:388)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:280)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:816)
>  at java.lang.Thread.run(Thread.java:748)
> 2019-02-15 10:18:01,358 [13784] - WARN [Thread-33:DataStorage@472] - Failed 
> to add storage directory [DISK]file:/mnt/dfs/2/hadoop/hdfs/data/ for block 
> pool BP-1216718839-10.120.232.23-1548736842023
> java.io.IOException: Datanode state: LV = -57 CTime = 0 is newer than the 
> namespace state: LV = -63 CTime = 0
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.doTransition(BlockPoolSliceStorage.java:406)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadStorageDirectory(BlockPoolSliceStorage.java:177)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadBpStorageDirectories(BlockPoolSliceStorage.java:221)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:250)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.loadBlockPoolSliceStorage(DataStorage.java:460)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:390)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:556)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1649)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1610)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:388)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:280)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:816)
>  at java.lang.Thread.run(Thread.java:748) 
> {panel}
>  
> root cause:
> BlockPoolSliceStorage instance is shared for all storage locations recover 
> transition. In BlockPoolSliceStorage.doTransition, it will read the old 
> layoutVersion from local storage, compare with current DataNode version, then 
> do upgrade. In doUpgrade, 

[jira] [Commented] (HDFS-14311) multi-threading conflict at layoutVersion when loading block pool storage

2019-08-20 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16911169#comment-16911169
 ] 

Hadoop QA commented on HDFS-14311:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 15m  
8s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} branch-2 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  9m 
 0s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
51s{color} | {color:green} branch-2 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
46s{color} | {color:green} branch-2 passed with JDK v1.8.0_222 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
29s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
54s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
54s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
8s{color} | {color:green} branch-2 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
42s{color} | {color:green} branch-2 passed with JDK v1.8.0_222 {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed with JDK v1.8.0_222 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
7s{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed with JDK v1.8.0_222 {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 77m 59s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
28s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}118m 57s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.balancer.TestBalancerRPCDelay |
|   | hadoop.hdfs.server.datanode.TestFsDatasetCache |
|   | hadoop.hdfs.server.datanode.TestDirectoryScanner |
|   | hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys |
|   | hadoop.hdfs.server.namenode.TestFsck |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=18.09.7 Server=18.09.7 Image:yetus/hadoop:da675796017 |
| JIRA Issue | HDFS-14311 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12978034/HDFS-14311.branch-2.1.patch
 |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux ed05a807f19b 4.15.0-52-generic 

[jira] [Commented] (HDFS-14311) multi-threading conflict at layoutVersion when loading block pool storage

2019-08-20 Thread Yicong Cai (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16911074#comment-16911074
 ] 

Yicong Cai commented on HDFS-14311:
---

Thanks [~sodonnell] [~surendrasingh] [~jojochuang] for your attention and 
review on this issue. 

It is very difficult to use UT to reproduce, I have failed. I first modified 
the check style related issues, I will continue to try to reproduce the problem 
with UT.

> multi-threading conflict at layoutVersion when loading block pool storage
> -
>
> Key: HDFS-14311
> URL: https://issues.apache.org/jira/browse/HDFS-14311
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rolling upgrades
>Affects Versions: 2.9.2
>Reporter: Yicong Cai
>Assignee: Yicong Cai
>Priority: Major
> Attachments: HDFS-14311.1.patch, HDFS-14311.2.patch, 
> HDFS-14311.branch-2.1.patch
>
>
> When DataNode upgrade from 2.7.3 to 2.9.2, there is a conflict at 
> StorageInfo.layoutVersion in loading block pool storage process.
> It will cause this exception:
>  
> {panel:title=exceptions}
> 2019-02-15 10:18:01,357 [13783] - INFO [Thread-33:BlockPoolSliceStorage@395] 
> - Restored 36974 block files from trash before the layout upgrade. These 
> blocks will be moved to the previous directory during the upgrade
> 2019-02-15 10:18:01,358 [13784] - WARN [Thread-33:BlockPoolSliceStorage@226] 
> - Failed to analyze storage directories for block pool 
> BP-1216718839-10.120.232.23-1548736842023
> java.io.IOException: Datanode state: LV = -57 CTime = 0 is newer than the 
> namespace state: LV = -63 CTime = 0
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.doTransition(BlockPoolSliceStorage.java:406)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadStorageDirectory(BlockPoolSliceStorage.java:177)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadBpStorageDirectories(BlockPoolSliceStorage.java:221)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:250)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.loadBlockPoolSliceStorage(DataStorage.java:460)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:390)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:556)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1649)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1610)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:388)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:280)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:816)
>  at java.lang.Thread.run(Thread.java:748)
> 2019-02-15 10:18:01,358 [13784] - WARN [Thread-33:DataStorage@472] - Failed 
> to add storage directory [DISK]file:/mnt/dfs/2/hadoop/hdfs/data/ for block 
> pool BP-1216718839-10.120.232.23-1548736842023
> java.io.IOException: Datanode state: LV = -57 CTime = 0 is newer than the 
> namespace state: LV = -63 CTime = 0
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.doTransition(BlockPoolSliceStorage.java:406)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadStorageDirectory(BlockPoolSliceStorage.java:177)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadBpStorageDirectories(BlockPoolSliceStorage.java:221)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:250)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.loadBlockPoolSliceStorage(DataStorage.java:460)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:390)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:556)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1649)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1610)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:388)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:280)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:816)
>  at java.lang.Thread.run(Thread.java:748) 
> {panel}
>  
> root cause:
> BlockPoolSliceStorage instance is shared for all storage locations recover 
> transition. In BlockPoolSliceStorage.doTransition, it will read the old 

[jira] [Commented] (HDFS-14311) multi-threading conflict at layoutVersion when loading block pool storage

2019-08-19 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910827#comment-16910827
 ] 

Wei-Chiu Chuang commented on HDFS-14311:


Thanks [~caiyicong] for the bug report and excellent fix. [~sodonnell] thanks 
for your explanation that really makes sense now.
[~surendrasingh] thanks for confirming this fix works for you.

I am +1 to commit this patch.
Additionally, it looks like {{DataStorage#loadDataStorage()}} has the similar 
code structure, and potentially same concurrency bug to me.

> multi-threading conflict at layoutVersion when loading block pool storage
> -
>
> Key: HDFS-14311
> URL: https://issues.apache.org/jira/browse/HDFS-14311
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rolling upgrades
>Affects Versions: 2.9.2
>Reporter: Yicong Cai
>Assignee: Yicong Cai
>Priority: Major
> Attachments: HDFS-14311.1.patch
>
>
> When DataNode upgrade from 2.7.3 to 2.9.2, there is a conflict at 
> StorageInfo.layoutVersion in loading block pool storage process.
> It will cause this exception:
>  
> {panel:title=exceptions}
> 2019-02-15 10:18:01,357 [13783] - INFO [Thread-33:BlockPoolSliceStorage@395] 
> - Restored 36974 block files from trash before the layout upgrade. These 
> blocks will be moved to the previous directory during the upgrade
> 2019-02-15 10:18:01,358 [13784] - WARN [Thread-33:BlockPoolSliceStorage@226] 
> - Failed to analyze storage directories for block pool 
> BP-1216718839-10.120.232.23-1548736842023
> java.io.IOException: Datanode state: LV = -57 CTime = 0 is newer than the 
> namespace state: LV = -63 CTime = 0
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.doTransition(BlockPoolSliceStorage.java:406)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadStorageDirectory(BlockPoolSliceStorage.java:177)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadBpStorageDirectories(BlockPoolSliceStorage.java:221)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:250)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.loadBlockPoolSliceStorage(DataStorage.java:460)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:390)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:556)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1649)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1610)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:388)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:280)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:816)
>  at java.lang.Thread.run(Thread.java:748)
> 2019-02-15 10:18:01,358 [13784] - WARN [Thread-33:DataStorage@472] - Failed 
> to add storage directory [DISK]file:/mnt/dfs/2/hadoop/hdfs/data/ for block 
> pool BP-1216718839-10.120.232.23-1548736842023
> java.io.IOException: Datanode state: LV = -57 CTime = 0 is newer than the 
> namespace state: LV = -63 CTime = 0
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.doTransition(BlockPoolSliceStorage.java:406)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadStorageDirectory(BlockPoolSliceStorage.java:177)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadBpStorageDirectories(BlockPoolSliceStorage.java:221)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:250)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.loadBlockPoolSliceStorage(DataStorage.java:460)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:390)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:556)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1649)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1610)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:388)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:280)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:816)
>  at java.lang.Thread.run(Thread.java:748) 
> {panel}
>  
> root cause:
> BlockPoolSliceStorage instance is shared for all storage locations recover 
> transition. In 

[jira] [Commented] (HDFS-14311) multi-threading conflict at layoutVersion when loading block pool storage

2019-08-19 Thread Surendra Singh Lilhore (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16910187#comment-16910187
 ] 

Surendra Singh Lilhore commented on HDFS-14311:
---

Thanks [~caiyicong] for reporting this issue.

We got the same issue in our cluster and it got fixed with this patch. I don't 
think it is easy to reproduce in unit test.

> multi-threading conflict at layoutVersion when loading block pool storage
> -
>
> Key: HDFS-14311
> URL: https://issues.apache.org/jira/browse/HDFS-14311
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rolling upgrades
>Affects Versions: 2.9.2
>Reporter: Yicong Cai
>Assignee: Yicong Cai
>Priority: Major
> Attachments: HDFS-14311.1.patch
>
>
> When DataNode upgrade from 2.7.3 to 2.9.2, there is a conflict at 
> StorageInfo.layoutVersion in loading block pool storage process.
> It will cause this exception:
>  
> {panel:title=exceptions}
> 2019-02-15 10:18:01,357 [13783] - INFO [Thread-33:BlockPoolSliceStorage@395] 
> - Restored 36974 block files from trash before the layout upgrade. These 
> blocks will be moved to the previous directory during the upgrade
> 2019-02-15 10:18:01,358 [13784] - WARN [Thread-33:BlockPoolSliceStorage@226] 
> - Failed to analyze storage directories for block pool 
> BP-1216718839-10.120.232.23-1548736842023
> java.io.IOException: Datanode state: LV = -57 CTime = 0 is newer than the 
> namespace state: LV = -63 CTime = 0
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.doTransition(BlockPoolSliceStorage.java:406)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadStorageDirectory(BlockPoolSliceStorage.java:177)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadBpStorageDirectories(BlockPoolSliceStorage.java:221)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:250)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.loadBlockPoolSliceStorage(DataStorage.java:460)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:390)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:556)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1649)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1610)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:388)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:280)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:816)
>  at java.lang.Thread.run(Thread.java:748)
> 2019-02-15 10:18:01,358 [13784] - WARN [Thread-33:DataStorage@472] - Failed 
> to add storage directory [DISK]file:/mnt/dfs/2/hadoop/hdfs/data/ for block 
> pool BP-1216718839-10.120.232.23-1548736842023
> java.io.IOException: Datanode state: LV = -57 CTime = 0 is newer than the 
> namespace state: LV = -63 CTime = 0
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.doTransition(BlockPoolSliceStorage.java:406)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadStorageDirectory(BlockPoolSliceStorage.java:177)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadBpStorageDirectories(BlockPoolSliceStorage.java:221)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:250)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.loadBlockPoolSliceStorage(DataStorage.java:460)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:390)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:556)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1649)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1610)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:388)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:280)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:816)
>  at java.lang.Thread.run(Thread.java:748) 
> {panel}
>  
> root cause:
> BlockPoolSliceStorage instance is shared for all storage locations recover 
> transition. In BlockPoolSliceStorage.doTransition, it will read the old 
> layoutVersion from local storage, compare with current DataNode version, then 
> do upgrade. In doUpgrade, add the 

[jira] [Commented] (HDFS-14311) multi-threading conflict at layoutVersion when loading block pool storage

2019-08-13 Thread Yicong Cai (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16906309#comment-16906309
 ] 

Yicong Cai commented on HDFS-14311:
---

[~sodonnell] Thanks for your detailed reply. I will add the corresponding 
question replication use cases and adjust the code format.

> multi-threading conflict at layoutVersion when loading block pool storage
> -
>
> Key: HDFS-14311
> URL: https://issues.apache.org/jira/browse/HDFS-14311
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rolling upgrades
>Affects Versions: 2.9.2
>Reporter: Yicong Cai
>Assignee: Yicong Cai
>Priority: Major
> Attachments: HDFS-14311.1.patch
>
>
> When DataNode upgrade from 2.7.3 to 2.9.2, there is a conflict at 
> StorageInfo.layoutVersion in loading block pool storage process.
> It will cause this exception:
>  
> {panel:title=exceptions}
> 2019-02-15 10:18:01,357 [13783] - INFO [Thread-33:BlockPoolSliceStorage@395] 
> - Restored 36974 block files from trash before the layout upgrade. These 
> blocks will be moved to the previous directory during the upgrade
> 2019-02-15 10:18:01,358 [13784] - WARN [Thread-33:BlockPoolSliceStorage@226] 
> - Failed to analyze storage directories for block pool 
> BP-1216718839-10.120.232.23-1548736842023
> java.io.IOException: Datanode state: LV = -57 CTime = 0 is newer than the 
> namespace state: LV = -63 CTime = 0
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.doTransition(BlockPoolSliceStorage.java:406)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadStorageDirectory(BlockPoolSliceStorage.java:177)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadBpStorageDirectories(BlockPoolSliceStorage.java:221)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:250)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.loadBlockPoolSliceStorage(DataStorage.java:460)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:390)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:556)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1649)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1610)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:388)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:280)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:816)
>  at java.lang.Thread.run(Thread.java:748)
> 2019-02-15 10:18:01,358 [13784] - WARN [Thread-33:DataStorage@472] - Failed 
> to add storage directory [DISK]file:/mnt/dfs/2/hadoop/hdfs/data/ for block 
> pool BP-1216718839-10.120.232.23-1548736842023
> java.io.IOException: Datanode state: LV = -57 CTime = 0 is newer than the 
> namespace state: LV = -63 CTime = 0
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.doTransition(BlockPoolSliceStorage.java:406)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadStorageDirectory(BlockPoolSliceStorage.java:177)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadBpStorageDirectories(BlockPoolSliceStorage.java:221)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:250)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.loadBlockPoolSliceStorage(DataStorage.java:460)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:390)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:556)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1649)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1610)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:388)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:280)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:816)
>  at java.lang.Thread.run(Thread.java:748) 
> {panel}
>  
> root cause:
> BlockPoolSliceStorage instance is shared for all storage locations recover 
> transition. In BlockPoolSliceStorage.doTransition, it will read the old 
> layoutVersion from local storage, compare with current DataNode version, then 
> do upgrade. In doUpgrade, add the transition work as a sub-thread, the 
> transition work will set 

[jira] [Commented] (HDFS-14311) multi-threading conflict at layoutVersion when loading block pool storage

2019-08-13 Thread Stephen O'Donnell (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16906285#comment-16906285
 ] 

Stephen O'Donnell commented on HDFS-14311:
--

I have tried to reproduce this in a unit test, but without success.

The issue is a little more subtle than I first suspected too. In the 
doTransition method, it reads the layout version of the storage it is working 
from and stores that in the blockPoolSliceStorage instance variable. Then it 
submits a job to upgrade the storage. That upgrade job will change the same 
instance variable to the new layout version, but at the same time the next 
storage is having its layout version read into the same instance variable and 
this instance variable will flip-flop between the values.

[~caiyicong] are you able to reproduce this problem easily or do you see it 
frequently? It would be nice to be able to reproduce it via a unit or manual 
test.

> multi-threading conflict at layoutVersion when loading block pool storage
> -
>
> Key: HDFS-14311
> URL: https://issues.apache.org/jira/browse/HDFS-14311
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rolling upgrades
>Affects Versions: 2.9.2
>Reporter: Yicong Cai
>Assignee: Yicong Cai
>Priority: Major
> Attachments: HDFS-14311.1.patch
>
>
> When DataNode upgrade from 2.7.3 to 2.9.2, there is a conflict at 
> StorageInfo.layoutVersion in loading block pool storage process.
> It will cause this exception:
>  
> {panel:title=exceptions}
> 2019-02-15 10:18:01,357 [13783] - INFO [Thread-33:BlockPoolSliceStorage@395] 
> - Restored 36974 block files from trash before the layout upgrade. These 
> blocks will be moved to the previous directory during the upgrade
> 2019-02-15 10:18:01,358 [13784] - WARN [Thread-33:BlockPoolSliceStorage@226] 
> - Failed to analyze storage directories for block pool 
> BP-1216718839-10.120.232.23-1548736842023
> java.io.IOException: Datanode state: LV = -57 CTime = 0 is newer than the 
> namespace state: LV = -63 CTime = 0
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.doTransition(BlockPoolSliceStorage.java:406)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadStorageDirectory(BlockPoolSliceStorage.java:177)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadBpStorageDirectories(BlockPoolSliceStorage.java:221)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:250)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.loadBlockPoolSliceStorage(DataStorage.java:460)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:390)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:556)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1649)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1610)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:388)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:280)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:816)
>  at java.lang.Thread.run(Thread.java:748)
> 2019-02-15 10:18:01,358 [13784] - WARN [Thread-33:DataStorage@472] - Failed 
> to add storage directory [DISK]file:/mnt/dfs/2/hadoop/hdfs/data/ for block 
> pool BP-1216718839-10.120.232.23-1548736842023
> java.io.IOException: Datanode state: LV = -57 CTime = 0 is newer than the 
> namespace state: LV = -63 CTime = 0
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.doTransition(BlockPoolSliceStorage.java:406)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadStorageDirectory(BlockPoolSliceStorage.java:177)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadBpStorageDirectories(BlockPoolSliceStorage.java:221)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:250)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.loadBlockPoolSliceStorage(DataStorage.java:460)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:390)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:556)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1649)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1610)
>  at 
> 

[jira] [Commented] (HDFS-14311) multi-threading conflict at layoutVersion when loading block pool storage

2019-08-09 Thread Stephen O'Donnell (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16904047#comment-16904047
 ] 

Stephen O'Donnell commented on HDFS-14311:
--

Thanks for the patch [~caiyicong], this is a good discovery. I suspect the 
reason this has not come up before, is because it likely only happens when the 
Datanode volumes have a very small number of blocks.

The current code path iterates over each storage directory, and if it needs 
upgraded, it will return a callable which is submitted to an executor, and then 
the next directory is checked.

Inside the callable, it will first upgrade the storage before updating the 
BlockPoolSliceStorage instance variables. If the storage upgrade happens very 
quickly, then the first callable will change the instance variables in 
BlockPoolSliceStorage, and the later storage directories will get the error you 
mentiond. If the upgrade of the storage takes more time than it takes to create 
all the callables, which is likely if there many blocks on the storage, then 
this issue would not manifest.

If I understand correctly, your patch works around the problem by creating and 
collecting all the 'upgrade callables' and then submitting them to the executor 
only after all of them have been created. That way, it does not matter when the 
BlockPoolSliceStorage variables are updated.

With the current structure of the code, and how the layout version and ctime 
are used within BlockPoolSliceStorage, I think your patch is the best way of 
fixing this. Anything else would require a lot more refactoring.

I have just a few comments:
 # I don't believe any of the test failures are related to this change.
 # Could you address the checkstyle issues highlighted in the last run please?
 # I wonder if we could think of a way to add a test for this, to at least 
reproduce the issue. It could be tricky due to the timing of things, but if we 
create a single DN with quite a few storage directories at an older layout 
version, and then upgraded them, it may be possible.

 

> multi-threading conflict at layoutVersion when loading block pool storage
> -
>
> Key: HDFS-14311
> URL: https://issues.apache.org/jira/browse/HDFS-14311
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rolling upgrades
>Affects Versions: 2.9.2
>Reporter: Yicong Cai
>Assignee: Yicong Cai
>Priority: Major
> Attachments: HDFS-14311.1.patch
>
>
> When DataNode upgrade from 2.7.3 to 2.9.2, there is a conflict at 
> StorageInfo.layoutVersion in loading block pool storage process.
> It will cause this exception:
>  
> {panel:title=exceptions}
> 2019-02-15 10:18:01,357 [13783] - INFO [Thread-33:BlockPoolSliceStorage@395] 
> - Restored 36974 block files from trash before the layout upgrade. These 
> blocks will be moved to the previous directory during the upgrade
> 2019-02-15 10:18:01,358 [13784] - WARN [Thread-33:BlockPoolSliceStorage@226] 
> - Failed to analyze storage directories for block pool 
> BP-1216718839-10.120.232.23-1548736842023
> java.io.IOException: Datanode state: LV = -57 CTime = 0 is newer than the 
> namespace state: LV = -63 CTime = 0
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.doTransition(BlockPoolSliceStorage.java:406)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadStorageDirectory(BlockPoolSliceStorage.java:177)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadBpStorageDirectories(BlockPoolSliceStorage.java:221)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:250)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.loadBlockPoolSliceStorage(DataStorage.java:460)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:390)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:556)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1649)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1610)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:388)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:280)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:816)
>  at java.lang.Thread.run(Thread.java:748)
> 2019-02-15 10:18:01,358 [13784] - WARN [Thread-33:DataStorage@472] - Failed 
> to add storage directory [DISK]file:/mnt/dfs/2/hadoop/hdfs/data/ for block 
> pool BP-1216718839-10.120.232.23-1548736842023
> java.io.IOException: Datanode state: LV = -57 CTime = 0 is newer 

[jira] [Commented] (HDFS-14311) multi-threading conflict at layoutVersion when loading block pool storage

2019-05-07 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16834516#comment-16834516
 ] 

Hadoop QA commented on HDFS-14311:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
12s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 13s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
51s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
0s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 41s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 3 new + 35 unchanged - 0 fixed = 38 total (was 35) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 29s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 88m 42s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
35s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}146m 17s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes |
|   | hadoop.hdfs.web.TestWebHdfsTimeouts |
|   | hadoop.hdfs.server.datanode.TestBPOfferService |
|   | hadoop.hdfs.TestMaintenanceState |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e |
| JIRA Issue | HDFS-14311 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12968015/HDFS-14311.1.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux f46a12fec9c7 4.4.0-143-generic #169~14.04.2-Ubuntu SMP Wed Feb 
13 15:00:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 49e1292 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/26761/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt