[ 
https://issues.apache.org/jira/browse/HDFS-11044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin updated HDFS-11044:
-----------------------------
    Description: 
The test {{TestRollingUpgrade#testRollback}} fails intermittently in trunk
(https://builds.apache.org/job/PreCommit-HDFS-Build/17250/testReport/).
The stack trace:
{code}
java.lang.AssertionError: Test resulted in an unexpected exit
        at 
org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1949)
        at 
org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1936)
        at 
org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1929)
        at 
org.apache.hadoop.hdfs.TestRollingUpgrade.testRollback(TestRollingUpgrade.java:351)
{code}
I looked into this. It seems an IOException happens while writing files to the
NameNode storage directories (see the Jenkins report). That exception is then
remembered in {{ExitUtil.firstExitException}}, and when the cluster is finally
shut down it is rethrown, failing the test.
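
Roughly, the mechanism is as sketched below. This is only an illustration of how {{ExitUtil}} and the shutdown check are assumed to interact in tests (the class name {{ExitCheckSketch}} is made up); it is not the actual {{MiniDFSCluster}} source.
{code}
// Sketch only: in tests System.exit() is disabled, so a fatal error in the
// NameNode is recorded as the first ExitException instead of exiting the JVM.
// MiniDFSCluster#shutdown later asserts that no such exit was recorded.
import org.apache.hadoop.util.ExitUtil;

public class ExitCheckSketch {
  public static void main(String[] args) {
    ExitUtil.disableSystemExit();           // record exits instead of exiting
    try {
      // Simulates a fatal error path calling ExitUtil.terminate(...)
      ExitUtil.terminate(1, "simulated fatal error during edit log loading");
    } catch (ExitUtil.ExitException e) {
      // terminate() throws once System.exit is disabled; the exception is
      // also remembered as ExitUtil.firstExitException
    }
    // Roughly the check performed at shutdown when checkExitOnShutdown is true:
    if (ExitUtil.terminateCalled()) {
      throw new AssertionError("Test resulted in an unexpected exit",
          ExitUtil.getFirstExitException());
    }
  }
}
{code}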

The exception info:
{code}
2016-10-21 12:54:02,300 [main] FATAL hdfs.MiniDFSCluster 
(MiniDFSCluster.java:shutdown(1946)) - Test resulted in an unexpected exit
org.apache.hadoop.util.ExitUtil$ExitException: java.io.IOException: All the 
storage failed while writing properties to VERSION file
        at 
org.apache.hadoop.hdfs.server.namenode.NNStorage.writeAll(NNStorage.java:1151)
        at 
org.apache.hadoop.hdfs.server.namenode.FSImage.updateStorageVersion(FSImage.java:999)
        at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:850)
        at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:240)
        at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:149)
        at 
org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:838)
        at 
org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:819)
        at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:293)
        at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:427)
{code}

The IOException occurs because all of the storage directories have been removed.
IMO, one reason is that a failed write of properties or of the transaction ID to
a storage directory causes that directory to be removed from the set of usable
storages; once every directory has failed this way, subsequent writes throw, as
sketched below.
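
To make that failure mode concrete, here is a hypothetical sketch of the pattern (the class {{StorageDirsSketch}} and its methods are invented for illustration and are not the real {{NNStorage}} code):
{code}
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: a directory that fails a write is dropped from the
// active list; once every directory has failed, later writes throw outright.
class StorageDirsSketch {
  private final List<String> activeDirs = new ArrayList<>();

  StorageDirsSketch(List<String> dirs) {
    activeDirs.addAll(dirs);
  }

  void writeAll(String properties) throws IOException {
    for (Iterator<String> it = activeDirs.iterator(); it.hasNext();) {
      String dir = it.next();
      try {
        writeProperties(dir, properties);   // may fail, e.g. a transient disk error
      } catch (IOException e) {
        it.remove();                        // the failed directory is removed
      }
    }
    if (activeDirs.isEmpty()) {
      throw new IOException(
          "All the storage failed while writing properties to VERSION file");
    }
  }

  private void writeProperties(String dir, String properties) throws IOException {
    // placeholder for the real write; a failure here triggers removal above
  }
}
{code}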

The test {{TestRollingUpgrade#testRollback}} restarts the NameNode many times, so
underlying IO exceptions can happen, and I'm not sure whether that is expected
here. One fix I am sure of: we can use {{checkExitOnShutdown(false)}} to skip the
ExitException check, as is already done in
{{TestRollingUpgrade#testRollingUpgradeWithQJM}}. In addition, since shutdown is
the last operation in the test, this will not affect the existing test logic.
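
A sketch of the proposed change is below. The cluster construction is illustrative only ({{conf}}, {{numDataNodes(1)}}, and the surrounding class are stand-ins for whatever testRollback actually sets up); the relevant part is the {{checkExitOnShutdown(false)}} builder option.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.HdfsConfiguration;
import org.apache.hadoop.hdfs.MiniDFSCluster;

public class RollbackClusterSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new HdfsConfiguration();
    // Build the test cluster with checkExitOnShutdown(false) so that shutdown()
    // does not rethrow a recorded ExitException, mirroring what
    // TestRollingUpgrade#testRollingUpgradeWithQJM already does.
    MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf)
        .numDataNodes(1)
        .checkExitOnShutdown(false)   // skip the "unexpected exit" assertion
        .build();
    try {
      cluster.waitActive();
      // ... the existing rollback steps of testRollback would run here ...
    } finally {
      cluster.shutdown();
    }
  }
}
{code}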



> TestRollingUpgrade fails intermittently
> ---------------------------------------
>
>                 Key: HDFS-11044
>                 URL: https://issues.apache.org/jira/browse/HDFS-11044
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Yiqun Lin
>            Assignee: Yiqun Lin
>



