[jira] [Comment Edited] (HDFS-10921) TestDiskspaceQuotaUpdate doesn't wait for NN to get out of safe mode
[ https://issues.apache.org/jira/browse/HDFS-10921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15572550#comment-15572550 ]

Erik Krogen edited comment on HDFS-10921 at 10/13/16 5:09 PM:

Should we still change {{restartNameNodes}} to {{restartNameNode(true)}}? Correctness shouldn't be impacted either way because of the {{@Before}} block, but it may help reduce the number of times that the {{@Before}} block's condition is hit. I'm assuming that the extra call to {{waitClusterUp}} this introduces (inside {{restartNameNode}}) will be faster than starting an entirely new cluster. LGTM either way, though; this is just a minor detail.

> TestDiskspaceQuotaUpdate doesn't wait for NN to get out of safe mode
>
> Key: HDFS-10921
> URL: https://issues.apache.org/jira/browse/HDFS-10921
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Eric Badger
> Assignee: Eric Badger
> Attachments: HDFS-10921.001.patch, HDFS-10921.002.patch, HDFS-10921.003.patch
>
> Test fails intermittently because the NN is still in safe mode.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
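This is a toy model of the trade-off Erik describes. It is not MiniDFSCluster code. It assumes two things. First, the {{@Before}} guard rebuilds the whole cluster only when the NN is still in safe mode (the thread never shows that guard). Second, {{restartNameNodes()}} can return while the NN is still in safe mode, whereas {{restartNameNode(true)}} waits for the cluster to come up. {{FakeCluster}}, {{before()}} and {{countRebuilds}} are hypothetical names.

```java
/*
 * Hypothetical model, not MiniDFSCluster: counts how often an assumed
 * Before-style guard must rebuild the whole cluster, depending on how
 * each test restarts the NameNode.
 */
class BeforeGuardSketch {

  /** Stand-in for a single-NameNode mini cluster (hypothetical). */
  static class FakeCluster {
    boolean inSafeMode = false;

    /** Assumed behavior: returns while the NN is still in safe mode. */
    void restartNameNodes() {
      inSafeMode = true;
    }

    /** Assumed behavior: with waitActive, returns only once the NN is usable. */
    void restartNameNode(boolean waitActive) {
      inSafeMode = true;
      if (waitActive) {
        inSafeMode = false; // models the waitClusterUp() call finishing
      }
    }
  }

  private FakeCluster cluster = new FakeCluster();
  private int rebuilds = 0;

  /** Assumed guard: build a brand-new cluster only if the NN is unusable. */
  void before() {
    if (cluster.inSafeMode) {
      cluster = new FakeCluster(); // the expensive path
      rebuilds++;
    }
  }

  /** Runs n test bodies that each end with a NameNode restart. */
  static int countRebuilds(int n, boolean waitForNn) {
    BeforeGuardSketch suite = new BeforeGuardSketch();
    for (int i = 0; i < n; i++) {
      suite.before();
      // ... test body and edit-log check would go here ...
      if (waitForNn) {
        suite.cluster.restartNameNode(true);
      } else {
        suite.cluster.restartNameNodes();
      }
    }
    return suite.rebuilds;
  }

  public static void main(String[] args) {
    System.out.println("restartNameNodes():    " + countRebuilds(5, false) + " rebuilds");
    System.out.println("restartNameNode(true): " + countRebuilds(5, true) + " rebuilds");
  }
}
```

Under these assumptions, five tests that restart without waiting force four full rebuilds, while waiting inside the restart forces none. Both variants stay correct, which matches the point that this is an optimization rather than a fix.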
[jira] [Comment Edited] (HDFS-10921) TestDiskspaceQuotaUpdate doesn't wait for NN to get out of safe mode
[ https://issues.apache.org/jira/browse/HDFS-10921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15552636#comment-15552636 ]

Erik Krogen edited comment on HDFS-10921 at 10/6/16 5:50 PM:

[~shahrs87], I tried that previously. When I ran the entire test file in one run with that modification, the first two tests passed and the subsequent ones failed. I suspect it has something to do with calling {{setImageLoaded}}, which may leave the Namesystem in some sort of inconsistent state. The error I get is:
{code}
java.lang.IllegalStateException: Cannot skip to less than the current value (=1073741826), where newValue=1073741825
	at org.apache.hadoop.util.SequentialNumber.skipTo(SequentialNumber.java:58)
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockIdManager.setLastAllocatedContiguousBlockId(BlockIdManager.java:112)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:806)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:240)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:149)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:838)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:695)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:291)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1012)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:666)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:650)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:712)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:928)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:907)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1624)
	at org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:2073)
	at org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNodes(MiniDFSCluster.java:2028)
	at org.apache.hadoop.hdfs.server.namenode.TestDiskspaceQuotaUpdate.testTruncateOverQuota(TestDiskspaceQuotaUpdate.java:347)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{code}
So it seems to me that this creates an inconsistency between the edit log and the image?
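The top frame of the trace is a monotonic-counter check. Judging from its message, {{SequentialNumber.skipTo}} refuses to move a counter backwards. The sketch below is a minimal stand-in written from that message only. It is not Hadoop's source, and the class name {{SkipToSketch}} is hypothetical. It shows that replaying an edit which records last allocated block ID 1073741825 fails if the in-memory counter already holds 1073741826. That would fit the suspicion that state from earlier test cases survives into the restarted NameNode.

```java
import java.util.concurrent.atomic.AtomicLong;

/*
 * Minimal stand-in modeled on the skipTo() error message in the trace;
 * this is not Hadoop's SequentialNumber implementation.
 */
class SkipToSketch {
  private final AtomicLong currentValue;

  SkipToSketch(long initialValue) {
    currentValue = new AtomicLong(initialValue);
  }

  long getCurrentValue() {
    return currentValue.get();
  }

  /** Advances the counter to newValue; moving backwards is rejected. */
  void skipTo(long newValue) {
    while (true) {
      long c = currentValue.get();
      if (newValue < c) {
        throw new IllegalStateException("Cannot skip to less than the current value (="
            + c + "), where newValue=" + newValue);
      }
      // Retry if another thread changed the value between get() and CAS.
      if (currentValue.compareAndSet(c, newValue)) {
        return;
      }
    }
  }

  public static void main(String[] args) {
    // Assumed: the in-memory block-ID counter is already ahead of the log.
    SkipToSketch lastBlockId = new SkipToSketch(1073741826L);
    try {
      // Replaying an older edit-log op that recorded a smaller block ID.
      lastBlockId.skipTo(1073741825L);
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```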
[jira] [Comment Edited] (HDFS-10921) TestDiskspaceQuotaUpdate doesn't wait for NN to get out of safe mode
[ https://issues.apache.org/jira/browse/HDFS-10921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15533839#comment-15533839 ]

Rushabh S Shah edited comment on HDFS-10921 at 9/29/16 7:47 PM:

bq. the concern about namespace pollution is valid, but this is why the first line of each of the tests is to create a Path that is unique to the given test case and all subsequent operations occur under that Path.

I agree that the issue mentioned in this jira is not caused by namespace pollution, but pollution can cause test failures in the future. Take, for example, {{TestDiskspaceQuotaUpdate#testIncreaseReplicationBeforeCommitting}} and {{TestDiskspaceQuotaUpdate#testDecreaseReplicationBeforeCommitting}}. Both of these test cases call {{testQuotaIssuesBeforeCommitting(short initialReplication, short finalReplication)}} to create the file.

Here is the audit log entry for the create call from {{testIncreaseReplicationBeforeCommitting}}:
{noformat}
2016-09-29 11:19:02,069 [IPC Server handler 2 on 58161] INFO FSNamesystem.audit (FSNamesystem.java:logAuditMessage(7090)) - allowed=true ugi=rushabhs (auth:SIMPLE) ip=/127.0.0.1 cmd=create src=/TestQuotaUpdate/testQuotaIssuesBeforeCommitting/1-4/testfile dst=null perm=rushabhs:supergroup:rw-r--r-- proto=rpc
{noformat}

Here is the audit log entry for the create call from {{testDecreaseReplicationBeforeCommitting}}:
{noformat}
2016-09-29 11:20:20,403 [IPC Server handler 3 on 58161] INFO FSNamesystem.audit (FSNamesystem.java:logAuditMessage(7090)) - allowed=true ugi=rushabhs (auth:SIMPLE) ip=/127.0.0.1 cmd=create src=/TestQuotaUpdate/testQuotaIssuesBeforeCommitting/4-1/testfile dst=null perm=rushabhs:supergroup:rw-r--r-- proto=rpc
{noformat}

The only difference between the two file paths is the {{initialReplication-finalReplication}} value. We can't expect every future developer to do the right thing when creating a file. I can point to hundreds of file creations in other test suites that don't use a path unique to the test case. That's why I think we should clear the namesystem state before running each new test case.
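The sketch below illustrates the path-collision concern. It uses the path format from the audit log above. {{UniquePathSketch}}, {{sharedHelperPath}} and {{perTestPath}} are hypothetical helpers, not code from {{TestDiskspaceQuotaUpdate}}. If two tests reach the shared helper with the same replication pair, they produce the same path. Keying the path on the calling test's name as well avoids that.

```java
import java.util.HashSet;
import java.util.Set;

/*
 * Hypothetical helpers, not from TestDiskspaceQuotaUpdate: paths that differ
 * only by the "initial-final" replication pair can collide across tests.
 */
class UniquePathSketch {
  static final String ROOT = "/TestQuotaUpdate";

  /** Current style: only the replication pair distinguishes callers. */
  static String sharedHelperPath(short initialRepl, short finalRepl) {
    return ROOT + "/testQuotaIssuesBeforeCommitting/"
        + initialRepl + "-" + finalRepl + "/testfile";
  }

  /** Alternative: also key the path on the calling test's name. */
  static String perTestPath(String testName, short initialRepl, short finalRepl) {
    return ROOT + "/" + testName + "/"
        + initialRepl + "-" + finalRepl + "/testfile";
  }

  public static void main(String[] args) {
    Set<String> created = new HashSet<>();
    // Two different (hypothetical) tests that happen to pick the same pair.
    boolean first = created.add(sharedHelperPath((short) 1, (short) 4));
    boolean second = created.add(sharedHelperPath((short) 1, (short) 4));
    System.out.println("shared helper collides: " + (first && !second));

    created.clear();
    first = created.add(perTestPath("testIncreaseReplication", (short) 1, (short) 4));
    second = created.add(perTestPath("testOtherCase", (short) 1, (short) 4));
    System.out.println("per-test paths collide: " + !(first && second));
  }
}
```

In a JUnit 4 test, the name could come from the {{org.junit.rules.TestName}} rule's {{getMethodName()}}. The alternative suggested here is clearing state before each test. One way to do that would be to recursively delete the test root with {{FileSystem#delete(Path, boolean)}} in the {{@Before}} method. Both are illustrations only.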
[jira] [Comment Edited] (HDFS-10921) TestDiskspaceQuotaUpdate doesn't wait for NN to get out of safe mode
[ https://issues.apache.org/jira/browse/HDFS-10921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15533396#comment-15533396 ]

Erik Krogen edited comment on HDFS-10921 at 9/29/16 5:22 PM:

[~shahrs87], the concern about namespace pollution is valid, but this is why the first line of each of the tests is to create a Path that is unique to the given test case and all subsequent operations occur under that Path. Especially given the error that Eric has posted, I don't think that is the issue. IIUC, {{cluster.restartNameNodes()}} is called to initiate a check of the edit log and ensure that edits aren't corrupted, as per the comment right above the call. Previously the cluster was completely recreated after each test case, so there was no reason (in terms of cleaning) to restart the NameNode at the end of the test. Pinging [~jingzhao] to confirm.

[~ebadger], first off, good catch. Apologies for introducing this issue, and thank you for helping to deal with it. Can we just change the calls to {{cluster.restartNameNodes()}} to {{cluster.restartNameNode(true)}}? There is only one NN in this test, so we can use the already-available method on {{MiniDFSCluster}} and keep the change local to the test itself.
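The sketch below shows why swapping {{cluster.restartNameNodes()}} for {{cluster.restartNameNode(true)}} should address the flakiness. It is a toy model, not MiniDFSCluster code. It assumes, as this issue's title suggests, that a no-wait restart can hand control back while the NN is still in safe mode. It also assumes the waiting variant polls until the NN leaves safe mode. {{FakeNameNode}}, {{restartNoWait}}, {{restartAndWait}} and the three-check safe-mode window are all hypothetical.

```java
/*
 * Hypothetical model, not MiniDFSCluster: a NameNode that stays in safe
 * mode for a fixed number of status checks after each restart.
 */
class WaitForSafeModeSketch {
  static final int SAFE_MODE_CHECKS = 3; // assumed length of the safe-mode window

  static class FakeNameNode {
    private int checksUntilActive;

    void restart() {
      checksUntilActive = SAFE_MODE_CHECKS;
    }

    /** Each call consumes one check; true while still in safe mode. */
    boolean isInSafeMode() {
      if (checksUntilActive > 0) {
        checksUntilActive--;
        return true;
      }
      return false;
    }

    /** Writes are rejected while the NameNode is in safe mode. */
    void mkdir(String path) {
      if (isInSafeMode()) {
        throw new IllegalStateException("NameNode is in safe mode (modeled): " + path);
      }
    }
  }

  /** Models a restart that returns without waiting. */
  static void restartNoWait(FakeNameNode nn) {
    nn.restart();
  }

  /** Models restartNameNode(true): poll until safe mode is off. */
  static int restartAndWait(FakeNameNode nn, long pollMillis) {
    nn.restart();
    int polls = 0;
    while (nn.isInSafeMode()) {
      polls++;
      try {
        Thread.sleep(pollMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("interrupted while waiting for safe mode exit", e);
      }
    }
    return polls;
  }

  public static void main(String[] args) {
    FakeNameNode nn = new FakeNameNode();
    restartNoWait(nn);
    try {
      nn.mkdir("/TestQuotaUpdate/next");
      System.out.println("no-wait restart: mkdir succeeded");
    } catch (IllegalStateException e) {
      System.out.println("no-wait restart: " + e.getMessage());
    }
    int polls = restartAndWait(nn, 10);
    nn.mkdir("/TestQuotaUpdate/next");
    System.out.println("waited " + polls + " polls, then mkdir succeeded");
  }
}
```

In the real test, the change discussed here is the single call {{cluster.restartNameNode(true)}}; the polling above only stands in for whatever waiting that method does internally.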