[jira] [Comment Edited] (HDFS-10921) TestDiskspaceQuotaUpdate doesn't wait for NN to get out of safe mode

2016-10-13 Thread Erik Krogen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15572550#comment-15572550
 ] 

Erik Krogen edited comment on HDFS-10921 at 10/13/16 5:09 PM:
--

Should we still change {{restartNameNodes}} to {{restartNameNode(true)}}? 
Correctness shouldn't be impacted either way because of the {{@Before}} block, 
but it may help to reduce the number of times that the {{@Before}} block's 
condition is hit. I'm assuming that the extra call to {{waitClusterUp}} that 
results (inside of {{restartNameNode}}) will be faster than starting an entirely 
new cluster. 
LGTM either way though, just a minor detail. 
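
For illustration only, here is a minimal sketch of the trade-off described above. It assumes the {{@Before}} guard rebuilds the cluster whenever the previous test left the NameNode in safe mode; that condition, and every name below ({{GuardedFixture}}, {{ToyCluster}}, {{restartAndWait}}, {{restartWithoutWaiting}}), is hypothetical and not taken from the patch. The toy is also pessimistic: it treats safe mode as always persisting until the next test, whereas the real failure is intermittent.

```java
/** Hypothetical stand-in for a test fixture; not the real TestDiskspaceQuotaUpdate. */
class GuardedFixture {
    /** Toy cluster whose NameNode is in safe mode right after a restart. */
    static class ToyCluster {
        boolean inSafeMode = false;

        // Models restartNameNodes(): returns while the NameNode is still in safe mode.
        void restartWithoutWaiting() {
            inSafeMode = true;
        }

        // Models restartNameNode(true): waits (as waitClusterUp would) before returning.
        void restartAndWait() {
            inSafeMode = true;
            inSafeMode = false; // stands in for polling until safe mode is left
        }
    }

    ToyCluster cluster;
    int clustersStarted = 0;

    // Assumed shape of the @Before guard: rebuild only when the cluster is unusable.
    void before() {
        if (cluster == null || cluster.inSafeMode) {
            cluster = new ToyCluster();
            clustersStarted++;
        }
    }

    /** Runs the given number of test cases and returns how many clusters were started. */
    static int run(int tests, boolean waitOnRestart) {
        GuardedFixture fixture = new GuardedFixture();
        for (int i = 0; i < tests; i++) {
            fixture.before();
            if (waitOnRestart) {
                fixture.cluster.restartAndWait();
            } else {
                fixture.cluster.restartWithoutWaiting();
            }
        }
        return fixture.clustersStarted;
    }

    public static void main(String[] args) {
        System.out.println("clusters started without waiting: " + run(3, false));
        System.out.println("clusters started with waiting:    " + run(3, true));
    }
}
```

In this toy model the guard fires before every test when the restart does not wait, and only once when it does, which is the saving the comment is pointing at.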



> TestDiskspaceQuotaUpdate doesn't wait for NN to get out of safe mode
> 
>
> Key: HDFS-10921
> URL: https://issues.apache.org/jira/browse/HDFS-10921
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Eric Badger
> Assignee: Eric Badger
> Attachments: HDFS-10921.001.patch, HDFS-10921.002.patch, HDFS-10921.003.patch
>
>
> Test fails intermittently because the NN is still in safe mode. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-10921) TestDiskspaceQuotaUpdate doesn't wait for NN to get out of safe mode

2016-10-06 Thread Erik Krogen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15552636#comment-15552636
 ] 

Erik Krogen edited comment on HDFS-10921 at 10/6/16 5:50 PM:
-

[~shahrs87], I tried that previously, but when running the entire Test file in 
one run with that modification, the first two tests would pass and subsequent 
ones would fail... I was thinking it may have something to do with calling 
{{setImageLoaded}} causing some sort of inconsistent state in the Namesystem? 
The error I get is:
{code}
java.lang.IllegalStateException: Cannot skip to less than the current value (=1073741826), where newValue=1073741825
    at org.apache.hadoop.util.SequentialNumber.skipTo(SequentialNumber.java:58)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockIdManager.setLastAllocatedContiguousBlockId(BlockIdManager.java:112)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:806)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:240)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:149)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:838)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:695)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:291)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1012)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:666)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:650)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:712)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:928)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:907)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1624)
    at org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNode(MiniDFSCluster.java:2073)
    at org.apache.hadoop.hdfs.MiniDFSCluster.restartNameNodes(MiniDFSCluster.java:2028)
    at org.apache.hadoop.hdfs.server.namenode.TestDiskspaceQuotaUpdate.testTruncateOverQuota(TestDiskspaceQuotaUpdate.java:347)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{code}
So it seems to me that it creates an inconsistency between the edit log and the 
image?
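
To make the failure mode in the trace concrete, here is a small self-contained sketch of a counter that refuses to move backwards, replayed against a list of logged IDs. {{MonotonicCounter}} and {{EditReplayDemo}} are hypothetical stand-ins written for this illustration, not the Hadoop {{SequentialNumber}} or {{FSEditLogLoader}} classes, and the sketch says nothing about why the in-memory value ended up ahead of the edit log in the actual test run.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Simplified stand-in for a forward-only ID counter (not the Hadoop class itself). */
class MonotonicCounter {
    private final AtomicLong current;

    MonotonicCounter(long initial) {
        current = new AtomicLong(initial);
    }

    long get() {
        return current.get();
    }

    /** Moves the counter forward to newValue; rejects any attempt to move it backwards. */
    void skipTo(long newValue) {
        for (;;) {
            long c = current.get();
            if (newValue < c) {
                throw new IllegalStateException(
                    "Cannot skip to less than the current value (=" + c
                        + "), where newValue=" + newValue);
            }
            if (current.compareAndSet(c, newValue)) {
                return;
            }
        }
    }
}

class EditReplayDemo {
    /** Replays logged block IDs onto a counter; returns the error message, or null on success. */
    static String replay(long startingId, long... loggedIds) {
        MonotonicCounter ids = new MonotonicCounter(startingId);
        try {
            for (long id : loggedIds) {
                ids.skipTo(id);
            }
            return null;
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Counter behind the log: replay succeeds.
        System.out.println(replay(1073741824L, 1073741825L, 1073741826L));
        // Counter already ahead of the logged ID: replay fails.
        System.out.println(replay(1073741826L, 1073741825L));
    }
}
```

The second call mirrors the numbers in the trace: the counter already holds 1073741826 when an entry asking for 1073741825 is applied.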






[jira] [Comment Edited] (HDFS-10921) TestDiskspaceQuotaUpdate doesn't wait for NN to get out of safe mode

2016-09-29 Thread Rushabh S Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15533839#comment-15533839
 ] 

Rushabh S Shah edited comment on HDFS-10921 at 9/29/16 7:47 PM:


bq. the concern about namespace pollution is valid, but this is why the first 
line of each of the tests is to create a Path that is unique to the given test 
case and all subsequent operations occur under that Path. 
I agree that the issue mentioned in this jira is not due to namespace pollution.
But it can cause test failures in the future.
Take, for example, 
{{TestDiskspaceQuotaUpdate#testIncreaseReplicationBeforeCommitting}} and 
{{TestDiskspaceQuotaUpdate#testDecreaseReplicationBeforeCommitting}}.
Both of these test cases call {{testQuotaIssuesBeforeCommitting(short 
initialReplication, short finalReplication)}} to create the file.
Here is the audit log entry for the create call from 
testIncreaseReplicationBeforeCommitting:
{noformat}
2016-09-29 11:19:02,069 [IPC Server handler 2 on 58161] INFO  FSNamesystem.audit (FSNamesystem.java:logAuditMessage(7090)) - allowed=true  ugi=rushabhs (auth:SIMPLE)  ip=/127.0.0.1  cmd=create  src=/TestQuotaUpdate/testQuotaIssuesBeforeCommitting/1-4/testfile  dst=null  perm=rushabhs:supergroup:rw-r--r--  proto=rpc
{noformat}

Here is the audit log entry for the create call from 
testDecreaseReplicationBeforeCommitting:
{noformat}
2016-09-29 11:20:20,403 [IPC Server handler 3 on 58161] INFO  FSNamesystem.audit (FSNamesystem.java:logAuditMessage(7090)) - allowed=true  ugi=rushabhs (auth:SIMPLE)  ip=/127.0.0.1  cmd=create  src=/TestQuotaUpdate/testQuotaIssuesBeforeCommitting/4-1/testfile  dst=null  perm=rushabhs:supergroup:rw-r--r--  proto=rpc
{noformat}
The only difference between the two file paths is the 
{{initialReplication-finalReplication}} value.
And we can't expect every developer in the future to do the right thing when 
creating files.
I can point to hundreds of file creations in other test suites that 
don't create a path that is unique to the test case.
That's why I think we should clear the namesystem state before running each new 
test case.
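
As a rough illustration of the point above, the sketch below rebuilds the two paths from the audit log and shows that they stay distinct only because the replication pair differs. {{QuotaTestPaths}}, {{testFilePath}} and {{hasCollision}} are hypothetical names invented for this example, and the way the path is assembled is inferred from the logged {{src=}} values rather than copied from the test source.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper that mirrors the path layout seen in the audit log entries. */
class QuotaTestPaths {
    static final String BASE_DIR = "/TestQuotaUpdate";

    // Assumed layout: base dir / shared helper name / "initial-final" replication / testfile
    static String testFilePath(String helperName, short initialReplication,
                               short finalReplication) {
        return BASE_DIR + "/" + helperName + "/"
            + initialReplication + "-" + finalReplication + "/testfile";
    }

    /** Returns true if any two of the given paths are identical. */
    static boolean hasCollision(String... paths) {
        Set<String> seen = new HashSet<>();
        for (String path : paths) {
            if (!seen.add(path)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String helper = "testQuotaIssuesBeforeCommitting";
        String increase = testFilePath(helper, (short) 1, (short) 4);
        String decrease = testFilePath(helper, (short) 4, (short) 1);
        System.out.println(increase);
        System.out.println(decrease);
        // A future test reusing the helper with an existing pair would land on the same file.
        String another = testFilePath(helper, (short) 1, (short) 4);
        System.out.println(hasCollision(increase, decrease, another));
    }
}
```

If the namesystem is not cleared between test cases, a third caller that reuses an existing replication pair would operate on a file left behind by an earlier test.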




[jira] [Comment Edited] (HDFS-10921) TestDiskspaceQuotaUpdate doesn't wait for NN to get out of safe mode

2016-09-29 Thread Erik Krogen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15533396#comment-15533396
 ] 

Erik Krogen edited comment on HDFS-10921 at 9/29/16 5:22 PM:
-

[~shahrs87], the concern about namespace pollution is valid, but this is why 
the first line of each of the tests is to create a Path that is unique to the 
given test case and all subsequent operations occur under that Path.  
Especially given the error that Eric has posted, I don't think that is the 
issue. 

The reason that {{cluster.restartNameNodes()}} is called, IIUC, is to initiate 
a check of the edit log to ensure that edits aren't corrupted, as per the 
comment right above the call. Previously the cluster was completely recreated 
after each test case, so there would be no reason (in terms of cleaning) to 
restart the namenode at the end of the test. Pinging [~jingzhao] to confirm.

[~ebadger], first off, good catch. Apologies for introducing this issue and 
thank you for helping to deal with it. Can we just change the calls to 
{{cluster.restartNameNodes()}} to {{cluster.restartNameNode(true)}}? There is 
only one NN in this test so we can just use the already-available method on 
{{MiniDFSCluster}} and keep the change local to the test itself. 
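
The difference between the two calls can be sketched with a toy model. This is only an illustration under stated assumptions: {{ToyMiniCluster}} is a hypothetical stand-in, not the real {{MiniDFSCluster}}, and "safe mode" is reduced to a count of outstanding block reports. The method names are borrowed from the discussion purely to label which behaviour each one models.

```java
/** Hypothetical stand-in for a single-NameNode test cluster; not the real MiniDFSCluster. */
class ToyMiniCluster {
    // The toy NameNode stays in safe mode until this count reaches zero.
    private int pendingBlockReports = 0;

    boolean isInSafeMode() {
        return pendingBlockReports > 0;
    }

    // Models a restart that returns immediately, with safe mode possibly still on.
    void restartNameNodes() {
        pendingBlockReports = 3;
    }

    // Models a restart that optionally blocks until the cluster is usable again.
    void restartNameNode(boolean waitActive) {
        pendingBlockReports = 3;
        if (waitActive) {
            waitClusterUp();
        }
    }

    // Stands in for polling: each loop iteration "receives" one block report.
    void waitClusterUp() {
        while (isInSafeMode()) {
            pendingBlockReports--;
        }
    }

    public static void main(String[] args) {
        ToyMiniCluster cluster = new ToyMiniCluster();
        cluster.restartNameNodes();
        System.out.println("after restartNameNodes():    safe mode = " + cluster.isInSafeMode());
        cluster.restartNameNode(true);
        System.out.println("after restartNameNode(true): safe mode = " + cluster.isInSafeMode());
    }
}
```

In the toy, the next test would start against a NameNode still in safe mode after the non-waiting restart, which is the intermittent failure reported in this issue.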


