[jira] [Created] (HDDS-755) ContainerInfo and ContainerReplica protobuf changes

2018-10-29 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-755:


 Summary: ContainerInfo and ContainerReplica protobuf changes
 Key: HDDS-755
 URL: https://issues.apache.org/jira/browse/HDDS-755
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: Ozone Datanode, SCM
Reporter: Nanda kumar
Assignee: Nanda kumar


We have different classes that maintain container-related information; we can
consolidate them so that the code is easier to read (a rough sketch follows
the proposal below).

Proposal:
In SCM: used for communication between SCM and Client, and also for storing
in the DB
* ContainerInfoProto
* ContainerInfo
 
In Datanode: Used in communication between Datanode and SCM
* ContainerReplicaProto
* ContainerReplica
 
In Datanode: Used in communication between Datanode and Client
* ContainerDataProto
* ContainerData
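
As a rough illustration only of the SCM-side consolidation (the class shape,
field names and types below are assumptions, not the actual API), the replica
information reported by datanodes could be carried by a small immutable value
class built from {{ContainerReplicaProto}}:
{code:java}
import java.util.UUID;

// Hypothetical sketch of a consolidated, immutable ContainerReplica value
// object on the SCM side; field names are illustrative, not the final API.
public final class ContainerReplica {

  public enum State { OPEN, CLOSING, QUASI_CLOSED, CLOSED, UNHEALTHY }

  private final long containerId;
  private final State state;
  private final UUID datanodeUuid;   // datanode hosting this replica
  private final long bytesUsed;
  private final long keyCount;

  public ContainerReplica(long containerId, State state, UUID datanodeUuid,
      long bytesUsed, long keyCount) {
    this.containerId = containerId;
    this.state = state;
    this.datanodeUuid = datanodeUuid;
    this.bytesUsed = bytesUsed;
    this.keyCount = keyCount;
  }

  public long getContainerId() { return containerId; }
  public State getState() { return state; }
  public UUID getDatanodeUuid() { return datanodeUuid; }
  public long getBytesUsed() { return bytesUsed; }
  public long getKeyCount() { return keyCount; }
}
{code}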







[jira] [Created] (HDDS-801) Quasi close the container when close is not executed via Ratis

2018-11-04 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-801:


 Summary: Quasi close the container when close is not executed via 
Ratis
 Key: HDDS-801
 URL: https://issues.apache.org/jira/browse/HDDS-801
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: Ozone Datanode
Affects Versions: 0.3.0
Reporter: Nanda kumar
Assignee: Nanda kumar


When a datanode receives a CloseContainerCommand and the replication type is
not RATIS, we should QUASI close the container. After quasi-closing the
container, an ICR (Incremental Container Report) has to be sent to SCM.
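
A minimal sketch of the intended handling, with hypothetical interface and
method names standing in for the actual datanode classes:
{code:java}
// Illustrative sketch only: types and method names are assumptions.
public final class CloseContainerSketch {

  enum ReplicationType { RATIS, STAND_ALONE }

  interface Container {
    void close();       // full close, used when the close is driven via Ratis
    void quasiClose();  // quasi close for the non-Ratis path
  }

  interface IcrSender {
    void sendIncrementalContainerReport(long containerId);
  }

  static void handleCloseCommand(long containerId, ReplicationType type,
      Container container, IcrSender icrSender) {
    if (type == ReplicationType.RATIS) {
      // Close is executed through the Ratis pipeline.
      container.close();
      return;
    }
    // Close was not requested via Ratis: only quasi close the replica,
    // then let SCM know through an Incremental Container Report (ICR).
    container.quasiClose();
    icrSender.sendIncrementalContainerReport(containerId);
  }
}
{code}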






[jira] [Created] (HDDS-812) TestEndPoint#testCheckVersionResponse is failing

2018-11-06 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-812:


 Summary: TestEndPoint#testCheckVersionResponse is failing
 Key: HDDS-812
 URL: https://issues.apache.org/jira/browse/HDDS-812
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: test
Reporter: Nanda kumar


 TestEndPoint#testCheckVersionResponse is failing with the below error
{code:java}
[ERROR] 
testCheckVersionResponse(org.apache.hadoop.ozone.container.common.TestEndPoint) 
 Time elapsed: 0.142 s  <<< FAILURE!
java.lang.AssertionError: expected: but was:
{code}

Once we are in the REGISTER state, we no longer allow the getVersion call.
This is causing the test case to fail.






[jira] [Created] (HDDS-823) OzoneRestClient is failing with NPE on getKeyDetails call

2018-11-08 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-823:


 Summary: OzoneRestClient is failing with NPE on getKeyDetails call
 Key: HDDS-823
 URL: https://issues.apache.org/jira/browse/HDDS-823
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Client
Affects Versions: 0.3.0
Reporter: Nanda kumar


{{RestClient#getKeyDetails}} is failing with a {{NullPointerException}},
which is causing a lot of unit tests and smoke tests to fail.
Exception trace:
{code:java}
Error while calling command 
(org.apache.hadoop.ozone.web.ozShell.keys.InfoKeyHandler@13713486): 
java.lang.NullPointerException
at picocli.CommandLine.execute(CommandLine.java:926)
at picocli.CommandLine.access$700(CommandLine.java:104)
at picocli.CommandLine$RunLast.handle(CommandLine.java:1083)
at picocli.CommandLine$RunLast.handle(CommandLine.java:1051)
at 
picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:959)
at picocli.CommandLine.parseWithHandlers(CommandLine.java:1242)
at 
org.apache.hadoop.ozone.ozShell.TestOzoneShell.execute(TestOzoneShell.java:259)
at 
org.apache.hadoop.ozone.ozShell.TestOzoneShell.testInfoDirKey(TestOzoneShell.java:1013)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at 
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.ozone.client.rest.RestClient.getKeyDetails(RestClient.java:817)
at 
org.apache.hadoop.ozone.client.OzoneBucket.getKey(OzoneBucket.java:282)
at 
org.apache.hadoop.ozone.web.ozShell.keys.InfoKeyHandler.call(InfoKeyHandler.java:65)
at 
org.apache.hadoop.ozone.web.ozShell.keys.InfoKeyHandler.call(InfoKeyHandler.java:37)
at picocli.CommandLine.execute(CommandLine.java:919)
... 18 more
{code}






[jira] [Created] (HDDS-827) TestStorageContainerManagerHttpServer should use dynamic port

2018-11-11 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-827:


 Summary: TestStorageContainerManagerHttpServer should use dynamic 
port
 Key: HDDS-827
 URL: https://issues.apache.org/jira/browse/HDDS-827
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: test
Reporter: Nanda kumar


Most of the time {{TestStorageContainerManagerHttpServer}} is failing with 
{code}
java.net.BindException: Port in use: 0.0.0.0:9876
...
Caused by: java.net.BindException: Address already in use
{code}

TestStorageContainerManagerHttpServer should use a free (dynamically
assigned) port instead of trying to bind to the default port 9876.
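
For illustration, a free port can be obtained from the OS with plain JDK
sockets before wiring it into the test configuration (the configuration key
and the surrounding test setup are omitted here):
{code:java}
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Illustrative sketch: ask the OS for a free port instead of hard-coding
// the default SCM HTTP port (9876), so parallel runs don't collide.
public final class FreePortSketch {

  static int pickFreePort() throws IOException {
    try (ServerSocket socket = new ServerSocket()) {
      socket.setReuseAddress(true);
      socket.bind(new InetSocketAddress(0));   // 0 = let the OS choose
      return socket.getLocalPort();
    }
  }

  public static void main(String[] args) throws IOException {
    int port = pickFreePort();
    // The test would then point the SCM HTTP address at 0.0.0.0:<port>
    // before starting the server.
    System.out.println("Using dynamic port " + port);
  }
}
{code}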






[jira] [Created] (HDDS-830) Datanode should not start XceiverServerRatis before getting version information from SCM

2018-11-12 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-830:


 Summary: Datanode should not start XceiverServerRatis before 
getting version information from SCM
 Key: HDDS-830
 URL: https://issues.apache.org/jira/browse/HDDS-830
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: Ozone Datanode
Affects Versions: 0.3.0
Reporter: Nanda kumar


If a datanode restarts quickly, before SCM detects the restart, it will
rejoin the Ratis ring (existing pipeline). Since SCM didn't detect the
restart, the pipeline is not closed. There is a time gap between when the
datanode starts and when it gets the version information from SCM. During
this window, the SCM ID in the datanode is not set (null). If a client tries
to use this pipeline during that window, the container state machine will
throw {{java.lang.NullPointerException: scmId cannot be null}}, which causes
{{RaftLogWorker}} to terminate, resulting in a datanode crash (a sketch of
the proposed start-up gating follows the log below).

{code}
2018-11-12 19:45:31,811 ERROR storage.RaftLogWorker 
(ExitUtils.java:terminate(86)) - Terminating with exit status 1: 
407fd181-2ff7-4651-9a47-a0927ede4c51-RaftLogWorker failed.
java.io.IOException: java.lang.NullPointerException: scmId cannot be null
  at org.apache.ratis.util.IOUtils.asIOException(IOUtils.java:54)
  at org.apache.ratis.util.IOUtils.toIOException(IOUtils.java:61)
  at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:83)
  at 
org.apache.ratis.server.storage.RaftLogWorker$StateMachineDataPolicy.getFromFuture(RaftLogWorker.java:76)
  at 
org.apache.ratis.server.storage.RaftLogWorker$WriteLog.execute(RaftLogWorker.java:344)
  at org.apache.ratis.server.storage.RaftLogWorker.run(RaftLogWorker.java:216)
  at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException: scmId cannot be null
  at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:204)
  at 
org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.create(KeyValueContainer.java:106)
  at 
org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handleCreateContainer(KeyValueHandler.java:242)
  at 
org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handle(KeyValueHandler.java:165)
  at 
org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.createContainer(HddsDispatcher.java:206)
  at 
org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:124)
  at 
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:274)
  at 
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.runCommand(ContainerStateMachine.java:280)
  at 
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$handleWriteChunk$1(ContainerStateMachine.java:301)
  at 
java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
  at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  ... 1 more
{code}
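
A hypothetical sketch of the proposed ordering, holding back the Ratis server
(XceiverServerRatis) start until the version handshake with SCM has supplied
the SCM ID; the class and method names here are illustrative only:
{code:java}
import java.util.concurrent.CountDownLatch;

// Sketch: the container state machine can never see a null scmId because
// the Ratis server is only started after the version response arrives.
public final class DeferredRatisStartSketch {

  private final CountDownLatch versionReceived = new CountDownLatch(1);
  private volatile String scmId;

  // Called by the version-handshake task once SCM responds.
  void onVersionResponse(String receivedScmId) {
    this.scmId = receivedScmId;
    versionReceived.countDown();
  }

  // Called from the datanode state machine before serving any Ratis traffic.
  void startRatisServer(Runnable startXceiverServerRatis)
      throws InterruptedException {
    versionReceived.await();           // wait until the SCM ID is known
    assert scmId != null;
    startXceiverServerRatis.run();     // safe to rejoin the pipeline now
  }
}
{code}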






[jira] [Created] (HDDS-831) TestOzoneShell in integration-test is flaky

2018-11-12 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-831:


 Summary: TestOzoneShell in integration-test is flaky
 Key: HDDS-831
 URL: https://issues.apache.org/jira/browse/HDDS-831
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: test
Reporter: Nanda kumar
Assignee: Nanda kumar


TestOzoneShell in integration-test is flaky; it fails in a few Jenkins runs.
https://builds.apache.org/job/PreCommit-HDDS-Build/1685/artifact/out/patch-unit-hadoop-ozone_integration-test.txt






[jira] [Created] (HDDS-833) Update javadoc in StorageContainerManager, NodeManager, PipelineManager and ContainerManager

2018-11-13 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-833:


 Summary: Update javadoc in StorageContainerManager, NodeManager, 
PipelineManager and ContainerManager
 Key: HDDS-833
 URL: https://issues.apache.org/jira/browse/HDDS-833
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: SCM
Reporter: Nanda kumar
Assignee: Nanda kumar


The javadoc in the following interfaces/classes has to be updated:
* StorageContainerManager
* NodeManager
* NodeStateManager
* PipelineManager
* PipelineStateManager
* ContainerManager
* ContainerStateManager






[jira] [Created] (HDDS-837) Persist originNodeId as part of .container file in datanode

2018-11-14 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-837:


 Summary: Persist originNodeId as part of .container file in 
datanode
 Key: HDDS-837
 URL: https://issues.apache.org/jira/browse/HDDS-837
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: Ozone Datanode
Reporter: Nanda kumar
Assignee: Nanda kumar


To differentiate the replicas of QUASI_CLOSED containers we need the
{{originNodeId}} field. With this field, we can uniquely identify a
QUASI_CLOSED container replica. This will be needed when we want to CLOSE a
QUASI_CLOSED container.

This field will be set by the node where the container is created, stored as
part of the {{.container}} file, and sent as part of the ContainerReport to
SCM.






[jira] [Created] (HDDS-847) TestBlockDeletion is failing

2018-11-16 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-847:


 Summary: TestBlockDeletion is failing
 Key: HDDS-847
 URL: https://issues.apache.org/jira/browse/HDDS-847
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: test
Reporter: Nanda kumar


{{TestBlockDeletion}} is failing with the below exception
{code}
[ERROR] 
testBlockDeletion(org.apache.hadoop.ozone.container.common.statemachine.commandhandler.TestBlockDeletion)
  Time elapsed: 28.017 s  <<< FAILURE!
java.lang.AssertionError
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertTrue(Assert.java:52)
at 
org.apache.hadoop.ozone.container.common.statemachine.commandhandler.TestBlockDeletion.testBlockDeletion(TestBlockDeletion.java:165)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:379)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:340)
at 
org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:125)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:413)
{code}






[jira] [Created] (HDDS-853) Option to force close a container in Datanode

2018-11-19 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-853:


 Summary: Option to force close a container in Datanode
 Key: HDDS-853
 URL: https://issues.apache.org/jira/browse/HDDS-853
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: Ozone Datanode
Reporter: Nanda kumar
Assignee: Nanda kumar


We need an option to force close a container in Datanode. 






[jira] [Created] (HDDS-854) TestFailureHandlingByClient.testMultiBlockWritesWithDnFailures is flaky

2018-11-19 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-854:


 Summary: 
TestFailureHandlingByClient.testMultiBlockWritesWithDnFailures is flaky
 Key: HDDS-854
 URL: https://issues.apache.org/jira/browse/HDDS-854
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: test
Reporter: Nanda kumar
Assignee: Nanda kumar


TestFailureHandlingByClient.testMultiBlockWritesWithDnFailures is flaky. It
times out while waiting for the mini cluster datanode to restart:

{code}
at 
org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:389)
at 
org.apache.hadoop.ozone.MiniOzoneClusterImpl.waitForClusterToBeReady(MiniOzoneClusterImpl.java:122)
at 
org.apache.hadoop.ozone.MiniOzoneClusterImpl.restartHddsDatanode(MiniOzoneClusterImpl.java:276)
at 
org.apache.hadoop.ozone.MiniOzoneClusterImpl.restartHddsDatanode(MiniOzoneClusterImpl.java:283)
at 
org.apache.hadoop.ozone.client.rpc.TestFailureHandlingByClient.testMultiBlockWritesWithDnFailures(TestFailureHandlingByClient.java:200)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:379)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:340)
at 
org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:125)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:413)
{code}






[jira] [Created] (HDDS-863) TestNodeManager is failing

2018-11-21 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-863:


 Summary: TestNodeManager is failing
 Key: HDDS-863
 URL: https://issues.apache.org/jira/browse/HDDS-863
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: test
Reporter: Nanda kumar


All the tests in TestNodeManager are failing with the below error
{code}
[ERROR] 
testScmDetectStaleAndDeadNode(org.apache.hadoop.hdds.scm.node.TestNodeManager)  
Time elapsed: 0.671 s  <<< ERROR!
java.lang.NullPointerException
at 
org.apache.hadoop.hdds.scm.node.SCMNodeManager.updateNodeStat(SCMNodeManager.java:195)
at 
org.apache.hadoop.hdds.scm.node.SCMNodeManager.register(SCMNodeManager.java:276)
at 
org.apache.hadoop.hdds.scm.TestUtils.createRandomDatanodeAndRegister(TestUtils.java:147)
at 
org.apache.hadoop.hdds.scm.node.TestNodeManager.createNodeSet(TestNodeManager.java:590)
at 
org.apache.hadoop.hdds.scm.node.TestNodeManager.testScmDetectStaleAndDeadNode(TestNodeManager.java:316)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at 
org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:168)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:379)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:340)
at 
org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:125)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:413)
{code}






[jira] [Resolved] (HDDS-863) TestNodeManager is failing

2018-11-21 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar resolved HDDS-863.
--
Resolution: Duplicate

> TestNodeManager is failing
> --
>
> Key: HDDS-863
> URL: https://issues.apache.org/jira/browse/HDDS-863
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: test
>Reporter: Nanda kumar
>Priority: Major
>
> All the tests in TestNodeManager are failing with the below error
> {code}
> [ERROR] 
> testScmDetectStaleAndDeadNode(org.apache.hadoop.hdds.scm.node.TestNodeManager)
>   Time elapsed: 0.671 s  <<< ERROR!
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdds.scm.node.SCMNodeManager.updateNodeStat(SCMNodeManager.java:195)
>   at 
> org.apache.hadoop.hdds.scm.node.SCMNodeManager.register(SCMNodeManager.java:276)
>   at 
> org.apache.hadoop.hdds.scm.TestUtils.createRandomDatanodeAndRegister(TestUtils.java:147)
>   at 
> org.apache.hadoop.hdds.scm.node.TestNodeManager.createNodeSet(TestNodeManager.java:590)
>   at 
> org.apache.hadoop.hdds.scm.node.TestNodeManager.testScmDetectStaleAndDeadNode(TestNodeManager.java:316)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at 
> org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:168)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:379)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:340)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:125)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:413)
> {code}






[jira] [Created] (HDDS-868) Handle quasi closed container replicas in SCM

2018-11-21 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-868:


 Summary: Handle quasi closed container replicas in SCM
 Key: HDDS-868
 URL: https://issues.apache.org/jira/browse/HDDS-868
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: SCM
Reporter: Nanda kumar
Assignee: Nanda kumar


In case of pipeline failure, the containers will be quasi closed by the
datanode. SCM has to understand that the container replica is quasi closed
and, based on the block commit sequence id, identify the latest replica and
force close it.
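
As a rough sketch with placeholder types (not the actual SCM classes), the
selection could simply pick the quasi-closed replica carrying the highest
block commit sequence id (BCSID):
{code:java}
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Illustrative only: Replica is a placeholder for the SCM-side replica info.
public final class QuasiClosedReplicaSketch {

  static final class Replica {
    final String datanodeUuid;
    final long blockCommitSequenceId;
    Replica(String datanodeUuid, long bcsId) {
      this.datanodeUuid = datanodeUuid;
      this.blockCommitSequenceId = bcsId;
    }
  }

  // Returns the replica that SCM should force close, if any: the one with
  // the highest BCSID is considered the latest.
  static Optional<Replica> pickLatestReplica(List<Replica> quasiClosedReplicas) {
    return quasiClosedReplicas.stream()
        .max(Comparator.comparingLong(r -> r.blockCommitSequenceId));
  }
}
{code}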






[jira] [Created] (HDDS-896) Handle over replicated containers in SCM

2018-12-03 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-896:


 Summary: Handle over replicated containers in SCM
 Key: HDDS-896
 URL: https://issues.apache.org/jira/browse/HDDS-896
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: SCM
Reporter: Nanda kumar
Assignee: Nanda kumar


When SCM detects that a container is over-replicated, it has to delete some
replicas to bring the replica count back to the required value. If the
container is in QUASI_CLOSED state, we should check the {{originNodeId}}
field while choosing the replica to delete.
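
A small illustrative sketch (placeholder types, not the actual SCM classes)
of one way to pick deletion candidates without losing any unique
{{originNodeId}}:
{code:java}
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.UUID;

// Illustrative only: replicas whose originNodeId is already represented by
// another replica are safe deletion candidates, so no unique origin is lost.
public final class OverReplicationSketch {

  static final class Replica {
    final UUID datanodeUuid;
    final UUID originNodeId;
    Replica(UUID datanodeUuid, UUID originNodeId) {
      this.datanodeUuid = datanodeUuid;
      this.originNodeId = originNodeId;
    }
  }

  static List<Replica> deletionCandidates(List<Replica> replicas) {
    Set<UUID> seenOrigins = new HashSet<>();
    List<Replica> candidates = new ArrayList<>();
    for (Replica replica : replicas) {
      // Keep the first replica seen for each originNodeId; anything else
      // with the same origin is a duplicate and may be deleted.
      if (!seenOrigins.add(replica.originNodeId)) {
        candidates.add(replica);
      }
    }
    return candidates;
  }
}
{code}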






[jira] [Created] (HDDS-895) Remove command watcher from ReplicationManager

2018-12-03 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-895:


 Summary: Remove command watcher from ReplicationManager
 Key: HDDS-895
 URL: https://issues.apache.org/jira/browse/HDDS-895
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: SCM
Reporter: Nanda kumar
Assignee: Nanda kumar


We can remove the command watcher from {{ReplicationManager}} and use an
internal timeout to re-trigger the replication command.
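
A hedged sketch of what such an internal timeout could look like, using only
JDK types; the timeout value, method names and wiring are illustrative
assumptions, not the actual ReplicationManager code:
{code:java}
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongConsumer;

// Sketch: remember when a replication command was sent and re-send it if it
// has not completed within the timeout, instead of using a command watcher.
public final class ReplicationTimeoutSketch {

  private final Map<Long, Instant> inflightReplication = new ConcurrentHashMap<>();
  private final Duration timeout = Duration.ofMinutes(10);  // illustrative value

  void onReplicationScheduled(long containerId) {
    inflightReplication.put(containerId, Instant.now());
  }

  void onReplicationCompleted(long containerId) {
    inflightReplication.remove(containerId);
  }

  // Called periodically from the ReplicationManager run loop.
  void retriggerExpired(LongConsumer sendReplicateCommand) {
    Instant now = Instant.now();
    inflightReplication.forEach((containerId, sentAt) -> {
      if (Duration.between(sentAt, now).compareTo(timeout) > 0) {
        inflightReplication.put(containerId, now);   // reset the timer
        sendReplicateCommand.accept(containerId);    // re-issue the command
      }
    });
  }
}
{code}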






[jira] [Created] (HDDS-961) Send command execution metrics from Datanode to SCM

2019-01-04 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-961:


 Summary: Send command execution metrics from Datanode to SCM
 Key: HDDS-961
 URL: https://issues.apache.org/jira/browse/HDDS-961
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: Ozone Datanode, SCM
Reporter: Nanda kumar


The CommandHandlers in the datanode calculate and track the time taken to
execute each command sent by SCM. It would be nice to report these values to
SCM so that we can compute the average time, standard deviation, etc. for
those operations.






[jira] [Created] (HDDS-962) Introduce locking for container operations that are executed via DatanodeCommand

2019-01-04 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-962:


 Summary: Introduce locking for container operations that are 
executed via DatanodeCommand
 Key: HDDS-962
 URL: https://issues.apache.org/jira/browse/HDDS-962
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: Ozone Datanode
Reporter: Nanda kumar
Assignee: Nanda kumar


When SCM decides to take some action on a container, it sends a
DatanodeCommand to the datanodes. These commands are handled by
CommandHandlers in the datanode. Without proper locking, we cannot process
these commands in parallel. This jira aims to introduce locking on container
operations that are performed via ContainerController.
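
A minimal, JDK-only sketch of per-container locking (the wiring into
ContainerController and the command handlers is omitted, and the names are
illustrative):
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Sketch: every operation on a container goes through that container id's
// lock, so two commands touching the same container cannot interleave,
// while commands for different containers still run in parallel.
public final class ContainerLockSketch {

  private final Map<Long, ReentrantLock> locks = new ConcurrentHashMap<>();

  <T> T withContainerLock(long containerId, Supplier<T> op) {
    ReentrantLock lock =
        locks.computeIfAbsent(containerId, id -> new ReentrantLock());
    lock.lock();
    try {
      return op.get();   // e.g. close, delete or compact the container
    } finally {
      lock.unlock();
    }
  }
}
{code}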






[jira] [Created] (HDDS-1048) Remove SCMNodeStat from SCMNodeManager and use storage information from DatanodeInfo#StorageReportProto

2019-02-05 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-1048:
-

 Summary: Remove SCMNodeStat from SCMNodeManager and use storage 
information from DatanodeInfo#StorageReportProto
 Key: HDDS-1048
 URL: https://issues.apache.org/jira/browse/HDDS-1048
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: SCM
Affects Versions: 0.3.0
Reporter: Nanda kumar
Assignee: Nanda kumar


We don't have to maintain SCMNodeStat in SCMNodeManager anymore. This 
information can be obtained from {{DatanodeInfo#StorageReportProto}} inside 
NodeStateMap.






[jira] [Created] (HDDS-1049) TestRatisPipelineProvider#testCreatePipelineWithFactor is failing

2019-02-05 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-1049:
-

 Summary: TestRatisPipelineProvider#testCreatePipelineWithFactor is 
failing
 Key: HDDS-1049
 URL: https://issues.apache.org/jira/browse/HDDS-1049
 Project: Hadoop Distributed Data Store
  Issue Type: Test
  Components: test
Affects Versions: 0.3.0
Reporter: Nanda kumar


{{TestRatisPipelineProvider#testCreatePipelineWithFactor}} is failing with the 
below exception
{code}
[ERROR] 
testCreatePipelineWithFactor(org.apache.hadoop.hdds.scm.pipeline.TestRatisPipelineProvider)
  Time elapsed: 0.927 s  <<< FAILURE!
java.lang.AssertionError
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertTrue(Assert.java:52)
at 
org.apache.hadoop.hdds.scm.pipeline.TestRatisPipelineProvider.testCreatePipelineWithFactor(TestRatisPipelineProvider.java:68)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
at 
org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
{code}






[jira] [Created] (HDDS-1050) TestSCMRestart#testPipelineWithScmRestart is failing

2019-02-05 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-1050:
-

 Summary: TestSCMRestart#testPipelineWithScmRestart is failing
 Key: HDDS-1050
 URL: https://issues.apache.org/jira/browse/HDDS-1050
 Project: Hadoop Distributed Data Store
  Issue Type: Test
  Components: test
Affects Versions: 0.3.0
Reporter: Nanda kumar


{{TestSCMRestart#testPipelineWithScmRestart}} is failing with the below 
exception
{code}
[ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 11.896 
s <<< FAILURE! - in org.apache.hadoop.hdds.scm.pipeline.TestSCMRestart
[ERROR] 
testPipelineWithScmRestart(org.apache.hadoop.hdds.scm.pipeline.TestSCMRestart)  
Time elapsed: 0.047 s  <<< FAILURE!
java.lang.AssertionError: 
expected: but 
was:
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:144)
at 
org.apache.hadoop.hdds.scm.pipeline.TestSCMRestart.testPipelineWithScmRestart(TestSCMRestart.java:110)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
at 
org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
{code}






[jira] [Created] (HDDS-1051) TestCloseContainerByPipeline#testIfCloseContainerCommandHandlerIsInvoked is failing

2019-02-05 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-1051:
-

 Summary: 
TestCloseContainerByPipeline#testIfCloseContainerCommandHandlerIsInvoked is 
failing
 Key: HDDS-1051
 URL: https://issues.apache.org/jira/browse/HDDS-1051
 Project: Hadoop Distributed Data Store
  Issue Type: Test
  Components: test
Affects Versions: 0.3.0
Reporter: Nanda kumar


{{TestCloseContainerByPipeline#testIfCloseContainerCommandHandlerIsInvoked}} is 
failing with the following exception
{code:java}
[ERROR] 
testIfCloseContainerCommandHandlerIsInvoked(org.apache.hadoop.ozone.container.common.statemachine.commandhandler.TestCloseContainerByPipeline)
  Time elapsed: 21.943 s  <<< ERROR!
java.lang.StackOverflowError
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject$ClassSet.populateSet(Subject.java:1399)
at javax.security.auth.Subject$ClassSet.(Subject.java:1372)
at javax.security.auth.Subject.getPrivateCredentials(Subject.java:767)
at 
org.apache.hadoop.security.UserGroupInformation.getCredentialsInternal(UserGroupInformation.java:1559)
at 
org.apache.hadoop.security.UserGroupInformation.getTokens(UserGroupInformation.java:1524)
at 
org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.getEncodedBlockToken(ContainerProtocolCalls.java:580)
at 
org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.writeChunkAsync(ContainerProtocolCalls.java:318)
at 
org.apache.hadoop.hdds.scm.storage.BlockOutputStream.writeChunkToContainer(BlockOutputStream.java:602)
at 
org.apache.hadoop.hdds.scm.storage.BlockOutputStream.writeChunk(BlockOutputStream.java:464)
at 
org.apache.hadoop.hdds.scm.storage.BlockOutputStream.close(BlockOutputStream.java:480)
at 
org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry.close(BlockOutputStreamEntry.java:137)
at 
org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:489)
at 
org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:501)
at 
org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:501)
at 
org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:501)
at 
org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:501)
at 
org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:501)
at 
org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:501)
at 
org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:501)
at 
org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:501)
at 
org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:501)
at 
org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:501)
at 
org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:501)
at 
org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:501)
at 
org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:501)
at 
org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:501)
at 
org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:501)
at 
org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:501)
..

..
{code}






[jira] [Created] (HDDS-1052) TestOzoneRpcClient is flaky

2019-02-05 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-1052:
-

 Summary: TestOzoneRpcClient is flaky
 Key: HDDS-1052
 URL: https://issues.apache.org/jira/browse/HDDS-1052
 Project: Hadoop Distributed Data Store
  Issue Type: Test
  Components: test
Affects Versions: 0.3.0
Reporter: Nanda kumar


{{TestOzoneRpcClient}} is flaky. The following test cases fail randomly.
{code}
[ERROR] Errors:
[ERROR]   
TestOzoneRpcClient>TestOzoneRpcClientAbstract.testListPartsWithPartMarkerGreaterThanPartCount:1932->TestOzoneRpcClientAbstract.uploadPart:2048
 » IO
[ERROR]   
TestOzoneRpcClient>TestOzoneRpcClientAbstract.testMultipartUploadWithPartsMisMatchWithIncorrectPartName:1657->TestOzoneRpcClientAbstract.uploadPart:2048
 » IO
[ERROR]   TestOzoneRpcClient>TestOzoneRpcClientAbstract.testPutKey:558 » IO 
Unexpected S...
[ERROR]   
TestOzoneRpcClient>TestOzoneRpcClientAbstract.testReadKeyWithCorruptedData:884 
» IO
[ERROR]   
TestOzoneRpcClient>TestOzoneRpcClientAbstract.testUploadPartWithNoOverride:1391 
» IO
{code}






[jira] [Created] (HDDS-1070) Adding Node and Pipeline related metrics in SCM

2019-02-07 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-1070:
-

 Summary: Adding Node and Pipeline related metrics in SCM
 Key: HDDS-1070
 URL: https://issues.apache.org/jira/browse/HDDS-1070
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: SCM
Affects Versions: 0.3.0
Reporter: Nanda kumar
Assignee: Nanda kumar


This jira aims to add more Node- and Pipeline-related metrics to SCM.
The following metrics will be added as part of this jira (a wiring sketch
follows the list):
 * numberOfSuccessfulPipelineCreation
 * numberOfFailedPipelineCreation
 * numberOfSuccessfulPipelineDestroy
 * numberOfFailedPipelineDestroy
 * numberOfPipelineReportProcessed
 * numberOfNodeReportProcessed
 * numberOfHBProcessed
 * number of pipelines in different PipelineState
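
As a wiring sketch only, such counters are typically registered through
Hadoop's metrics2 library; the source name, the subset of metric names shown
and the metrics context below are illustrative assumptions, not the final
definitions:
{code:java}
import org.apache.hadoop.metrics2.MetricsSystem;
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.metrics2.lib.MutableCounterLong;

// Illustrative metrics source; registering it lets the metrics system
// populate the @Metric-annotated fields and publish them (e.g. via JMX).
@Metrics(about = "SCM node and pipeline metrics", context = "ozone")
public final class ScmPipelineMetricsSketch {

  @Metric private MutableCounterLong numberOfSuccessfulPipelineCreation;
  @Metric private MutableCounterLong numberOfFailedPipelineCreation;
  @Metric private MutableCounterLong numberOfPipelineReportProcessed;
  @Metric private MutableCounterLong numberOfHBProcessed;

  public static ScmPipelineMetricsSketch create() {
    MetricsSystem ms = DefaultMetricsSystem.instance();
    return ms.register("ScmPipelineMetricsSketch",
        "Illustrative SCM node/pipeline metrics",
        new ScmPipelineMetricsSketch());
  }

  void incSuccessfulPipelineCreation() {
    numberOfSuccessfulPipelineCreation.incr();
  }

  void incFailedPipelineCreation() {
    numberOfFailedPipelineCreation.incr();
  }
}
{code}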






[jira] [Created] (HDDS-1146) Adding container related metrics in SCM

2019-02-20 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-1146:
-

 Summary: Adding container related metrics in SCM
 Key: HDDS-1146
 URL: https://issues.apache.org/jira/browse/HDDS-1146
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: SCM
Reporter: Nanda kumar


This jira aims to add more container-related metrics to SCM.
The following metrics will be added as part of this jira:

* Number of containers
* Number of open containers
* Number of closed containers
* Number of quasi closed containers
* Number of closing containers
* Number of successful create container calls
* Number of failed create container calls
* Number of successful delete container calls
* Number of failed delete container calls
* Number of successful container report processing
* Number of failed container report processing
* Number of successful incremental container report processing
* Number of failed incremental container report processing







[jira] [Created] (HDDS-1166) Fix checkstyle line length issues

2019-02-23 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-1166:
-

 Summary: Fix checkstyle line length issues
 Key: HDDS-1166
 URL: https://issues.apache.org/jira/browse/HDDS-1166
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: SCM
Reporter: Nanda kumar
Assignee: Nanda kumar


Checkstyle line-length issues have to be fixed in the following classes:
* BlockManagerImpl
* CloseContainerCommandHandler
* TestCloseContainerCommandHandler






[jira] [Created] (HDDS-1167) Error in hadoop-ozone/dev-support/checks/checkstyle.sh

2019-02-23 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-1167:
-

 Summary: Error in hadoop-ozone/dev-support/checks/checkstyle.sh
 Key: HDDS-1167
 URL: https://issues.apache.org/jira/browse/HDDS-1167
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
Reporter: Nanda kumar
Assignee: Nanda kumar


While running {{hadoop-ozone/dev-support/checks/checkstyle.sh}} the following 
error is thrown.
{code}
grep: warning: recursive search of stdin
{code}






[jira] [Created] (HDDS-1168) Use random ports in TestBlockManager and TestDeletedBlockLog

2019-02-23 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-1168:
-

 Summary: Use random ports in TestBlockManager and 
TestDeletedBlockLog
 Key: HDDS-1168
 URL: https://issues.apache.org/jira/browse/HDDS-1168
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: SCM
Reporter: Nanda kumar
Assignee: Nanda kumar


TestBlockManager and TestDeletedBlockLog use default ports, which causes a
BindException when the tests are executed in parallel. We should start using
random ports to avoid this.






[jira] [Resolved] (HDDS-69) Add checkBucketAccess to OzoneManger

2019-02-28 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-69?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar resolved HDDS-69.
-
Resolution: Won't Fix

> Add checkBucketAccess to OzoneManger
> 
>
> Key: HDDS-69
> URL: https://issues.apache.org/jira/browse/HDDS-69
> Project: Hadoop Distributed Data Store
>  Issue Type: New Feature
>  Components: Ozone Manager
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Major
> Attachments: HDFS-12147-HDFS-7240.000.patch, 
> HDFS-12147-HDFS-7240.001.patch
>
>
> Checks if the caller has access to a given bucket.






[jira] [Created] (HDDS-1205) Introduce Replication Manager Thread inside Container Manager

2019-02-28 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-1205:
-

 Summary: Introduce Replication Manager Thread inside Container 
Manager
 Key: HDDS-1205
 URL: https://issues.apache.org/jira/browse/HDDS-1205
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: SCM
Reporter: Nanda kumar
Assignee: Nanda kumar


This jira introduces a replication manager thread inside
{{ContainerManager}}, which will use the RMT (Replication Manager Thread)
decision engine to decide the action to be taken on flagged containers.
The containers are flagged for the ReplicationManagerThread by the
ContainerReportProcessor(s) and the Stale/Dead Node event handlers.
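
A hypothetical sketch of the thread's shape; the decision engine itself is
reduced to a placeholder and all names here are illustrative:
{code:java}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch: report processors and node event handlers flag container ids, and
// a single background thread drains the flagged set and decides the action
// (replicate, delete, close) for each container.
public final class ReplicationManagerThreadSketch implements Runnable {

  private final BlockingQueue<Long> flaggedContainers = new LinkedBlockingQueue<>();
  private volatile boolean running = true;

  // Called by ContainerReportProcessor(s) and stale/dead node handlers.
  public void flag(long containerId) {
    flaggedContainers.offer(containerId);
  }

  @Override
  public void run() {
    while (running) {
      try {
        long containerId = flaggedContainers.take();   // blocks until flagged
        decideAndAct(containerId);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        running = false;
      }
    }
  }

  private void decideAndAct(long containerId) {
    // Placeholder for the decision engine: compare the expected replica
    // count with the reported replicas and issue replicate/delete commands.
  }

  public void stop() {
    running = false;
  }
}
{code}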






[jira] [Created] (HDDS-1207) Bootstrap flagged container set before starting replication manager thread

2019-03-01 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-1207:
-

 Summary: Bootstrap flagged container set before starting 
replication manager thread
 Key: HDDS-1207
 URL: https://issues.apache.org/jira/browse/HDDS-1207
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: SCM
Reporter: Nanda kumar
Assignee: Nanda kumar


When SCM starts, before starting the ReplicationManager thread, we have to
inspect all the containers and flag the unhealthy ones for the RMT
(Replication Manager Thread) to process.






[jira] [Created] (HDDS-1239) Use chillmode state from ChillModeManager in ChillModePrecheck

2019-03-08 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-1239:
-

 Summary: Use chillmode state from ChillModeManager in 
ChillModePrecheck
 Key: HDDS-1239
 URL: https://issues.apache.org/jira/browse/HDDS-1239
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: SCM
Affects Versions: 0.3.0
Reporter: Nanda kumar
Assignee: Nanda kumar


Currently {{ChillModePrecheck}} is instantiated in multiple places, and each
instance maintains its own chillmode state. Because of this,
{{BlockManagerImpl}} and {{SCMClientProtocolServer}} listen to the chillmode
status event to update the {{ChillModePrecheck}} instance that they maintain.

It would be easier if {{ChillModePrecheck}} queried {{SCMChillModeManager}}
to get the current chillmode state. It would also simplify the code if
{{SCMChillModeManager}} provided the {{ChillModePrecheck}} instance, instead
of everyone creating a new {{ChillModePrecheck}} object.







[jira] [Resolved] (HDDS-1332) Add some logging for flaky test testStartStopDatanodeStateMachine

2019-03-27 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar resolved HDDS-1332.
---
  Resolution: Fixed
   Fix Version/s: 0.5.0
Target Version/s: 0.5.0

[~arpitagarwal], thanks for the contribution. Committed this to trunk.

> Add some logging for flaky test testStartStopDatanodeStateMachine
> -
>
> Key: HDDS-1332
> URL: https://issues.apache.org/jira/browse/HDDS-1332
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> testStartStopDatanodeStateMachine fails frequently in Jenkins. It also seems 
> to have a timing issue which may be different from the Jenkins failure.
> E.g. If I add a 10 second sleep as below I can get the test to fail 100%.
> {code}
> @@ -163,6 +163,7 @@ public void testStartStopDatanodeStateMachine() throws 
> IOException,
>  try (DatanodeStateMachine stateMachine =
>  new DatanodeStateMachine(getNewDatanodeDetails(), conf, null)) {
>stateMachine.startDaemon();
> +  Thread.sleep(10_000L);
> {code}






[jira] [Created] (HDDS-1353) Metrics scm_pipeline_metrics_num_pipeline_creation_failed keeps increasing

2019-03-29 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-1353:
-

 Summary: Metrics scm_pipeline_metrics_num_pipeline_creation_failed 
keeps increasing
 Key: HDDS-1353
 URL: https://issues.apache.org/jira/browse/HDDS-1353
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: SCM
Reporter: Nanda kumar


There is a {{BackgroundPipelineCreator}} thread in SCM which runs at a fixed
interval and tries to create pipelines. BackgroundPipelineCreator uses
{{IOException}} as the exit criterion (no more pipelines can be created): in
each run we exit when we are not able to create any more pipelines, i.e.
when we get an IOException while trying to create a pipeline. This means
that the {{scm_pipeline_metrics_num_pipeline_creation_failed}} value gets
incremented in each run of BackgroundPipelineCreator.
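
A simplified sketch (placeholder interfaces, not the actual SCM code) of the
behaviour described above, showing why the failure counter grows on every
scheduled run:
{code:java}
import java.io.IOException;

// Illustrative only: the creator loops until pipeline creation throws
// IOException, and that final, expected failure is what bumps
// scm_pipeline_metrics_num_pipeline_creation_failed on every run.
public final class BackgroundPipelineCreatorSketch {

  interface PipelineFactory {
    void createPipeline() throws IOException;  // throws when no more pipelines fit
  }

  interface Metrics {
    void incNumPipelineCreated();
    void incNumPipelineCreationFailed();
  }

  static void runOnce(PipelineFactory factory, Metrics metrics) {
    while (true) {
      try {
        factory.createPipeline();
        metrics.incNumPipelineCreated();
      } catch (IOException e) {
        // Exit criterion: the cluster cannot host another pipeline right
        // now. This "failure" is counted, so the metric grows every run.
        metrics.incNumPipelineCreationFailed();
        return;
      }
    }
  }
}
{code}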






[jira] [Created] (HDDS-1368) Cleanup old ReplicationManager code from SCM

2019-04-02 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-1368:
-

 Summary: Cleanup old ReplicationManager code from SCM
 Key: HDDS-1368
 URL: https://issues.apache.org/jira/browse/HDDS-1368
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: SCM
Reporter: Nanda kumar
Assignee: Nanda kumar


HDDS-1205 brings in the new ReplicationManager and HDDS-1207 plugs in the
new code; this jira is for removing the old ReplicationManager and related
code.






[jira] [Resolved] (HDDS-1207) Refactor Container Report Processing logic and plugin new Replication Manager

2019-04-04 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar resolved HDDS-1207.
---
   Resolution: Fixed
Fix Version/s: 0.5.0

> Refactor Container Report Processing logic and plugin new Replication Manager
> -
>
> Key: HDDS-1207
> URL: https://issues.apache.org/jira/browse/HDDS-1207
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: SCM
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> HDDS-1205 brings in new ReplicationManager, this Jira is to refactor 
> ContainerReportProcessing logic in SCM so that it complements 
> ReplicationManager and plugin the new ReplicationManager code. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1384) TestBlockOutputStreamWithFailures is failing

2019-04-04 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-1384:
-

 Summary: TestBlockOutputStreamWithFailures is failing
 Key: HDDS-1384
 URL: https://issues.apache.org/jira/browse/HDDS-1384
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: test
Reporter: Nanda kumar


TestBlockOutputStreamWithFailures is failing with the following error

{noformat}
2019-04-04 18:52:43,240 INFO  volume.ThrottledAsyncChecker 
(ThrottledAsyncChecker.java:schedule(140)) - Scheduling a check for 
org.apache.hadoop.ozone.container.common.volume.HddsVolume@1f6c0e8a
2019-04-04 18:52:43,240 INFO  volume.HddsVolumeChecker 
(HddsVolumeChecker.java:checkAllVolumes(203)) - Scheduled health check for 
volume org.apache.hadoop.ozone.container.common.volume.HddsVolume@1f6c0e8a
2019-04-04 18:52:43,241 ERROR server.GrpcService 
(ExitUtils.java:terminate(133)) - Terminating with exit status 1: Failed to 
start Grpc server
java.io.IOException: Failed to bind
  at 
org.apache.ratis.thirdparty.io.grpc.netty.NettyServer.start(NettyServer.java:253)
  at 
org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.start(ServerImpl.java:166)
  at 
org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.start(ServerImpl.java:81)
  at org.apache.ratis.grpc.server.GrpcService.startImpl(GrpcService.java:144)
  at org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:202)
  at 
org.apache.ratis.server.impl.RaftServerRpcWithProxy.start(RaftServerRpcWithProxy.java:69)
  at 
org.apache.ratis.server.impl.RaftServerProxy.lambda$start$3(RaftServerProxy.java:300)
  at org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:202)
  at 
org.apache.ratis.server.impl.RaftServerProxy.start(RaftServerProxy.java:298)
  at 
org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis.start(XceiverServerRatis.java:419)
  at 
org.apache.hadoop.ozone.container.ozoneimpl.OzoneContainer.start(OzoneContainer.java:186)
  at 
org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.start(DatanodeStateMachine.java:169)
  at 
org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.lambda$startDaemon$0(DatanodeStateMachine.java:338)
  at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.BindException: Address already in use
  at sun.nio.ch.Net.bind0(Native Method)
  at sun.nio.ch.Net.bind(Net.java:433)
  at sun.nio.ch.Net.bind(Net.java:425)
  at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
  at 
org.apache.ratis.thirdparty.io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:130)
  at 
org.apache.ratis.thirdparty.io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:558)
  at 
org.apache.ratis.thirdparty.io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1358)
  at 
org.apache.ratis.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:501)
  at 
org.apache.ratis.thirdparty.io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:486)
  at 
org.apache.ratis.thirdparty.io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:1019)
  at 
org.apache.ratis.thirdparty.io.netty.channel.AbstractChannel.bind(AbstractChannel.java:254)
  at 
org.apache.ratis.thirdparty.io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:366)
  at 
org.apache.ratis.thirdparty.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
  at 
org.apache.ratis.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
  at 
org.apache.ratis.thirdparty.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:462)
  at 
org.apache.ratis.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:897)
  at 
org.apache.ratis.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
  ... 1 more
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1387) ConcurrentModificationException in TestMiniChaosOzoneCluster

2019-04-04 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-1387:
-

 Summary: ConcurrentModificationException in 
TestMiniChaosOzoneCluster
 Key: HDDS-1387
 URL: https://issues.apache.org/jira/browse/HDDS-1387
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: test
Reporter: Nanda kumar
Assignee: Nanda kumar


TestMiniChaosOzoneCluster is failing with the below exception
{noformat}
[ERROR] org.apache.hadoop.ozone.TestMiniChaosOzoneCluster  Time elapsed: 
265.679 s  <<< ERROR!
java.util.ConcurrentModificationException
at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:909)
at java.util.ArrayList$Itr.next(ArrayList.java:859)
at 
org.apache.hadoop.ozone.MiniOzoneClusterImpl.stop(MiniOzoneClusterImpl.java:350)
at 
org.apache.hadoop.ozone.MiniOzoneClusterImpl.shutdown(MiniOzoneClusterImpl.java:325)
at 
org.apache.hadoop.ozone.MiniOzoneChaosCluster.shutdown(MiniOzoneChaosCluster.java:130)
at 
org.apache.hadoop.ozone.TestMiniChaosOzoneCluster.shutdown(TestMiniChaosOzoneCluster.java:92)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
at 
org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1409) TestOzoneClientRetriesOnException is flaky

2019-04-10 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-1409:
-

 Summary: TestOzoneClientRetriesOnException is flaky
 Key: HDDS-1409
 URL: https://issues.apache.org/jira/browse/HDDS-1409
 Project: Hadoop Distributed Data Store
  Issue Type: Test
Reporter: Nanda kumar


TestOzoneClientRetriesOnException is flaky; the following exception is seen 
when it fails.

{noformat}
[ERROR] 
testMaxRetriesByOzoneClient(org.apache.hadoop.ozone.client.rpc.TestOzoneClientRetriesOnException)
  Time elapsed: 16.227 s  <<< FAILURE!
java.lang.AssertionError
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertTrue(Assert.java:52)
at 
org.apache.hadoop.ozone.client.rpc.TestOzoneClientRetriesOnException.testMaxRetriesByOzoneClient(TestOzoneClientRetriesOnException.java:197)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
at 
org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1410) TestSCMNodeMetrics is flaky

2019-04-10 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-1410:
-

 Summary: TestSCMNodeMetrics is flaky
 Key: HDDS-1410
 URL: https://issues.apache.org/jira/browse/HDDS-1410
 Project: Hadoop Distributed Data Store
  Issue Type: Test
  Components: test
Reporter: Nanda kumar


TestSCMNodeMetrics is flaky
https://ci.anzix.net/job/ozone/16617/testReport/org.apache.hadoop.ozone.scm.node/TestSCMNodeMetrics/testNodeReportProcessing/
{noformat}

java.lang.AssertionError: Bad value for metric NumNodeReportProcessed 
expected:<2> but was:<1>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at 
org.apache.hadoop.test.MetricsAsserts.assertCounter(MetricsAsserts.java:227)
at 
org.apache.hadoop.ozone.scm.node.TestSCMNodeMetrics.testNodeReportProcessing(TestSCMNodeMetrics.java:107)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
at 
org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1411) Add unit test to check if SCM correctly sends close commands for containers in closing state after a restart

2019-04-10 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-1411:
-

 Summary: Add unit test to check if SCM correctly sends close 
commands for containers in closing state after a restart
 Key: HDDS-1411
 URL: https://issues.apache.org/jira/browse/HDDS-1411
 Project: Hadoop Distributed Data Store
  Issue Type: Test
  Components: test
Reporter: Nanda kumar


When the container is in CLOSING state, SCM keeps sending the close command to 
the datanode until the container is moved to either QUASI_CLOSED or CLOSED 
state. The frequency at which the close command is sent by SCM depends on the 
property {{hdds.scm.replication.thread.interval}}. 

We have to add a test case to verify that SCM keeps sending close commands for 
containers in the CLOSING state even after a restart.
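
A rough shape of such a test is sketched below; the helper methods and the way 
the CLOSING container and SCM restart are arranged are assumptions for 
illustration, to be backed by MiniOzoneCluster in the real test.

{code}
import static org.junit.Assert.assertTrue;

import org.junit.Test;

/** Sketch only: outlines the intended assertions, with hypothetical helpers. */
public class TestCloseCommandAfterScmRestartSketch {

  @Test
  public void scmResendsCloseCommandAfterRestart() throws Exception {
    // 1. Create a container and move it to CLOSING state in SCM,
    //    without letting the datanode act on the first close command.
    long containerId = createContainerInClosingState();

    // 2. Restart SCM while the container is still CLOSING.
    restartScm();

    // 3. Within a few replication intervals (hdds.scm.replication.thread.interval),
    //    SCM should again queue a close-container command for the datanodes
    //    holding the replicas.
    assertTrue(waitForCloseCommand(containerId));
  }

  // Hypothetical helpers, stand-ins for the real MiniOzoneCluster plumbing.
  private long createContainerInClosingState() { return 1L; }
  private void restartScm() { }
  private boolean waitForCloseCommand(long containerId) { return true; }
}
{code}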



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1416) MiniOzoneCluster should set custom value for hdds.datanode.replication.work.dir

2019-04-10 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-1416:
-

 Summary: MiniOzoneCluster should set custom value for 
hdds.datanode.replication.work.dir
 Key: HDDS-1416
 URL: https://issues.apache.org/jira/browse/HDDS-1416
 Project: Hadoop Distributed Data Store
  Issue Type: Test
  Components: test
Affects Versions: 0.4.0
Reporter: Nanda kumar


Datanode uses a temporary working directory for copying/replicating containers. 
The default location of this directory is read from the system property 
{{java.io.tmpdir}}. Since all the datanodes of a MiniOzoneCluster run in the 
same machine/JVM, they would all use the same working directory and could 
corrupt each other's data while moving containers.

While configuring datanodes for MiniOzoneCluster, we should set a custom value 
for {{hdds.datanode.replication.work.dir}} on each datanode instance.
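
One possible shape of the fix is sketched below; the property key comes from 
the description above, but the directory layout and the hook where this runs 
in the MiniOzoneCluster builder are assumptions.

{code}
import java.io.File;

import org.apache.hadoop.hdds.conf.OzoneConfiguration;

/** Sketch: give each datanode instance its own replication working directory. */
public final class ReplicationWorkDirSketch {

  static OzoneConfiguration configureDatanode(OzoneConfiguration clusterConf,
      File clusterBaseDir, int datanodeIndex) {
    OzoneConfiguration dnConf = new OzoneConfiguration(clusterConf);
    File workDir = new File(clusterBaseDir,
        "datanode-" + datanodeIndex + File.separator + "replication-work");
    // Without this, every datanode in the MiniOzoneCluster falls back to
    // java.io.tmpdir and they can overwrite each other's container tarballs.
    dnConf.set("hdds.datanode.replication.work.dir", workDir.getAbsolutePath());
    return dnConf;
  }

  private ReplicationWorkDirSketch() { }
}
{code}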




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1417) After successfully importing a container, datanode should delete the container tar.gz file from working directory

2019-04-10 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-1417:
-

 Summary: After successfully importing a container, datanode should 
delete the container tar.gz file from working directory
 Key: HDDS-1417
 URL: https://issues.apache.org/jira/browse/HDDS-1417
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Datanode
Affects Versions: 0.3.0
Reporter: Nanda kumar
Assignee: Nanda kumar


Whenever we want to replicate or copy a container from one datanode to another, 
we compress the container data and create a tar.gz file. This tar file is then 
copied from the source datanode to the destination datanode, where it is placed 
in a temporary working directory. Once the copy is complete we import the 
container. After the container has been imported, the tar file in the 
destination datanode's working directory is no longer needed and has to be 
deleted.
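
The intended cleanup can be sketched as below; the method name and the path 
parameter are assumptions standing in for the real replication code path.

{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Sketch: remove the downloaded tarball once the container has been imported. */
public final class ImportCleanupSketch {

  static void importAndCleanup(Path downloadedTarball) throws IOException {
    importContainer(downloadedTarball);
    // After a successful import the tarball is only an intermediate artifact;
    // remove it so the working directory does not fill up over time.
    Files.deleteIfExists(downloadedTarball);
  }

  private static void importContainer(Path tarball) throws IOException {
    // Placeholder for the real container import logic.
  }

  private ImportCleanupSketch() { }
}
{code}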



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1433) Fix typo in hdds.proto

2019-04-12 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-1433:
-

 Summary: Fix typo in hdds.proto
 Key: HDDS-1433
 URL: https://issues.apache.org/jira/browse/HDDS-1433
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Affects Versions: 0.3.0
Reporter: Nanda kumar


There is a typo in the hdds.proto file:
- {{GetScmInfoRespsonseProto}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1434) TestDatanodeStateMachine is flaky

2019-04-12 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-1434:
-

 Summary: TestDatanodeStateMachine is flaky
 Key: HDDS-1434
 URL: https://issues.apache.org/jira/browse/HDDS-1434
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: test
Reporter: Nanda kumar


TestDatanodeStateMachine is flaky.
 It has failed in the following builds:
 
[https://builds.apache.org/job/PreCommit-HDDS-Build/2650/artifact/out/patch-unit-hadoop-hdds.txt]
 
[https://builds.apache.org/job/hadoop-multibranch/job/PR-661/6/artifact/out/patch-unit-hadoop-hdds_container-service.txt]
 
[https://builds.apache.org/job/PreCommit-HDDS-Build/2635/artifact/out/patch-unit-hadoop-hdds.txt]

Stack trace:
{noformat}
java.lang.Thread.State: WAITING (on object monitor)
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at 
java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:403)
at 
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)


at 
org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:389)
at 
org.apache.hadoop.ozone.container.common.TestDatanodeStateMachine.testStartStopDatanodeStateMachine(TestDatanodeStateMachine.java:166)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
at 
org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)

[INFO] 
[INFO] Results:
[INFO] 
[ERROR] Errors: 
[ERROR]   TestDatanodeStateMachine.testStartStopDatanodeStateMachine:166 ? 
Timeout Timed...
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1488) Scm cli command to start/stop replication manager

2019-05-02 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-1488:
-

 Summary: Scm cli command to start/stop replication manager
 Key: HDDS-1488
 URL: https://issues.apache.org/jira/browse/HDDS-1488
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: SCM
Reporter: Nanda kumar
Assignee: Nanda kumar


It would be nice to have an scmcli command to start/stop the ReplicationManager 
thread running in SCM.
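
A minimal sketch of the control surface such a command would delegate to; the 
interface and method names below are assumptions, not the actual SCM client 
protocol or CLI.

{code}
/** Sketch: the admin-facing operations an scmcli subcommand would delegate to. */
interface ReplicationManagerAdminSketch {
  void start();        // resume the replication monitor thread
  void stop();         // pause the replication monitor thread
  boolean isRunning(); // report current status
}

/** Hypothetical CLI handler shape: parse the verb and call the matching operation. */
class ReplicationManagerCommandSketch {
  private final ReplicationManagerAdminSketch admin;

  ReplicationManagerCommandSketch(ReplicationManagerAdminSketch admin) {
    this.admin = admin;
  }

  void run(String verb) {
    switch (verb) {
    case "start":
      admin.start();
      break;
    case "stop":
      admin.stop();
      break;
    case "status":
      System.out.println("ReplicationManager running: " + admin.isRunning());
      break;
    default:
      throw new IllegalArgumentException("Unknown verb: " + verb);
    }
  }
}
{code}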



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-1201) Reporting Corruptions in Containers to SCM

2019-06-06 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar resolved HDDS-1201.
---
   Resolution: Fixed
Fix Version/s: 0.4.1
   0.5.0

> Reporting Corruptions in Containers to SCM
> --
>
> Key: HDDS-1201
> URL: https://issues.apache.org/jira/browse/HDDS-1201
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Datanode, SCM
>Reporter: Supratim Deka
>Assignee: Shweta
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0, 0.4.1
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Add protocol message and handling to report container corruptions to the SCM.
> Also add basic recovery handling in SCM.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-1647) Recon config tag does not show up on Ozone UI.

2019-06-06 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar resolved HDDS-1647.
---
Resolution: Fixed

> Recon config tag does not show up on Ozone UI.
> --
>
> Key: HDDS-1647
> URL: https://issues.apache.org/jira/browse/HDDS-1647
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Aravindan Vijayan
>Assignee: Aravindan Vijayan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
> Attachments: HDDS-1647-000.patch, Screen Shot 2019-06-05 at 10.02.59 
> AM.png
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Recon tag does not show up on the list of tags on /conf page. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-1652) HddsDispatcher should not shutdown volumeSet

2019-06-06 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar resolved HDDS-1652.
---
   Resolution: Fixed
Fix Version/s: 0.4.1
   0.5.0

> HddsDispatcher should not shutdown volumeSet
> 
>
> Key: HDDS-1652
> URL: https://issues.apache.org/jira/browse/HDDS-1652
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0, 0.4.1
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently both OzoneContainer#stop() and HddsDispatcher#stop() invoke 
> volumeSet.shutdown() explicitly to shut down the same volume set.
>  
> In addition, OzoneContainer#stop() will invoke HddsDispatcher#stop(). Since 
> the volume set object is created by the OzoneContainer object, it should be 
> the responsibility of OzoneContainer to shut it down. This ticket is opened 
> to remove the volumeSet.shutdown() call from HddsDispatcher#stop(). 
>  
> There are benchmark tools that rely on HddsDispatcher#stop() to shut down the 
> volumeSet object; those can be fixed with an explicit volumeSet#shutdown call. 
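
A minimal sketch of the ownership rule being described, with simplified classes 
standing in for OzoneContainer, HddsDispatcher, and the volume set (not the 
actual Ozone classes):

{code}
/** Sketch: the object that creates the volume set is the one that shuts it down. */
class VolumeSetSketch {
  void shutdown() { /* release volumes */ }
}

class DispatcherSketch {
  void stop() {
    // No volumeSet.shutdown() here: the dispatcher only borrows the volume set.
  }
}

class ContainerSketch {
  private final VolumeSetSketch volumeSet = new VolumeSetSketch();
  private final DispatcherSketch dispatcher = new DispatcherSketch();

  void stop() {
    dispatcher.stop();
    // The container created the volume set, so it is responsible for shutting it down.
    volumeSet.shutdown();
  }
}
{code}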



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-1650) Fix Ozone tests leaking volume checker thread

2019-06-06 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar resolved HDDS-1650.
---
   Resolution: Fixed
Fix Version/s: 0.4.1
   0.5.0

> Fix Ozone tests leaking volume checker thread
> -
>
> Key: HDDS-1650
> URL: https://issues.apache.org/jira/browse/HDDS-1650
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0, 0.4.1
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> There are a few tests leaking the hdds volume checker thread. This ticket is 
> opened to fix them. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-1454) GC or other system pause events can trigger pipeline destroy for all the nodes in the cluster

2019-06-19 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar resolved HDDS-1454.
---
   Resolution: Fixed
Fix Version/s: 0.5.0

> GC or other system pause events can trigger pipeline destroy for all the 
> nodes in the cluster
> --
>
> Key: HDDS-1454
> URL: https://issues.apache.org/jira/browse/HDDS-1454
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: Mukul Kumar Singh
>Assignee: Supratim Deka
>Priority: Major
>  Labels: MiniOzoneChaosCluster, pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> In a MiniOzoneChaosCluster run it was observed that events like GC pauses or 
> any other pauses in SCM can mark all the datanodes as stale in SCM. This 
> triggers multiple pipeline destroys and renders the system unusable. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1759) TestWatchForCommit crashes

2019-07-02 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-1759:
-

 Summary: TestWatchForCommit crashes
 Key: HDDS-1759
 URL: https://issues.apache.org/jira/browse/HDDS-1759
 Project: Hadoop Distributed Data Store
  Issue Type: Test
  Components: test
Reporter: Nanda kumar


{{org.apache.hadoop.ozone.client.rpc.TestWatchForCommit}} is crashing with the 
following exception trace.
{noformat}
[ERROR] Crashed tests:
[ERROR] org.apache.hadoop.ozone.client.rpc.TestWatchForCommit
[ERROR] org.apache.maven.surefire.booter.SurefireBooterForkException: 
ExecutionException The forked VM terminated without properly saying goodbye. VM 
crash or System.exit called?
[ERROR] Command was /bin/sh -c cd 
/Users/nvadivelu/codebase/apache/hadoop/hadoop-ozone/integration-test && 
/Library/Java/JavaVirtualMachines/jdk1.8.0_152.jdk/Contents/Home/jre/bin/java 
-Xmx2048m -XX:+HeapDumpOnOutOfMemoryError -jar 
/Users/nvadivelu/codebase/apache/hadoop/hadoop-ozone/integration-test/target/surefire/surefirebooter6824244130326461346.jar
 
/Users/nvadivelu/codebase/apache/hadoop/hadoop-ozone/integration-test/target/surefire
 2019-07-03T10-47-23_862-jvmRun1 surefire1503013258446082728tmp 
surefire_07547129263746053478tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 1
[ERROR] Crashed tests:
[ERROR] org.apache.hadoop.ozone.client.rpc.TestWatchForCommit
[ERROR] at 
org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:511)
[ERROR] at 
org.apache.maven.plugin.surefire.booterclient.ForkStarter.runSuitesForkPerTestSet(ForkStarter.java:458)
[ERROR] at 
org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:299)
[ERROR] at 
org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:247)
[ERROR] at 
org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1149)
[ERROR] at 
org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:991)
[ERROR] at 
org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:837)
[ERROR] at 
org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134)
[ERROR] at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
[ERROR] at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:154)
[ERROR] at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:146)
[ERROR] at 
org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:117)
[ERROR] at 
org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:81)
[ERROR] at 
org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
[ERROR] at 
org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
[ERROR] at 
org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:309)
[ERROR] at 
org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:194)
[ERROR] at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:107)
[ERROR] at org.apache.maven.cli.MavenCli.execute(MavenCli.java:955)
[ERROR] at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:290)
[ERROR] at org.apache.maven.cli.MavenCli.main(MavenCli.java:194)
[ERROR] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[ERROR] at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[ERROR] at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[ERROR] at java.lang.reflect.Method.invoke(Method.java:498)
[ERROR] at 
org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
[ERROR] at 
org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
[ERROR] at 
org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
[ERROR] at 
org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
[ERROR] Caused by: 
org.apache.maven.surefire.booter.SurefireBooterForkException: The forked VM 
terminated without properly saying goodbye. VM crash or System.exit called?
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1770) SCM crashes when ReplicationManager is trying to re-replicate under replicated containers

2019-07-08 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-1770:
-

 Summary: SCM crashes when ReplicationManager is trying to 
re-replicate under replicated containers
 Key: HDDS-1770
 URL: https://issues.apache.org/jira/browse/HDDS-1770
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: SCM
Reporter: Nanda kumar


SCM crashes with the following exception when ReplicationManager is trying to 
re-replicate under-replicated containers
{noformat}
2019-07-08 12:46:36 ERROR ReplicationManager:215 - Exception in Replication 
Monitor Thread.
java.lang.IllegalArgumentException: Affinity node /default-rack/aab15e2d07cc is 
not a member of topology
at 
org.apache.hadoop.hdds.scm.net.NetworkTopologyImpl.checkAffinityNode(NetworkTopologyImpl.java:767)
at 
org.apache.hadoop.hdds.scm.net.NetworkTopologyImpl.chooseRandom(NetworkTopologyImpl.java:407)
at 
org.apache.hadoop.hdds.scm.container.placement.algorithms.SCMContainerPlacementRackAware.chooseNode(SCMContainerPlacementRackAware.java:242)
at 
org.apache.hadoop.hdds.scm.container.placement.algorithms.SCMContainerPlacementRackAware.chooseDatanodes(SCMContainerPlacementRackAware.java:168)
at 
org.apache.hadoop.hdds.scm.container.ReplicationManager.handleUnderReplicatedContainer(ReplicationManager.java:487)
at 
org.apache.hadoop.hdds.scm.container.ReplicationManager.processContainer(ReplicationManager.java:293)
at 
java.base/java.util.concurrent.ConcurrentHashMap$KeySetView.forEach(ConcurrentHashMap.java:4698)
at 
java.base/java.util.Collections$UnmodifiableCollection.forEach(Collections.java:1083)
at 
org.apache.hadoop.hdds.scm.container.ReplicationManager.run(ReplicationManager.java:205)
at java.base/java.lang.Thread.run(Thread.java:834)
2019-07-08 12:46:36 INFO  ExitUtil:210 - Exiting with status 1: 
java.lang.IllegalArgumentException: Affinity node /default-rack/aab15e2d07cc is 
not a member of topology
2019-07-08 12:46:36 INFO  StorageContainerManagerStarter:51 - SHUTDOWN_MSG: 
/
SHUTDOWN_MSG: Shutting down StorageContainerManager at 
8c763563f672/192.168.112.2
/
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1778) Fix existing blockade tests

2019-07-10 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-1778:
-

 Summary: Fix existing blockade tests
 Key: HDDS-1778
 URL: https://issues.apache.org/jira/browse/HDDS-1778
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: test
Reporter: Nanda kumar
Assignee: Nanda kumar


This jira is to track and fix existing blockade test cases.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-1201) Reporting Corruptions in Containers to SCM

2019-07-11 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar resolved HDDS-1201.
---
Resolution: Fixed

> Reporting Corruptions in Containers to SCM
> --
>
> Key: HDDS-1201
> URL: https://issues.apache.org/jira/browse/HDDS-1201
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Datanode, SCM
>Reporter: Supratim Deka
>Assignee: Hrishikesh Gadre
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Add protocol message and handling to report container corruptions to the SCM.
> Also add basic recovery handling in SCM.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1784) Missing HostName and IpAddress in the response of register command

2019-07-11 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-1784:
-

 Summary: Missing HostName and IpAddress in the response of 
register command
 Key: HDDS-1784
 URL: https://issues.apache.org/jira/browse/HDDS-1784
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: SCM
Reporter: Nanda kumar
Assignee: Nanda kumar


{{SCMNodeManager}} sets the HostName and IpAddress in the response of the 
register command, but they are ignored by {{SCMDatanodeProtocolServer}} while 
sending the response back to the datanode.
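
A simplified sketch of the intended flow; the class shapes below are stand-ins 
for the real register-response protobuf and server types, whose field and 
method names may differ.

{code}
/** Sketch: the register response should carry hostName and ipAddress end to end. */
class RegisteredNodeSketch {
  final String uuid;
  final String hostName;
  final String ipAddress;

  RegisteredNodeSketch(String uuid, String hostName, String ipAddress) {
    this.uuid = uuid;
    this.hostName = hostName;
    this.ipAddress = ipAddress;
  }
}

class DatanodeProtocolServerSketch {
  /**
   * The bug: only the uuid was copied into the wire response. The fix is to
   * carry the hostName and ipAddress set by the node manager through as well.
   */
  RegisteredNodeSketch buildWireResponse(RegisteredNodeSketch fromNodeManager) {
    return new RegisteredNodeSketch(
        fromNodeManager.uuid,
        fromNodeManager.hostName,
        fromNodeManager.ipAddress);
  }
}
{code}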



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-1754) getContainerWithPipeline fails with PipelineNotFoundException

2019-07-11 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar resolved HDDS-1754.
---
   Resolution: Fixed
Fix Version/s: 0.5.0

> getContainerWithPipeline fails with PipelineNotFoundException
> -
>
> Key: HDDS-1754
> URL: https://issues.apache.org/jira/browse/HDDS-1754
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Supratim Deka
>Priority: Major
>  Labels: MiniOzoneChaosCluster, pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When a pipeline is closed or finalized before all the containers inside the 
> pipeline could be closed, getContainerWithPipeline will still try to fetch 
> the pipeline state from pipelineManager after the pipeline has been closed.
> {code}
> 2019-07-02 20:48:20,370 INFO  ipc.Server (Server.java:logException(2726)) - 
> IPC Server handler 13 on 50130, call Call#17339 Retry#0 
> org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocol.getContainerWithPipeline
>  from 192.168.0.2:51452
> org.apache.hadoop.hdds.scm.pipeline.PipelineNotFoundException: 
> PipelineID=e1a7b16a-48d9-4194-9774-ad49ec9ad78b not found
> at 
> org.apache.hadoop.hdds.scm.pipeline.PipelineStateMap.getPipeline(PipelineStateMap.java:132)
> at 
> org.apache.hadoop.hdds.scm.pipeline.PipelineStateManager.getPipeline(PipelineStateManager.java:66)
> at 
> org.apache.hadoop.hdds.scm.pipeline.SCMPipelineManager.getPipeline(SCMPipelineManager.java:184)
> at 
> org.apache.hadoop.hdds.scm.server.SCMClientProtocolServer.getContainerWithPipeline(SCMClientProtocolServer.java:244)
> at 
> org.apache.hadoop.ozone.protocolPB.StorageContainerLocationProtocolServerSideTranslatorPB.getContainerWithPipeline(StorageContainerLocationProtocolServerSideTranslatorPB.java:144)
> at 
> org.apache.hadoop.hdds.protocol.proto.StorageContainerLocationProtocolProtos$StorageContainerLocationProtocolService$2.callBlockingMethod(StorageContainerLocationProtocolProtos.java:16390)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-1752) ConcurrentModificationException while handling DeadNodeHandler event

2019-07-12 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar resolved HDDS-1752.
---
   Resolution: Fixed
Fix Version/s: 0.4.1

Thanks [~hgadre] for the contribution and thanks to [~msingh] for reporting it. 
Committed it to trunk and ozone-0.4.1 branch.

> ConcurrentModificationException while handling DeadNodeHandler event
> 
>
> Key: HDDS-1752
> URL: https://issues.apache.org/jira/browse/HDDS-1752
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Assignee: Hrishikesh Gadre
>Priority: Major
>  Labels: MiniOzoneChaosCluster, pull-request-available
> Fix For: 0.4.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> ConcurrentModificationException while handling DeadNodeHandler event
> {code}
> 2019-07-02 19:29:25,190 ERROR events.SingleThreadExecutor 
> (SingleThreadExecutor.java:lambda$onMessage$1(88)) - Error on execution 
> message 56591ec5-c9e4-416c-9a36-db0507739fe5{ip: 192.168.0.2, host: 192.16
> 8.0.2, networkLocation: /default-rack, certSerialId: null}
> java.util.ConcurrentModificationException
> at java.util.HashMap$HashIterator.nextNode(HashMap.java:1442)
> at java.util.HashMap$KeyIterator.next(HashMap.java:1466)
> at java.lang.Iterable.forEach(Iterable.java:74)
> at 
> java.util.Collections$UnmodifiableCollection.forEach(Collections.java:1080)
> at 
> org.apache.hadoop.hdds.scm.node.DeadNodeHandler.lambda$destroyPipelines$1(DeadNodeHandler.java:99)
> at java.util.Optional.ifPresent(Optional.java:159)
> at 
> org.apache.hadoop.hdds.scm.node.DeadNodeHandler.destroyPipelines(DeadNodeHandler.java:98)
> at 
> org.apache.hadoop.hdds.scm.node.DeadNodeHandler.onMessage(DeadNodeHandler.java:78)
> at 
> org.apache.hadoop.hdds.scm.node.DeadNodeHandler.onMessage(DeadNodeHandler.java:44)
> at 
> org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:85)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1790) Fix checkstyle issues in TestDataScrubber

2019-07-12 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-1790:
-

 Summary: Fix checkstyle issues in TestDataScrubber
 Key: HDDS-1790
 URL: https://issues.apache.org/jira/browse/HDDS-1790
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: test
Reporter: Nanda kumar
Assignee: Nanda kumar


There are 4 Checkstyle issues in TestDataScrubber that have to be fixed
{noformat}
[ERROR] 
src/test/java/org/apache/hadoop/ozone/dn/scrubber/TestDataScrubber.java:[157] 
(sizes) LineLength: Line is longer than 80 characters (found 81).
[ERROR] 
src/test/java/org/apache/hadoop/ozone/dn/scrubber/TestDataScrubber.java:[161] 
(sizes) LineLength: Line is longer than 80 characters (found 82).
[ERROR] 
src/test/java/org/apache/hadoop/ozone/dn/scrubber/TestDataScrubber.java:[167] 
(sizes) LineLength: Line is longer than 80 characters (found 85).
[ERROR] 
src/test/java/org/apache/hadoop/ozone/dn/scrubber/TestDataScrubber.java:[187] 
(sizes) LineLength: Line is longer than 80 characters (found 104).
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1791) Update network-tests/src/test/blockade/README.md file

2019-07-12 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-1791:
-

 Summary: Update network-tests/src/test/blockade/README.md file
 Key: HDDS-1791
 URL: https://issues.apache.org/jira/browse/HDDS-1791
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: test
Reporter: Nanda kumar
Assignee: Nanda kumar


{{hadoop-ozone/fault-injection-test/network-tests/src/test/blockade/README.md}} 
has to be updated after HDDS-1778.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-1759) TestWatchForCommit crashes

2019-07-12 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar resolved HDDS-1759.
---
Resolution: Duplicate

> TestWatchForCommit crashes
> --
>
> Key: HDDS-1759
> URL: https://issues.apache.org/jira/browse/HDDS-1759
> Project: Hadoop Distributed Data Store
>  Issue Type: Test
>  Components: test
>Reporter: Nanda kumar
>Priority: Major
>
> {{org.apache.hadoop.ozone.client.rpc.TestWatchForCommit}} is crashing with 
> the following exception trace.
> {noformat}
> [ERROR] Crashed tests:
> [ERROR] org.apache.hadoop.ozone.client.rpc.TestWatchForCommit
> [ERROR] org.apache.maven.surefire.booter.SurefireBooterForkException: 
> ExecutionException The forked VM terminated without properly saying goodbye. 
> VM crash or System.exit called?
> [ERROR] Command was /bin/sh -c cd 
> /Users/nvadivelu/codebase/apache/hadoop/hadoop-ozone/integration-test && 
> /Library/Java/JavaVirtualMachines/jdk1.8.0_152.jdk/Contents/Home/jre/bin/java 
> -Xmx2048m -XX:+HeapDumpOnOutOfMemoryError -jar 
> /Users/nvadivelu/codebase/apache/hadoop/hadoop-ozone/integration-test/target/surefire/surefirebooter6824244130326461346.jar
>  
> /Users/nvadivelu/codebase/apache/hadoop/hadoop-ozone/integration-test/target/surefire
>  2019-07-03T10-47-23_862-jvmRun1 surefire1503013258446082728tmp 
> surefire_07547129263746053478tmp
> [ERROR] Error occurred in starting fork, check output in log
> [ERROR] Process Exit Code: 1
> [ERROR] Crashed tests:
> [ERROR] org.apache.hadoop.ozone.client.rpc.TestWatchForCommit
> [ERROR]   at 
> org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:511)
> [ERROR]   at 
> org.apache.maven.plugin.surefire.booterclient.ForkStarter.runSuitesForkPerTestSet(ForkStarter.java:458)
> [ERROR]   at 
> org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:299)
> [ERROR]   at 
> org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:247)
> [ERROR]   at 
> org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1149)
> [ERROR]   at 
> org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:991)
> [ERROR]   at 
> org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:837)
> [ERROR]   at 
> org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134)
> [ERROR]   at 
> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
> [ERROR]   at 
> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:154)
> [ERROR]   at 
> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:146)
> [ERROR]   at 
> org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:117)
> [ERROR]   at 
> org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:81)
> [ERROR]   at 
> org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
> [ERROR]   at 
> org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
> [ERROR]   at 
> org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:309)
> [ERROR]   at 
> org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:194)
> [ERROR]   at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:107)
> [ERROR]   at org.apache.maven.cli.MavenCli.execute(MavenCli.java:955)
> [ERROR]   at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:290)
> [ERROR]   at org.apache.maven.cli.MavenCli.main(MavenCli.java:194)
> [ERROR]   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> [ERROR]   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> [ERROR]   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> [ERROR]   at java.lang.reflect.Method.invoke(Method.java:498)
> [ERROR]   at 
> org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
> [ERROR]   at 
> org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
> [ERROR]   at 
> org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
> [ERROR]   at 
> org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
> [ERROR] Caused by: 
> org.apache.maven.surefire.booter.SurefireBooterForkException: The forked VM 
> terminated without properly saying goodbye. VM crash or System.exit called?
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-

[jira] [Resolved] (HDDS-1036) container replica state in datanode should be QUASI-CLOSED if the datanode is isolated from other two datanodes in 3 datanode cluster

2019-07-15 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar resolved HDDS-1036.
---
Resolution: Not A Problem

Fixed as part of ReplicationManager refactoring.

> container replica state in datanode should be QUASI-CLOSED if the datanode is 
> isolated from other two datanodes in 3 datanode cluster
> -
>
> Key: HDDS-1036
> URL: https://issues.apache.org/jira/browse/HDDS-1036
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode, SCM
>Reporter: Nilotpal Nandi
>Assignee: Nanda kumar
>Priority: Major
>
> steps taken :
> ---
>  # created a 3 datanode docker cluster.
>  # wrote some data to create a pipeline.
>  # Then, one of the datanodes is isolated from the other two datanodes. All 
> datanodes can communicate with SCM.
>  # Tried to write new data; the write failed.
>  # Wait for 900 seconds.
> Observation:
> 
> container state is CLOSED in all three replicas.
>  
> Expectation:
> ---
> container state in isolated datanode should be QUASI-CLOSED.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1810) SCM command to Activate and Deactivate pipelines

2019-07-16 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-1810:
-

 Summary: SCM command to Activate and Deactivate pipelines
 Key: HDDS-1810
 URL: https://issues.apache.org/jira/browse/HDDS-1810
 Project: Hadoop Distributed Data Store
  Issue Type: New Feature
  Components: SCM, SCM Client
Reporter: Nanda kumar
Assignee: Nanda kumar


It will be useful to have an scm command to temporarily deactivate and 
re-activate a pipeline. This will help a lot when debugging a pipeline.
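
A sketch of the pipeline-level operations implied by this request; the enum 
values, method names, and use of a UUID as the pipeline identifier are 
assumptions for illustration only.

{code}
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Sketch: toggling a pipeline between active and dormant for debugging. */
class PipelineAdminSketch {

  enum State { ACTIVE, DORMANT }

  private final Map<UUID, State> pipelineStates = new ConcurrentHashMap<>();

  /** Deactivated pipelines stay in SCM but are skipped during block allocation. */
  void deactivatePipeline(UUID pipelineId) {
    pipelineStates.put(pipelineId, State.DORMANT);
  }

  void activatePipeline(UUID pipelineId) {
    pipelineStates.put(pipelineId, State.ACTIVE);
  }

  boolean isUsableForAllocation(UUID pipelineId) {
    return pipelineStates.getOrDefault(pipelineId, State.ACTIVE) == State.ACTIVE;
  }
}
{code}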



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1817) GetKey fails with IllegalArgumentException

2019-07-17 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-1817:
-

 Summary: GetKey fails with IllegalArgumentException
 Key: HDDS-1817
 URL: https://issues.apache.org/jira/browse/HDDS-1817
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Client, SCM
Affects Versions: 0.4.0
Reporter: Nanda kumar


During the get key call, the client intermittently fails with 
{{java.lang.IllegalArgumentException}}
{noformat}
E   AssertionError: Ozone get Key failed with 
output=[java.lang.IllegalArgumentException
E   at 
com.google.common.base.Preconditions.checkArgument(Preconditions.java:72)
E   at 
org.apache.hadoop.hdds.scm.XceiverClientManager.acquireClient(XceiverClientManager.java:150)
E   at 
org.apache.hadoop.hdds.scm.XceiverClientManager.acquireClientForReadData(XceiverClientManager.java:143)
E   at 
org.apache.hadoop.hdds.scm.storage.BlockInputStream.getChunkInfos(BlockInputStream.java:154)
E   at 
org.apache.hadoop.hdds.scm.storage.BlockInputStream.initialize(BlockInputStream.java:118)
E   at 
org.apache.hadoop.hdds.scm.storage.BlockInputStream.read(BlockInputStream.java:222)
E   at 
org.apache.hadoop.ozone.client.io.KeyInputStream.read(KeyInputStream.java:171)
E   at 
org.apache.hadoop.ozone.client.io.OzoneInputStream.read(OzoneInputStream.java:47)
E   at java.base/java.io.InputStream.read(InputStream.java:205)
E   at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:94)
E   at 
org.apache.hadoop.ozone.web.ozShell.keys.GetKeyHandler.call(GetKeyHandler.java:98)
E   at 
org.apache.hadoop.ozone.web.ozShell.keys.GetKeyHandler.call(GetKeyHandler.java:48)
E   at picocli.CommandLine.execute(CommandLine.java:1173)
E   at picocli.CommandLine.access$800(CommandLine.java:141)
E   at picocli.CommandLine$RunLast.handle(CommandLine.java:1367)
E   at picocli.CommandLine$RunLast.handle(CommandLine.java:1335)
E   at 
picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:1243)
E   at picocli.CommandLine.parseWithHandlers(CommandLine.java:1526)
E   at picocli.CommandLine.parseWithHandler(CommandLine.java:1465)
E   at 
org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:65)
E   at 
org.apache.hadoop.ozone.web.ozShell.OzoneShell.execute(OzoneShell.java:60)
E   at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:56)
E   at 
org.apache.hadoop.ozone.web.ozShell.OzoneShell.main(OzoneShell.java:53)]
{noformat}

This is happening when the pipeline returned by SCM doesn't have any datanode 
information.
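
Based on the stack trace, the failure surfaces as a bare Preconditions check in 
XceiverClientManager when the pipeline has no datanodes. A defensive check of 
the sort sketched below (hypothetical, not the actual fix) would make the 
failure mode explicit; the method and parameter names are assumptions.

{code}
import java.io.IOException;
import java.util.List;

/** Sketch: validate the pipeline before handing it to the client. */
final class PipelineValidationSketch {

  static void checkPipelineHasNodes(String pipelineId, List<String> nodes)
      throws IOException {
    if (nodes == null || nodes.isEmpty()) {
      // Fail with a descriptive message instead of a bare IllegalArgumentException
      // thrown from Preconditions.checkArgument deeper in XceiverClientManager.
      throw new IOException("Pipeline " + pipelineId
          + " returned by SCM has no datanode information");
    }
  }

  private PipelineValidationSketch() { }
}
{code}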



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1821) BlockOutputStream#watchForCommit fails with UnsupportedOperationException when one DN is down

2019-07-18 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-1821:
-

 Summary: BlockOutputStream#watchForCommit fails with 
UnsupportedOperationException when one DN is down
 Key: HDDS-1821
 URL: https://issues.apache.org/jira/browse/HDDS-1821
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Client
Reporter: Nanda kumar


When one of the datanodes in the Ratis pipeline is excluded by introducing a 
network failure, the client write fails with the following exception
{noformat}
2019-07-18 07:13:33 WARN  XceiverClientRatis:262 - 3 way commit failed on 
pipeline Pipeline[ Id: b338512c-1a3b-4ae6-b89c-7b7737d9bd93, Nodes: 
ce90cf89-0444-45bf-8c49-a126d8da5a5f{ip: 192.168.240.4, host: 
ozoneblockade_datanode_2.ozoneblockade_default, networkLocation: /default-rack, 
certSerialId: null}fa65a457-155d-4bf3-8d1b-b0e11ec157ae{ip: 192.168.240.6, 
host: ozoneblockade_datanode_3.ozoneblockade_default, networkLocation: 
/default-rack, certSerialId: null}c5785c99-7dc2-4afc-9054-2efa28a41e7e{ip: 
192.168.240.2, host: ozoneblockade_datanode_1.ozoneblockade_default, 
networkLocation: /default-rack, certSerialId: null}, Type:RATIS, Factor:THREE, 
State:OPEN]
E java.util.concurrent.ExecutionException: 
org.apache.ratis.protocol.NotReplicatedException: Request with call Id 2 and 
log index 9 is not yet replicated to ALL_COMMITTED
E   at 
java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
E   at 
java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2022)
E   at 
org.apache.hadoop.hdds.scm.XceiverClientRatis.watchForCommit(XceiverClientRatis.java:259)
E   at 
org.apache.hadoop.hdds.scm.storage.CommitWatcher.watchForCommit(CommitWatcher.java:194)
E   at 
org.apache.hadoop.hdds.scm.storage.CommitWatcher.watchOnLastIndex(CommitWatcher.java:157)
E   at 
org.apache.hadoop.hdds.scm.storage.BlockOutputStream.watchForCommit(BlockOutputStream.java:348)
E   at 
org.apache.hadoop.hdds.scm.storage.BlockOutputStream.handleFlush(BlockOutputStream.java:480)
E   at 
org.apache.hadoop.hdds.scm.storage.BlockOutputStream.close(BlockOutputStream.java:494)
E   at 
org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry.close(BlockOutputStreamEntry.java:143)
E   at 
org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:434)
E   at 
org.apache.hadoop.ozone.client.io.KeyOutputStream.close(KeyOutputStream.java:472)
E   at 
org.apache.hadoop.ozone.client.io.OzoneOutputStream.close(OzoneOutputStream.java:60)
E   at 
org.apache.hadoop.ozone.freon.RandomKeyGenerator.createKey(RandomKeyGenerator.java:706)
E   at 
org.apache.hadoop.ozone.freon.RandomKeyGenerator.access$1100(RandomKeyGenerator.java:88)
E   at 
org.apache.hadoop.ozone.freon.RandomKeyGenerator$ObjectCreator.run(RandomKeyGenerator.java:609)
E   at 
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
E   at 
java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
E   at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
E   at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
E   at java.base/java.lang.Thread.run(Thread.java:834)
E Caused by: org.apache.ratis.protocol.NotReplicatedException: Request 
with call Id 2 and log index 9 is not yet replicated to ALL_COMMITTED
E   at 
org.apache.ratis.client.impl.ClientProtoUtils.toRaftClientReply(ClientProtoUtils.java:245)
E   at 
org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onNext(GrpcClientProtocolClient.java:254)
E   at 
org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onNext(GrpcClientProtocolClient.java:249)
E   at 
org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onMessage(ClientCalls.java:421)
E   at 
org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onMessage(ForwardingClientCallListener.java:33)
E   at 
org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onMessage(ForwardingClientCallListener.java:33)
E   at 
org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInContext(ClientCallImpl.java:519)
E   at 
org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
E   at 
org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
E   ... 3 more
E 2019-07-18 07:13:33 INFO  XceiverClientRatis:280 - Cou

[jira] [Created] (HDDS-1850) ReplicationManager should consider inflight replication and deletion while picking datanode for re-replication

2019-07-23 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-1850:
-

 Summary: ReplicationManager should consider inflight replication 
and deletion while picking datanode for re-replication
 Key: HDDS-1850
 URL: https://issues.apache.org/jira/browse/HDDS-1850
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: SCM
Reporter: Nanda kumar
Assignee: Nanda kumar


When choosing the target datanode for re-replication, {{ReplicationManager}} 
should take into account the datanodes that already have an inflight replication 
or deletion for the same container and exclude them as targets.
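
A minimal sketch of the idea, using simplified types (container ids as longs, 
datanodes as strings) and invented map/method names rather than the actual 
ReplicationManager code:
{code:java}
import java.util.*;

// Simplified model: before picking a new target for re-replication, exclude the
// datanodes that already host a replica as well as the datanodes that have an
// inflight replication or deletion for the same container.
public class TargetSelectionSketch {
  private final Map<Long, List<String>> inflightReplication = new HashMap<>();
  private final Map<Long, List<String>> inflightDeletion = new HashMap<>();

  public List<String> pickTargets(long containerId,
      Collection<String> currentReplicas,
      Collection<String> healthyNodes,
      int required) {
    Set<String> excluded = new HashSet<>(currentReplicas);
    excluded.addAll(
        inflightReplication.getOrDefault(containerId, Collections.emptyList()));
    excluded.addAll(
        inflightDeletion.getOrDefault(containerId, Collections.emptyList()));

    List<String> targets = new ArrayList<>();
    for (String node : healthyNodes) {
      if (targets.size() == required) {
        break;
      }
      if (!excluded.contains(node)) {
        targets.add(node);
      }
    }
    return targets;
  }
}
{code}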



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1851) ReplicationManager should not force close a container with one quasi-closed replica

2019-07-23 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-1851:
-

 Summary: ReplicationManager should not force close a container 
with one quasi-closed replica
 Key: HDDS-1851
 URL: https://issues.apache.org/jira/browse/HDDS-1851
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: SCM
Reporter: Nanda kumar
Assignee: Nanda kumar


There is a case in {{ReplicationManager}} where we go ahead and close a 
quasi-closed container which has only one quasi-closed replica. We should not 
do this. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1853) Fix failing blockade test-cases

2019-07-23 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-1853:
-

 Summary: Fix failing blockade test-cases
 Key: HDDS-1853
 URL: https://issues.apache.org/jira/browse/HDDS-1853
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: test
Reporter: Nanda kumar
Assignee: Nanda kumar


This Jira is to fix the failing blockade test-cases and make sure that all of 
them pass.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1854) Print intuitive error message at client when the pipeline returned by SCM has no datanode

2019-07-24 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-1854:
-

 Summary: Print intuitive error message at client when the pipeline 
returned by SCM has no datanode
 Key: HDDS-1854
 URL: https://issues.apache.org/jira/browse/HDDS-1854
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: Ozone Client
Reporter: Nanda kumar


We are throwing {{IllegalArgumentException}} in OzoneClient when the pipeline 
returned by SCM doesn't have any datanode information. Instead of throwing 
{{IllegalArgumentException}}, we can throw a custom, user-friendly exception 
that clearly tells the user what went wrong.
Existing exception trace:
{noformat}
AssertionError: Ozone get Key failed with 
output=[java.lang.IllegalArgumentException
at 
com.google.common.base.Preconditions.checkArgument(Preconditions.java:72)
at 
org.apache.hadoop.hdds.scm.XceiverClientManager.acquireClient(XceiverClientManager.java:150)
at 
org.apache.hadoop.hdds.scm.XceiverClientManager.acquireClientForReadData(XceiverClientManager.java:143)
at 
org.apache.hadoop.hdds.scm.storage.BlockInputStream.getChunkInfos(BlockInputStream.java:154)
at 
org.apache.hadoop.hdds.scm.storage.BlockInputStream.initialize(BlockInputStream.java:118)
at 
org.apache.hadoop.hdds.scm.storage.BlockInputStream.read(BlockInputStream.java:222)
at 
org.apache.hadoop.ozone.client.io.KeyInputStream.read(KeyInputStream.java:171)
at 
org.apache.hadoop.ozone.client.io.OzoneInputStream.read(OzoneInputStream.java:47)
at java.base/java.io.InputStream.read(InputStream.java:205)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:94)
at 
org.apache.hadoop.ozone.web.ozShell.keys.GetKeyHandler.call(GetKeyHandler.java:98)
at 
org.apache.hadoop.ozone.web.ozShell.keys.GetKeyHandler.call(GetKeyHandler.java:48)
at picocli.CommandLine.execute(CommandLine.java:1173)
at picocli.CommandLine.access$800(CommandLine.java:141)
at picocli.CommandLine$RunLast.handle(CommandLine.java:1367)
at picocli.CommandLine$RunLast.handle(CommandLine.java:1335)
at 
picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:1243)
at picocli.CommandLine.parseWithHandlers(CommandLine.java:1526)
at picocli.CommandLine.parseWithHandler(CommandLine.java:1465)
at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:65)
at 
org.apache.hadoop.ozone.web.ozShell.OzoneShell.execute(OzoneShell.java:60)
at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:56)
at 
org.apache.hadoop.ozone.web.ozShell.OzoneShell.main(OzoneShell.java:53)]
{noformat}
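
A rough sketch of the kind of check that could replace the bare precondition; 
the exception and helper names below are made up for illustration and are not 
the actual Ozone client code:
{code:java}
import java.io.IOException;
import java.util.List;

// Hypothetical exception: carries a message that tells the user exactly why the
// read cannot proceed, instead of an opaque IllegalArgumentException.
class EmptyPipelineException extends IOException {
  EmptyPipelineException(String pipelineId) {
    super("Pipeline " + pipelineId + " returned by SCM has no datanodes; "
        + "the key cannot be read until the pipeline is repaired or recreated.");
  }
}

final class PipelineChecks {
  private PipelineChecks() {
  }

  // Replaces a bare Preconditions.checkArgument(!nodes.isEmpty()) in spirit.
  static void ensureHasDatanodes(String pipelineId, List<String> nodes)
      throws EmptyPipelineException {
    if (nodes == null || nodes.isEmpty()) {
      throw new EmptyPipelineException(pipelineId);
    }
  }
}
{code}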



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1855) TestStorageContainerManager#testScmProcessDatanodeHeartbeat is failing

2019-07-24 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-1855:
-

 Summary: 
TestStorageContainerManager#testScmProcessDatanodeHeartbeat is failing
 Key: HDDS-1855
 URL: https://issues.apache.org/jira/browse/HDDS-1855
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: test
Reporter: Nanda kumar
Assignee: Nanda kumar


{{TestStorageContainerManager#testScmProcessDatanodeHeartbeat}} is failing with 
the following exception

{noformat}
[ERROR] Tests run: 9, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 106.315 
s <<< FAILURE! - in org.apache.hadoop.ozone.TestStorageContainerManager
[ERROR] 
testScmProcessDatanodeHeartbeat(org.apache.hadoop.ozone.TestStorageContainerManager)
  Time elapsed: 21.97 s  <<< FAILURE!
java.lang.AssertionError
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertTrue(Assert.java:52)
at 
org.apache.hadoop.ozone.TestStorageContainerManager.testScmProcessDatanodeHeartbeat(TestStorageContainerManager.java:531)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-1851) ReplicationManager should not force close a container with one quasi-closed replica

2019-07-25 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar resolved HDDS-1851.
---
Resolution: Not A Problem

> ReplicationManager should not force close a container with one quasi-closed 
> replica
> ---
>
> Key: HDDS-1851
> URL: https://issues.apache.org/jira/browse/HDDS-1851
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Major
>
> There is a case in {{ReplicationManager}} where we go ahead and close a 
> quasi-closed container which has only one quasi-closed replica. We should not 
> do this. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1887) Enable all the blockade test-cases

2019-08-01 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-1887:
-

 Summary: Enable all the blockade test-cases
 Key: HDDS-1887
 URL: https://issues.apache.org/jira/browse/HDDS-1887
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: test
Reporter: Nanda kumar
Assignee: Nanda kumar


Some of the blockade tests were {{Ignored}} because of open issues. Since most 
of those issues are now resolved, we can go ahead and enable all the ignored 
blockade test-cases.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1888) Add containers to node2container map in SCM as soon as a container is allocated

2019-08-01 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-1888:
-

 Summary: Add containers to node2container map in SCM as soon as a 
container is allocated
 Key: HDDS-1888
 URL: https://issues.apache.org/jira/browse/HDDS-1888
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: SCM
Reporter: Nanda kumar
Assignee: Nanda kumar


In SCM, the node2container and node2pipeline maps are managed by NodeManager and 
the pipeline2container map is managed by PipelineManager.
Currently, when a container is allocated in SCM, it is added to the 
pipeline2container map but not to the node2container map. The node2container map 
is updated only when the datanode sends a full container report.

When a node is marked as dead, DeadNodeHandler processes the event, gets the 
list of containers hosted by the dead datanode and updates the respective 
container replica state in ContainerManager. The list of containers on the 
datanode is read from the node2container map, which will be missing containers 
that were created recently (after the last container report). In such cases we 
will not be able to remove the container replica information for those 
containers. In reality, these containers are under-replicated, but SCM will 
never know it.

We should add containers to the node2container map in SCM as soon as a container 
is allocated.
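
A simplified sketch of the proposed eager update (simplified types, not the 
actual SCM code):
{code:java}
import java.util.*;

// When a container is allocated, record it against every datanode in the chosen
// pipeline immediately, instead of waiting for the next full container report.
public class Node2ContainerSketch {
  private final Map<String, Set<Long>> node2container = new HashMap<>();

  public void onContainerAllocated(long containerId, List<String> pipelineNodes) {
    for (String datanode : pipelineNodes) {
      node2container
          .computeIfAbsent(datanode, k -> new HashSet<>())
          .add(containerId);
    }
  }

  // DeadNodeHandler-style lookup: with the eager update above, recently
  // allocated containers are no longer missed when a node is marked dead.
  public Set<Long> getContainersOn(String datanode) {
    return node2container.getOrDefault(datanode, Collections.emptySet());
  }
}
{code}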



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-1882) TestReplicationManager failed with NPE in ReplicationManager.java

2019-08-01 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar resolved HDDS-1882.
---
  Resolution: Fixed
   Fix Version/s: 0.4.1
Target Version/s: 0.4.1

Thanks [~Sammi] for the contribution. Committed this to trunk and ozone-0.4.1 
branch.

> TestReplicationManager failed with NPE in ReplicationManager.java 
> --
>
> Key: HDDS-1882
> URL: https://issues.apache.org/jira/browse/HDDS-1882
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Sammi Chen
>Assignee: Sammi Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.4.1
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1902) Fix checkstyle issues in ContainerStateMachine

2019-08-03 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-1902:
-

 Summary: Fix checkstyle issues in ContainerStateMachine
 Key: HDDS-1902
 URL: https://issues.apache.org/jira/browse/HDDS-1902
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Datanode
Reporter: Nanda kumar
Assignee: Nanda kumar


Fix checkstyle issues in ContainerStateMachine:
Line is longer than 80 characters (found 85).




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1903) Use dynamic ports for SCM in TestSCMClientProtocolServer and TestSCMSecurityProtocolServer

2019-08-03 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-1903:
-

 Summary: Use dynamic ports for SCM in TestSCMClientProtocolServer 
and TestSCMSecurityProtocolServer
 Key: HDDS-1903
 URL: https://issues.apache.org/jira/browse/HDDS-1903
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: test
Reporter: Nanda kumar


We should use dynamic ports for SCM in the following test-cases (a sketch of the 
usual trick follows the list):
* TestSCMClientProtocolServer
* TestSCMSecurityProtocolServer
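
A minimal sketch, assuming the tests can be pointed at an arbitrary free 
localhost port; this is not the actual test change:
{code:java}
import java.io.IOException;
import java.net.ServerSocket;

// Bind to port 0 so the OS picks a currently free port, then hand that port to
// the server under test instead of a hard-coded one that may already be in use
// on the build machine.
public final class DynamicPort {
  private DynamicPort() {
  }

  public static int pickFreePort() throws IOException {
    try (ServerSocket socket = new ServerSocket(0)) {
      return socket.getLocalPort();
    }
  }
}
{code}
The test setup could then configure the SCM client/security RPC address as 
127.0.0.1:<picked port> before starting the protocol server.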



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-1902) Fix checkstyle issues in ContainerStateMachine

2019-08-04 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar resolved HDDS-1902.
---
Resolution: Duplicate

> Fix checkstyle issues in ContainerStateMachine
> --
>
> Key: HDDS-1902
> URL: https://issues.apache.org/jira/browse/HDDS-1902
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Minor
>
> Fix checkstyle issues in ContainerStateMachine:
> Line is longer than 80 characters (found 85).



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1904) SCM cli: group container and pipeline related commands to separate subcommands

2019-08-04 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-1904:
-

 Summary: SCM cli: group container and pipeline related commands to 
separate subcommands
 Key: HDDS-1904
 URL: https://issues.apache.org/jira/browse/HDDS-1904
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: SCM Client
Reporter: Nanda kumar


In the SCM CLI we have commands for containers and pipelines. It would be more 
intuitive to group them under separate container and pipeline subcommands.
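
A rough sketch of the proposed grouping using picocli (which the CLI already 
uses); the class names and the exact set of nested commands are illustrative, 
not the final layout:
{code:java}
import picocli.CommandLine;
import picocli.CommandLine.Command;

// Top-level command with two groups; "scmcli container ..." and
// "scmcli pipeline ..." each collect their related operations.
@Command(name = "scmcli",
    description = "Developer tool for SCM",
    subcommands = {ContainerCommands.class, PipelineCommands.class})
class ScmCliSketch implements Runnable {
  @Override
  public void run() {
    CommandLine.usage(this, System.out);
  }

  public static void main(String[] args) {
    CommandLine.usage(new ScmCliSketch(), System.out);
  }
}

@Command(name = "container", description = "Container related operations")
class ContainerCommands implements Runnable {
  @Override
  public void run() {
    CommandLine.usage(this, System.out);
  }
}

@Command(name = "pipeline", description = "Pipeline related operations")
class PipelineCommands implements Runnable {
  @Override
  public void run() {
    CommandLine.usage(this, System.out);
  }
}
{code}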



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1905) PipelineActionHandler is not closing the pipeline when close action is received

2019-08-04 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-1905:
-

 Summary: PipelineActionHandler is not closing the pipeline when 
close action is received
 Key: HDDS-1905
 URL: https://issues.apache.org/jira/browse/HDDS-1905
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: SCM
Reporter: Nanda kumar
Assignee: Nanda kumar


PipelineActionHandler is not closing the pipeline when a close action is received.
The bug was introduced as part of the HDDS-1832 change.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1906) TestScmSafeMode#testSCMSafeModeRestrictedOp is failing

2019-08-04 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-1906:
-

 Summary: TestScmSafeMode#testSCMSafeModeRestrictedOp is failing
 Key: HDDS-1906
 URL: https://issues.apache.org/jira/browse/HDDS-1906
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: test
Reporter: Nanda kumar


{noformat}
[ERROR] testSCMSafeModeRestrictedOp(org.apache.hadoop.ozone.om.TestScmSafeMode) 
 Time elapsed: 19.316 s  <<< FAILURE!
java.lang.AssertionError: Expected a 
org.apache.hadoop.hdds.scm.exceptions.SCMException to be thrown, but got the 
result: : ContainerInfo{id=1, state=OPEN, 
pipelineID=PipelineID=100fb566-2cc0-44d6-9897-e688af5c447f, 
stateEnterTime=137318188, owner=5c69dc7b-2a6b-4650-a625-a63117c11d2d} | 
Pipeline[ Id: 100fb566-2cc0-44d6-9897-e688af5c447f, Nodes: 
b91596ea-34ed-4628-a027-a1cdf05095be{ip: 127.0.0.1, host: localhost, 
networkLocation: /default-rack, certSerialId: null}, Type:STAND_ALONE, 
Factor:ONE, State:OPEN]
at 
org.apache.hadoop.test.LambdaTestUtils.intercept(LambdaTestUtils.java:492)
at 
org.apache.hadoop.test.LambdaTestUtils.intercept(LambdaTestUtils.java:377)
at 
org.apache.hadoop.test.LambdaTestUtils.intercept(LambdaTestUtils.java:446)
at 
org.apache.hadoop.ozone.om.TestScmSafeMode.testSCMSafeModeRestrictedOp(TestScmSafeMode.java:331)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1907) TestOzoneRpcClientWithRatis is failing with ACL errors

2019-08-04 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-1907:
-

 Summary: TestOzoneRpcClientWithRatis is failing with ACL errors
 Key: HDDS-1907
 URL: https://issues.apache.org/jira/browse/HDDS-1907
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: test
Reporter: Nanda kumar


{noformat}
[ERROR] 
testNativeAclsForKey(org.apache.hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis)
  Time elapsed: 0.176 s  <<< FAILURE!
java.lang.AssertionError: Current acls:,[user:nvadivelu:a[ACCESS], 
group:staff:a[ACCESS], group:everyone:a[ACCESS], group:localaccounts:a[ACCESS], 
group:_appserverusr:a[ACCESS], group:admin:a[ACCESS], 
group:_appserveradm:a[ACCESS], group:_lpadmin:a[ACCESS], 
group:com.apple.sharepoint.group.1:a[ACCESS], 
group:com.apple.sharepoint.group.2:a[ACCESS], group:_appstore:a[ACCESS], 
group:_lpoperator:a[ACCESS], group:_developer:a[ACCESS], 
group:_analyticsusers:a[ACCESS], group:com.apple.access_ftp:a[ACCESS], 
group:com.apple.access_screensharing:a[ACCESS], 
group:com.apple.access_ssh:a[ACCESS], 
group:com.apple.sharepoint.group.3:a[ACCESS]] 
inheritedUserAcl:user:remoteUser:r[ACCESS]

[ERROR] 
testNativeAclsForBucket(org.apache.hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis)
  Time elapsed: 0.074 s  <<< FAILURE!
java.lang.AssertionError

[ERROR] 
testNativeAclsForPrefix(org.apache.hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis)
  Time elapsed: 0.061 s  <<< FAILURE!
java.lang.AssertionError: Current acls:,[user:nvadivelu:a[ACCESS], 
group:staff:a[ACCESS], group:everyone:a[ACCESS], group:localaccounts:a[ACCESS], 
group:_appserverusr:a[ACCESS], group:admin:a[ACCESS], 
group:_appserveradm:a[ACCESS], group:_lpadmin:a[ACCESS], 
group:com.apple.sharepoint.group.1:a[ACCESS], 
group:com.apple.sharepoint.group.2:a[ACCESS], group:_appstore:a[ACCESS], 
group:_lpoperator:a[ACCESS], group:_developer:a[ACCESS], 
group:_analyticsusers:a[ACCESS], group:com.apple.access_ftp:a[ACCESS], 
group:com.apple.access_screensharing:a[ACCESS], 
group:com.apple.access_ssh:a[ACCESS], 
group:com.apple.sharepoint.group.3:a[ACCESS]] 
inheritedUserAcl:user:remoteUser:r[ACCESS]
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1908) TestMultiBlockWritesWithDnFailures is failing

2019-08-04 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-1908:
-

 Summary: TestMultiBlockWritesWithDnFailures is failing
 Key: HDDS-1908
 URL: https://issues.apache.org/jira/browse/HDDS-1908
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: test
Reporter: Nanda kumar


TestMultiBlockWritesWithDnFailures is failing with the following exception
{noformat}
[ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 30.992 
s <<< FAILURE! - in 
org.apache.hadoop.ozone.client.rpc.TestMultiBlockWritesWithDnFailures
[ERROR] 
testMultiBlockWritesWithDnFailures(org.apache.hadoop.ozone.client.rpc.TestMultiBlockWritesWithDnFailures)
  Time elapsed: 30.941 s  <<< ERROR!
INTERNAL_ERROR org.apache.hadoop.ozone.om.exceptions.OMException: Allocated 0 
blocks. Requested 1 blocks
at 
org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.handleError(OzoneManagerProtocolClientSideTranslatorPB.java:720)
at 
org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.allocateBlock(OzoneManagerProtocolClientSideTranslatorPB.java:752)
at 
org.apache.hadoop.ozone.client.io.BlockOutputStreamEntryPool.allocateNewBlock(BlockOutputStreamEntryPool.java:248)
at 
org.apache.hadoop.ozone.client.io.BlockOutputStreamEntryPool.allocateBlockIfNeeded(BlockOutputStreamEntryPool.java:296)
at 
org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:201)
at 
org.apache.hadoop.ozone.client.io.KeyOutputStream.handleRetry(KeyOutputStream.java:376)
at 
org.apache.hadoop.ozone.client.io.KeyOutputStream.handleException(KeyOutputStream.java:325)
at 
org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:231)
at 
org.apache.hadoop.ozone.client.io.KeyOutputStream.write(KeyOutputStream.java:193)
at 
org.apache.hadoop.ozone.client.io.OzoneOutputStream.write(OzoneOutputStream.java:49)
at java.io.OutputStream.write(OutputStream.java:75)
at 
org.apache.hadoop.ozone.client.rpc.TestMultiBlockWritesWithDnFailures.testMultiBlockWritesWithDnFailures(TestMultiBlockWritesWithDnFailures.java:144)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
at 
org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1917) Ignore failing test-cases in TestSecureOzoneRpcClient

2019-08-06 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-1917:
-

 Summary: Ignore failing test-cases in TestSecureOzoneRpcClient
 Key: HDDS-1917
 URL: https://issues.apache.org/jira/browse/HDDS-1917
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Nanda kumar
Assignee: Nanda kumar


Ignore the failing test-cases in TestSecureOzoneRpcClient. They will be fixed 
when HA support is added to ACL operations.
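
For reference, a minimal JUnit 4 sketch of how a test can be ignored with a 
reason (the class and test names here are only examples):
{code:java}
import org.junit.Ignore;
import org.junit.Test;

public class SecureAclTestSketch {

  // @Ignore keeps the test in the code base but skips it during the run;
  // the reason string records why and when it can be re-enabled.
  @Ignore("ACL operations do not yet work with OM HA; re-enable once HA ACL support lands")
  @Test
  public void testNativeAclsForKey() {
    // unchanged test body; it simply will not run until @Ignore is removed
  }
}
{code}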



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-1952) TestMiniChaosOzoneCluster may run until OOME

2019-08-13 Thread Nanda kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-1952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar resolved HDDS-1952.
---
   Resolution: Fixed
Fix Version/s: 0.5.0
   0.4.1

Thanks [~adoroszlai] for the contribution. Committed this to trunk and 
ozone-0.4.1 branch.

> TestMiniChaosOzoneCluster may run until OOME
> 
>
> Key: HDDS-1952
> URL: https://issues.apache.org/jira/browse/HDDS-1952
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: test
>Reporter: Doroszlai, Attila
>Assignee: Doroszlai, Attila
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.4.1, 0.5.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> {{TestMiniChaosOzoneCluster}} runs load generator on a cluster for supposedly 
> 1 minute, but it may run indefinitely until JVM crashes due to 
> OutOfMemoryError.
> In 0.4.1 nightly build it crashed 29/30 times (and no tests were executed in 
> the remaining one run due to some other error).
> Latest:
> https://github.com/elek/ozone-ci/blob/3f553ed6ad358ba61a302967617de737d7fea01a/byscane/byscane-nightly-wggqd/integration/output.log#L5661-L5662
> When it crashes, it leaves GBs of data lying around.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1961) TestStorageContainerManager#testScmProcessDatanodeHeartbeat is flaky

2019-08-13 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-1961:
-

 Summary: 
TestStorageContainerManager#testScmProcessDatanodeHeartbeat is flaky
 Key: HDDS-1961
 URL: https://issues.apache.org/jira/browse/HDDS-1961
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: test
Reporter: Nanda kumar
Assignee: Nanda kumar


TestStorageContainerManager#testScmProcessDatanodeHeartbeat is flaky
{noformat}
[ERROR] 
testScmProcessDatanodeHeartbeat(org.apache.hadoop.ozone.TestStorageContainerManager)
  Time elapsed: 25.057 s  <<< FAILURE!
java.lang.AssertionError
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertTrue(Assert.java:52)
at 
org.apache.hadoop.ozone.TestStorageContainerManager.testScmProcessDatanodeHeartbeat(TestStorageContainerManager.java:531)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)

{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1967) TestBlockOutputStreamWithFailures is flaky

2019-08-14 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-1967:
-

 Summary: TestBlockOutputStreamWithFailures is flaky
 Key: HDDS-1967
 URL: https://issues.apache.org/jira/browse/HDDS-1967
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: test
Reporter: Nanda kumar


{{TestBlockOutputStreamWithFailures}} is flaky. 
{noformat}
[ERROR] 
test2DatanodesFailure(org.apache.hadoop.ozone.client.rpc.TestBlockOutputStreamWithFailures)
  Time elapsed: 23.816 s  <<< FAILURE!
java.lang.AssertionError: expected:<4> but was:<8>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at org.junit.Assert.assertEquals(Assert.java:542)
at 
org.apache.hadoop.ozone.client.rpc.TestBlockOutputStreamWithFailures.test2DatanodesFailure(TestBlockOutputStreamWithFailures.java:425)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
at 
org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
{noformat}

{noformat}
[ERROR] 
testWatchForCommitDatanodeFailure(org.apache.hadoop.ozone.client.rpc.TestBlockOutputStreamWithFailures)
  Time elapsed: 30.895 s  <<< FAILURE!
java.lang.AssertionError: expected:<2> but was:<3>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at org.junit.Assert.assertEquals(Assert.java:542)
at 
org.apache.hadoop.ozone.client.rpc.TestBlockOutputStreamWithFailures.testWatchForCommitDatanodeFailure(TestBlockOutputStreamWithFailures.java:366)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunne

[jira] [Created] (HDDS-1977) Fix checkstyle issues introduced by HDDS-1894

2019-08-17 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-1977:
-

 Summary: Fix checkstyle issues introduced by HDDS-1894
 Key: HDDS-1977
 URL: https://issues.apache.org/jira/browse/HDDS-1977
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: SCM Client
Reporter: Nanda kumar


Fix the checkstyle issues introduced by HDDS-1894
{noformat}

[INFO] There are 6 errors reported by Checkstyle 8.8 with 
checkstyle/checkstyle.xml ruleset.
[ERROR] 
src/main/java/org/apache/hadoop/hdds/scm/cli/pipeline/ListPipelinesSubcommand.java:[41,23]
 (whitespace) ParenPad: '(' is followed by whitespace.
[ERROR] 
src/main/java/org/apache/hadoop/hdds/scm/cli/pipeline/ListPipelinesSubcommand.java:[42]
 (sizes) LineLength: Line is longer than 80 characters (found 88).
[ERROR] 
src/main/java/org/apache/hadoop/hdds/scm/cli/pipeline/ListPipelinesSubcommand.java:[46,23]
 (whitespace) ParenPad: '(' is followed by whitespace.
[ERROR] 
src/main/java/org/apache/hadoop/hdds/scm/cli/pipeline/ListPipelinesSubcommand.java:[47]
 (sizes) LineLength: Line is longer than 80 characters (found 90).
[ERROR] 
src/main/java/org/apache/hadoop/hdds/scm/cli/pipeline/ListPipelinesSubcommand.java:[59]
 (sizes) LineLength: Line is longer than 80 characters (found 116).
[ERROR] 
src/main/java/org/apache/hadoop/hdds/scm/cli/pipeline/ListPipelinesSubcommand.java:[60]
 (sizes) LineLength: Line is longer than 80 characters (found 120).
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1978) Create helper script to run blockade tests

2019-08-17 Thread Nanda kumar (JIRA)
Nanda kumar created HDDS-1978:
-

 Summary: Create helper script to run blockade tests
 Key: HDDS-1978
 URL: https://issues.apache.org/jira/browse/HDDS-1978
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: test
Reporter: Nanda kumar
Assignee: Nanda kumar


To run the blockade tests as part of a Jenkins job, we need a helper script.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-1994) Compilation failure due to missing class ScmBlockLocationTestingClient

2019-08-20 Thread Nanda kumar (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar resolved HDDS-1994.
---
Resolution: Duplicate

> Compilation failure due to missing class ScmBlockLocationTestingClient
> --
>
> Key: HDDS-1994
> URL: https://issues.apache.org/jira/browse/HDDS-1994
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>Reporter: Hrishikesh Gadre
>Assignee: Hrishikesh Gadre
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The ozone build is failing due to following compilation error,
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-compiler-plugin:3.1:testCompile 
> (default-testCompile) on project hadoop-ozone-ozone-manager: Compilation 
> failure: Compilation failure:
> [ERROR] 
> /Users/hgadre/git-repo/upstream/hadoop/hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/TestKeyDeletingService.java:[94,17]
>  cannot find symbol
> [ERROR]   symbol:   class ScmBlockLocationTestingClient
> [ERROR]   location: class org.apache.hadoop.ozone.om.TestKeyDeletingService
> [ERROR] 
> /Users/hgadre/git-repo/upstream/hadoop/hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/TestKeyDeletingService.java:[116,17]
>  cannot find symbol
> [ERROR]   symbol:   class ScmBlockLocationTestingClient
> [ERROR]   location: class org.apache.hadoop.ozone.om.TestKeyDeletingService
> [ERROR] 
> /Users/hgadre/git-repo/upstream/hadoop/hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/TestKeyDeletingService.java:[143,17]
>  cannot find symbol
> [ERROR]   symbol:   class ScmBlockLocationTestingClient
> [ERROR]   location: class org.apache.hadoop.ozone.om.TestKeyDeletingService
> [ERROR] -> [Help 1]



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-1998) TestSecureContainerServer#testClientServerRatisGrpc is failing

2019-08-21 Thread Nanda kumar (Jira)
Nanda kumar created HDDS-1998:
-

 Summary: TestSecureContainerServer#testClientServerRatisGrpc is 
failing
 Key: HDDS-1998
 URL: https://issues.apache.org/jira/browse/HDDS-1998
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: test
Reporter: Nanda kumar


{{TestSecureContainerServer#testClientServerRatisGrpc}} is failing on trunk 
with the following error.

{noformat}
[ERROR] 
testClientServerRatisGrpc(org.apache.hadoop.ozone.container.server.TestSecureContainerServer)
  Time elapsed: 7.544 s  <<< ERROR!
java.io.IOException:
Failed to command cmdType: CreateContainer
containerID: 1566379872577
datanodeUuid: "87ebf146-2a8f-4060-8f06-615ed61a9fe0"
createContainer {
}

at 
org.apache.hadoop.hdds.scm.XceiverClientSpi.sendCommand(XceiverClientSpi.java:113)
at 
org.apache.hadoop.ozone.container.server.TestSecureContainerServer.runTestClientServer(TestSecureContainerServer.java:206)
at 
org.apache.hadoop.ozone.container.server.TestSecureContainerServer.runTestClientServerRatis(TestSecureContainerServer.java:157)
at 
org.apache.hadoop.ozone.container.server.TestSecureContainerServer.testClientServerRatisGrpc(TestSecureContainerServer.java:132)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
at 
org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
Caused by: java.util.concurrent.ExecutionException: 
org.apache.ratis.protocol.StateMachineException: 
org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: 
Block token verification failed. Fail to find any token (empty or null.)
at 
java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
at 
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
at 
org.apache.hadoop.hdds.scm.XceiverClientSpi.sendCommand(XceiverClientSpi.java:110)
... 29 more
Caused by: org.apache.ratis.protocol.StateMachineException: 
org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: 
Block token verification failed. Fail to find any token (empty or null.)
at 
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$applyTransaction$7(ContainerStateMachine.java:701)
at 
java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
at 
java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at 
java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1595)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.conc

[jira] [Resolved] (HDDS-1922) Next button on the bottom of "static/docs/index.html" landing page does not work

2019-08-21 Thread Nanda kumar (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar resolved HDDS-1922.
---
Resolution: Cannot Reproduce

> Next button on the bottom of "static/docs/index.html" landing page does not 
> work
> 
>
> Key: HDDS-1922
> URL: https://issues.apache.org/jira/browse/HDDS-1922
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Manager
>Affects Versions: 0.4.0
>Reporter: Mukul Kumar Singh
>Priority: Major
>
> On the Ozone landing doc page, the next link doesn't work.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-2001) Update Ratis version to 0.4.0

2019-08-21 Thread Nanda kumar (Jira)
Nanda kumar created HDDS-2001:
-

 Summary: Update Ratis version to 0.4.0
 Key: HDDS-2001
 URL: https://issues.apache.org/jira/browse/HDDS-2001
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
Reporter: Nanda kumar
Assignee: Nanda kumar


Update Ratis version to 0.4.0



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-2002) Update documentation for 0.4.1 release

2019-08-21 Thread Nanda kumar (Jira)
Nanda kumar created HDDS-2002:
-

 Summary: Update documentation for 0.4.1 release
 Key: HDDS-2002
 URL: https://issues.apache.org/jira/browse/HDDS-2002
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: documentation
Reporter: Nanda kumar
Assignee: Nanda kumar


We have to update the Ozone documentation to reflect the latest changes.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-1303) Support native ACL for Ozone

2019-08-21 Thread Nanda kumar (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar resolved HDDS-1303.
---
Fix Version/s: 0.4.1
   Resolution: Fixed

> Support native ACL for Ozone
> 
>
> Key: HDDS-1303
> URL: https://issues.apache.org/jira/browse/HDDS-1303
> Project: Hadoop Distributed Data Store
>  Issue Type: New Feature
>Reporter: Ajay Kumar
>Assignee: Ajay Kumar
>Priority: Blocker
> Fix For: 0.4.1
>
>
> add native acl support for OM operations



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-895) Remove command watcher from ReplicationManager

2019-08-22 Thread Nanda kumar (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar resolved HDDS-895.
--
Resolution: Implemented

Implemented as part of HDDS-1205.

> Remove command watcher from ReplicationManager
> --
>
> Key: HDDS-895
> URL: https://issues.apache.org/jira/browse/HDDS-895
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: SCM
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Major
> Attachments: HDDS-895.000.patch
>
>
> We can remove the command watcher from {{ReplicationManager}} and use an 
> internal timer to retrigger the replication command.
> Instead of waiting for every command that has been sent out to a datanode, we 
> can use an internal timer to check whether the container replica state has 
> reached the expected state.
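
A rough sketch of the timer-based approach described above, with invented names 
and simplified state; this is not the HDDS-1205 implementation:
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Instead of a per-command watcher, a single background task periodically
// compares the observed replica count with the expected replication factor and
// simply re-issues the replication command when it has not taken effect yet.
public class ReplicationRecheckSketch {
  private final Map<Long, Integer> observedReplicaCount = new ConcurrentHashMap<>();
  private final int replicationFactor = 3;
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  public void start(long intervalSeconds) {
    scheduler.scheduleWithFixedDelay(
        this::recheck, intervalSeconds, intervalSeconds, TimeUnit.SECONDS);
  }

  private void recheck() {
    observedReplicaCount.forEach((containerId, count) -> {
      if (count < replicationFactor) {
        // The earlier command was lost or has not taken effect; re-send it
        // instead of tracking the original command with a watcher.
        sendReplicateCommand(containerId, replicationFactor - count);
      }
    });
  }

  private void sendReplicateCommand(long containerId, int copiesNeeded) {
    System.out.println("Re-issuing replication for container " + containerId
        + ", additional copies needed: " + copiesNeeded);
  }

  public void stop() {
    scheduler.shutdown();
  }
}
{code}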



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-1304) Ozone ha breaks service discovery

2019-08-22 Thread Nanda kumar (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar resolved HDDS-1304.
---
Resolution: Not A Problem

> Ozone ha breaks service discovery
> -
>
> Key: HDDS-1304
> URL: https://issues.apache.org/jira/browse/HDDS-1304
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Ajay Kumar
>Assignee: Nanda kumar
>Priority: Critical
>
> We need to redefine the semantics of what service discovery means with HA 
> enabled.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-2028) Release Ozone 0.4.1

2019-08-24 Thread Nanda kumar (Jira)
Nanda kumar created HDDS-2028:
-

 Summary: Release Ozone 0.4.1
 Key: HDDS-2028
 URL: https://issues.apache.org/jira/browse/HDDS-2028
 Project: Hadoop Distributed Data Store
  Issue Type: Test
Reporter: Nanda kumar
Assignee: Nanda kumar


This Jira is to track the Ozone 0.4.1 release.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-2029) Fix license issues on ozone-0.4.1

2019-08-24 Thread Nanda kumar (Jira)
Nanda kumar created HDDS-2029:
-

 Summary: Fix license issues on ozone-0.4.1
 Key: HDDS-2029
 URL: https://issues.apache.org/jira/browse/HDDS-2029
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Nanda kumar
Assignee: Nanda kumar


There are files on the ozone-0.4.1 branch that don't have the Apache license 
header; they have to be fixed.
{noformat}
hadoop/hadoop-ozone/dist/src/main/compose/ozones3-haproxy/haproxy-conf/haproxy.cfg
hadoop/hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/client/rpc/TestOzoneRpcClientForAclAuditLog.java
 
hadoop/hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/response/s3/bucket/TestS3BucketDeleteResponse.java
hadoop/hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/response/s3/multipart/TestS3MultipartUploadAbortResponse.java
hadoop/hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/request/s3/multipart/TestS3MultipartUploadAbortRequest.java
hadoop/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/response/key/OMKeyPurgeResponse.java
hadoop/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/key/OMKeyPurgeRequest.java
{noformat}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDDS-2037) Fix hadoop version in pom.ozone.xml

2019-08-26 Thread Nanda kumar (Jira)
Nanda kumar created HDDS-2037:
-

 Summary: Fix hadoop version in pom.ozone.xml
 Key: HDDS-2037
 URL: https://issues.apache.org/jira/browse/HDDS-2037
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Nanda kumar
Assignee: Nanda kumar


The Hadoop version in pom.ozone.xml is pointing to a SNAPSHOT version; this has 
to be fixed.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org


