[jira] [Created] (HDDS-755) ContainerInfo and ContainerReplica protobuf changes
Nanda kumar created HDDS-755:

Summary: ContainerInfo and ContainerReplica protobuf changes
Key: HDDS-755
URL: https://issues.apache.org/jira/browse/HDDS-755
Project: Hadoop Distributed Data Store
Issue Type: Improvement
Components: Ozone Datanode, SCM
Reporter: Nanda kumar
Assignee: Nanda kumar

We have different classes that maintain container-related information. We can consolidate them so that the code is easier to read.

Proposal:

In SCM: used in communication between SCM and Client, and also for storing in the db
* ContainerInfoProto
* ContainerInfo

In Datanode: used in communication between Datanode and SCM
* ContainerReplicaProto
* ContainerReplica

In Datanode: used in communication between Datanode and Client
* ContainerDataProto
* ContainerData

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDDS-801) Quasi close the container when close is not executed via Ratis
Nanda kumar created HDDS-801:

Summary: Quasi close the container when close is not executed via Ratis
Key: HDDS-801
URL: https://issues.apache.org/jira/browse/HDDS-801
Project: Hadoop Distributed Data Store
Issue Type: Improvement
Components: Ozone Datanode
Affects Versions: 0.3.0
Reporter: Nanda kumar
Assignee: Nanda kumar

When a datanode receives a CloseContainerCommand and the replication type is not RATIS, we should QUASI close the container. After quasi-closing the container, an ICR (incremental container report) has to be sent to SCM.
[jira] [Created] (HDDS-812) TestEndPoint#testCheckVersionResponse is failing
Nanda kumar created HDDS-812:

Summary: TestEndPoint#testCheckVersionResponse is failing
Key: HDDS-812
URL: https://issues.apache.org/jira/browse/HDDS-812
Project: Hadoop Distributed Data Store
Issue Type: Bug
Components: test
Reporter: Nanda kumar

TestEndPoint#testCheckVersionResponse is failing with the below error:
{code:java}
[ERROR] testCheckVersionResponse(org.apache.hadoop.ozone.container.common.TestEndPoint) Time elapsed: 0.142 s <<< FAILURE!
java.lang.AssertionError: expected: but was:
{code}
Once we are in the REGISTER state, we no longer allow the getVersion call. This is causing the test case to fail.
[jira] [Created] (HDDS-823) OzoneRestClient is failing with NPE on getKeyDetails call
Nanda kumar created HDDS-823:

Summary: OzoneRestClient is failing with NPE on getKeyDetails call
Key: HDDS-823
URL: https://issues.apache.org/jira/browse/HDDS-823
Project: Hadoop Distributed Data Store
Issue Type: Bug
Components: Ozone Client
Affects Versions: 0.3.0
Reporter: Nanda kumar

{{RestClient#getKeyDetails}} is failing with {{NullPointerException}}, which is causing a lot of unit tests and smoke tests to fail.

Exception trace:
{code:java}
Error while calling command (org.apache.hadoop.ozone.web.ozShell.keys.InfoKeyHandler@13713486): java.lang.NullPointerException
    at picocli.CommandLine.execute(CommandLine.java:926)
    at picocli.CommandLine.access$700(CommandLine.java:104)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:1083)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:1051)
    at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:959)
    at picocli.CommandLine.parseWithHandlers(CommandLine.java:1242)
    at org.apache.hadoop.ozone.ozShell.TestOzoneShell.execute(TestOzoneShell.java:259)
    at org.apache.hadoop.ozone.ozShell.TestOzoneShell.testInfoDirKey(TestOzoneShell.java:1013)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
    at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.ozone.client.rest.RestClient.getKeyDetails(RestClient.java:817)
    at org.apache.hadoop.ozone.client.OzoneBucket.getKey(OzoneBucket.java:282)
    at org.apache.hadoop.ozone.web.ozShell.keys.InfoKeyHandler.call(InfoKeyHandler.java:65)
    at org.apache.hadoop.ozone.web.ozShell.keys.InfoKeyHandler.call(InfoKeyHandler.java:37)
    at picocli.CommandLine.execute(CommandLine.java:919)
    ... 18 more
{code}
[jira] [Created] (HDDS-827) TestStorageContainerManagerHttpServer should use dynamic port
Nanda kumar created HDDS-827:

Summary: TestStorageContainerManagerHttpServer should use dynamic port
Key: HDDS-827
URL: https://issues.apache.org/jira/browse/HDDS-827
Project: Hadoop Distributed Data Store
Issue Type: Improvement
Components: test
Reporter: Nanda kumar

Most of the time {{TestStorageContainerManagerHttpServer}} is failing with
{code}
java.net.BindException: Port in use: 0.0.0.0:9876
...
Caused by: java.net.BindException: Address already in use
{code}
TestStorageContainerManagerHttpServer should use a free (dynamically chosen) port instead of trying to bind to the default port 9876.
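One way to obtain a free port in a test is to ask the OS for an ephemeral one. The following is only a minimal sketch: {{FreePortFinder}} and {{findFreePort}} are hypothetical names and not part of the Ozone code base, and how the port is then passed to the SCM HTTP server configuration is left open.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

/** Hypothetical test helper: asks the OS for a currently unused TCP port. */
final class FreePortFinder {
  private FreePortFinder() { }

  static int findFreePort() {
    // Binding to port 0 lets the OS pick an unused ephemeral port.
    try (ServerSocket socket = new ServerSocket(0)) {
      return socket.getLocalPort();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Note that the port is released before the test binds to it, so another process could still grab it in between. Where the server under test accepts port 0 directly, letting it bind to port 0 itself avoids that window.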
[jira] [Created] (HDDS-830) Datanode should not start XceiverServerRatis before getting version information from SCM
Nanda kumar created HDDS-830:

Summary: Datanode should not start XceiverServerRatis before getting version information from SCM
Key: HDDS-830
URL: https://issues.apache.org/jira/browse/HDDS-830
Project: Hadoop Distributed Data Store
Issue Type: Improvement
Components: Ozone Datanode
Affects Versions: 0.3.0
Reporter: Nanda kumar

If a datanode restarts quickly, before SCM detects the restart, it will rejoin the Ratis ring (existing pipeline). Since SCM didn't detect the restart, the pipeline is not closed. There is a time gap between the datanode starting and the datanode getting the version information from SCM. During this time, the SCM ID in the datanode is not set (null). If a client tries to use this pipeline during that time, the container state machine will throw {{java.lang.NullPointerException: scmId cannot be null}}. This causes {{RaftLogWorker}} to terminate, resulting in a datanode crash.
{code}
2018-11-12 19:45:31,811 ERROR storage.RaftLogWorker (ExitUtils.java:terminate(86)) - Terminating with exit status 1: 407fd181-2ff7-4651-9a47-a0927ede4c51-RaftLogWorker failed.
java.io.IOException: java.lang.NullPointerException: scmId cannot be null
    at org.apache.ratis.util.IOUtils.asIOException(IOUtils.java:54)
    at org.apache.ratis.util.IOUtils.toIOException(IOUtils.java:61)
    at org.apache.ratis.util.IOUtils.getFromFuture(IOUtils.java:83)
    at org.apache.ratis.server.storage.RaftLogWorker$StateMachineDataPolicy.getFromFuture(RaftLogWorker.java:76)
    at org.apache.ratis.server.storage.RaftLogWorker$WriteLog.execute(RaftLogWorker.java:344)
    at org.apache.ratis.server.storage.RaftLogWorker.run(RaftLogWorker.java:216)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException: scmId cannot be null
    at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:204)
    at org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.create(KeyValueContainer.java:106)
    at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handleCreateContainer(KeyValueHandler.java:242)
    at org.apache.hadoop.ozone.container.keyvalue.KeyValueHandler.handle(KeyValueHandler.java:165)
    at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.createContainer(HddsDispatcher.java:206)
    at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:124)
    at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.dispatchCommand(ContainerStateMachine.java:274)
    at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.runCommand(ContainerStateMachine.java:280)
    at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$handleWriteChunk$1(ContainerStateMachine.java:301)
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    ... 1 more
{code}
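The ordering problem described in HDDS-830 could be avoided by gating the Ratis server start on the SCM ID being known. The sketch below only illustrates that idea under assumptions: {{ScmIdGate}}, {{onVersionResponse}} and {{startIfReady}} are hypothetical names, and the real datanode state machine is not modelled here.

```java
import java.util.Objects;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical gate: the server start action runs only once scmId is set. */
final class ScmIdGate {
  private final AtomicReference<String> scmId = new AtomicReference<>();

  /** Assumed to be called after the version response from SCM is processed. */
  void onVersionResponse(String id) {
    scmId.set(Objects.requireNonNull(id, "scmId cannot be null"));
  }

  /** Runs startServer only if scmId is known; returns whether it ran. */
  boolean startIfReady(Runnable startServer) {
    if (scmId.get() == null) {
      // Version information not received yet: do not join the pipeline.
      return false;
    }
    startServer.run();
    return true;
  }
}
```

With such a gate, a restarted datanode would stay out of the existing pipeline until the version handshake has completed, instead of failing later inside the container state machine.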
[jira] [Created] (HDDS-831) TestOzoneShell in integration-test is flaky
Nanda kumar created HDDS-831:

Summary: TestOzoneShell in integration-test is flaky
Key: HDDS-831
URL: https://issues.apache.org/jira/browse/HDDS-831
Project: Hadoop Distributed Data Store
Issue Type: Bug
Components: test
Reporter: Nanda kumar
Assignee: Nanda kumar

TestOzoneShell in integration-test is flaky; it fails in a few Jenkins runs.

https://builds.apache.org/job/PreCommit-HDDS-Build/1685/artifact/out/patch-unit-hadoop-ozone_integration-test.txt
[jira] [Created] (HDDS-833) Update javadoc in StorageContainerManager, NodeManager, PipelineManager and ContainerManager
Nanda kumar created HDDS-833:

Summary: Update javadoc in StorageContainerManager, NodeManager, PipelineManager and ContainerManager
Key: HDDS-833
URL: https://issues.apache.org/jira/browse/HDDS-833
Project: Hadoop Distributed Data Store
Issue Type: Improvement
Components: SCM
Reporter: Nanda kumar
Assignee: Nanda kumar

The javadoc in the following interfaces/classes has to be updated:
* StorageContainerManager
* NodeManager
* NodeStateManager
* PipelineManager
* PipelineStateManager
* ContainerManager
* ContainerStateManager
[jira] [Created] (HDDS-837) Persist originNodeId as part of .container file in datanode
Nanda kumar created HDDS-837:

Summary: Persist originNodeId as part of .container file in datanode
Key: HDDS-837
URL: https://issues.apache.org/jira/browse/HDDS-837
Project: Hadoop Distributed Data Store
Issue Type: Improvement
Components: Ozone Datanode
Reporter: Nanda kumar
Assignee: Nanda kumar

To differentiate the replicas of QUASI_CLOSED containers, we need an {{originNodeId}} field. With this field, we can uniquely identify a QUASI_CLOSED container replica. This will be needed when we want to CLOSE a QUASI_CLOSED container.

This field will be set by the node where the container is created, stored as part of the {{.container}} file, and sent as part of the ContainerReport to SCM.
[jira] [Created] (HDDS-847) TestBlockDeletion is failing
Nanda kumar created HDDS-847:

Summary: TestBlockDeletion is failing
Key: HDDS-847
URL: https://issues.apache.org/jira/browse/HDDS-847
Project: Hadoop Distributed Data Store
Issue Type: Bug
Components: test
Reporter: Nanda kumar

{{TestBlockDeletion}} is failing with the below exception
{code}
[ERROR] testBlockDeletion(org.apache.hadoop.ozone.container.common.statemachine.commandhandler.TestBlockDeletion) Time elapsed: 28.017 s <<< FAILURE!
java.lang.AssertionError
    at org.junit.Assert.fail(Assert.java:86)
    at org.junit.Assert.assertTrue(Assert.java:41)
    at org.junit.Assert.assertTrue(Assert.java:52)
    at org.apache.hadoop.ozone.container.common.statemachine.commandhandler.TestBlockDeletion.testBlockDeletion(TestBlockDeletion.java:165)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
    at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
    at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
    at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:379)
    at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:340)
    at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:125)
    at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:413)
{code}
[jira] [Created] (HDDS-853) Option to force close a container in Datanode
Nanda kumar created HDDS-853:

Summary: Option to force close a container in Datanode
Key: HDDS-853
URL: https://issues.apache.org/jira/browse/HDDS-853
Project: Hadoop Distributed Data Store
Issue Type: Improvement
Components: Ozone Datanode
Reporter: Nanda kumar
Assignee: Nanda kumar

We need an option to force close a container in Datanode.
[jira] [Created] (HDDS-854) TestFailureHandlingByClient.testMultiBlockWritesWithDnFailures is flaky
Nanda kumar created HDDS-854:

Summary: TestFailureHandlingByClient.testMultiBlockWritesWithDnFailures is flaky
Key: HDDS-854
URL: https://issues.apache.org/jira/browse/HDDS-854
Project: Hadoop Distributed Data Store
Issue Type: Bug
Components: test
Reporter: Nanda kumar
Assignee: Nanda kumar

TestFailureHandlingByClient.testMultiBlockWritesWithDnFailures is flaky. It times out while waiting for the mini cluster datanode to restart.
{code}
    at org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:389)
    at org.apache.hadoop.ozone.MiniOzoneClusterImpl.waitForClusterToBeReady(MiniOzoneClusterImpl.java:122)
    at org.apache.hadoop.ozone.MiniOzoneClusterImpl.restartHddsDatanode(MiniOzoneClusterImpl.java:276)
    at org.apache.hadoop.ozone.MiniOzoneClusterImpl.restartHddsDatanode(MiniOzoneClusterImpl.java:283)
    at org.apache.hadoop.ozone.client.rpc.TestFailureHandlingByClient.testMultiBlockWritesWithDnFailures(TestFailureHandlingByClient.java:200)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
    at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
    at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
    at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:379)
    at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:340)
    at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:125)
    at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:413)
{code}
[jira] [Created] (HDDS-863) TestNodeManager is failing
Nanda kumar created HDDS-863:

Summary: TestNodeManager is failing
Key: HDDS-863
URL: https://issues.apache.org/jira/browse/HDDS-863
Project: Hadoop Distributed Data Store
Issue Type: Bug
Components: test
Reporter: Nanda kumar

All the tests in TestNodeManager are failing with the below error
{code}
[ERROR] testScmDetectStaleAndDeadNode(org.apache.hadoop.hdds.scm.node.TestNodeManager) Time elapsed: 0.671 s <<< ERROR!
java.lang.NullPointerException
    at org.apache.hadoop.hdds.scm.node.SCMNodeManager.updateNodeStat(SCMNodeManager.java:195)
    at org.apache.hadoop.hdds.scm.node.SCMNodeManager.register(SCMNodeManager.java:276)
    at org.apache.hadoop.hdds.scm.TestUtils.createRandomDatanodeAndRegister(TestUtils.java:147)
    at org.apache.hadoop.hdds.scm.node.TestNodeManager.createNodeSet(TestNodeManager.java:590)
    at org.apache.hadoop.hdds.scm.node.TestNodeManager.testScmDetectStaleAndDeadNode(TestNodeManager.java:316)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
    at org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:168)
    at org.junit.rules.RunRules.evaluate(RunRules.java:20)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
    at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
    at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
    at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:379)
    at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:340)
    at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:125)
    at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:413)
{code}
[jira] [Resolved] (HDDS-863) TestNodeManager is failing
[ https://issues.apache.org/jira/browse/HDDS-863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nanda kumar resolved HDDS-863.

Resolution: Duplicate

> TestNodeManager is failing
>
> Key: HDDS-863
> URL: https://issues.apache.org/jira/browse/HDDS-863
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: test
> Reporter: Nanda kumar
> Priority: Major
[jira] [Created] (HDDS-868) Handle quasi closed container replicas in SCM
Nanda kumar created HDDS-868:

Summary: Handle quasi closed container replicas in SCM
Key: HDDS-868
URL: https://issues.apache.org/jira/browse/HDDS-868
Project: Hadoop Distributed Data Store
Issue Type: Improvement
Components: SCM
Reporter: Nanda kumar
Assignee: Nanda kumar

In case of pipeline failure, the containers will be quasi closed by the datanode. SCM has to understand that a container replica is quasi closed and, based on the block commit sequence ID, SCM should identify the latest replicas and force close them.
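Selecting the latest replicas by sequence ID could look roughly like the following. This is a simplified sketch, not the SCM implementation: {{QuasiClosedReplicaSelector}} and its nested {{Replica}} class are hypothetical, and a plain {{long}} stands in for the block commit sequence ID.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: finds the replicas with the highest sequence ID. */
final class QuasiClosedReplicaSelector {
  /** Simplified, assumed view of a quasi-closed replica report. */
  static final class Replica {
    final String datanodeId;
    final long sequenceId;

    Replica(String datanodeId, long sequenceId) {
      this.datanodeId = datanodeId;
      this.sequenceId = sequenceId;
    }
  }

  private QuasiClosedReplicaSelector() { }

  /** Returns every replica whose sequence ID equals the maximum seen. */
  static List<Replica> latestReplicas(List<Replica> replicas) {
    long max = Long.MIN_VALUE;
    for (Replica r : replicas) {
      max = Math.max(max, r.sequenceId);
    }
    List<Replica> latest = new ArrayList<>();
    for (Replica r : replicas) {
      if (r.sequenceId == max) {
        latest.add(r);
      }
    }
    return latest;
  }
}
```

The replicas returned here would be the candidates for a force close; replicas with a lower sequence ID are stale in this model.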
[jira] [Created] (HDDS-896) Handle over replicated containers in SCM
Nanda kumar created HDDS-896:

Summary: Handle over replicated containers in SCM
Key: HDDS-896
URL: https://issues.apache.org/jira/browse/HDDS-896
Project: Hadoop Distributed Data Store
Issue Type: Improvement
Components: SCM
Reporter: Nanda kumar
Assignee: Nanda kumar

When SCM detects that a container is over-replicated, it has to delete some replicas to bring the replica count down to the required value. If the container is in QUASI_CLOSED state, we should check the {{originNodeId}} field while choosing the replica to delete.
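The issue does not spell out how {{originNodeId}} should influence the choice. One possible policy, shown here purely as an assumption, is to delete only replicas whose origin is already represented by another replica, so that every unique origin survives. {{OverReplicationHandler}} and its {{Replica}} class are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: picks excess replicas without losing any origin. */
final class OverReplicationHandler {
  /** Simplified, assumed view of a container replica. */
  static final class Replica {
    final String datanodeId;
    final String originNodeId;

    Replica(String datanodeId, String originNodeId) {
      this.datanodeId = datanodeId;
      this.originNodeId = originNodeId;
    }
  }

  private OverReplicationHandler() { }

  static List<Replica> chooseReplicasToDelete(List<Replica> replicas, int requiredCount) {
    int excess = replicas.size() - requiredCount;
    List<Replica> toDelete = new ArrayList<>();
    Set<String> seenOrigins = new HashSet<>();
    for (Replica r : replicas) {
      // The first replica of each origin is always kept; later replicas
      // with the same origin are deletion candidates while excess remains.
      boolean firstOfOrigin = seenOrigins.add(r.originNodeId);
      if (!firstOfOrigin && toDelete.size() < excess) {
        toDelete.add(r);
      }
    }
    return toDelete;
  }
}
```

Under this policy the result can contain fewer replicas than the excess when all origins are unique; the container then stays over-replicated rather than losing a unique QUASI_CLOSED replica.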
[jira] [Created] (HDDS-895) Remove command watcher from ReplicationManager
Nanda kumar created HDDS-895:

Summary: Remove command watcher from ReplicationManager
Key: HDDS-895
URL: https://issues.apache.org/jira/browse/HDDS-895
Project: Hadoop Distributed Data Store
Issue Type: Improvement
Components: SCM
Reporter: Nanda kumar
Assignee: Nanda kumar

We can remove the command watcher from {{ReplicationManager}} and use an internal timeout to retrigger the replication command.
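An internal timeout of this kind can be kept as a simple map from container ID to the time the last command was sent. The sketch below is an assumption about how that could look, not the {{ReplicationManager}} code; {{InflightReplicationTracker}} and its methods are hypothetical, and time is passed in explicitly to keep the example deterministic.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical tracker: decides when a replication command may be re-sent. */
final class InflightReplicationTracker {
  private final Map<Long, Long> sentAtMillis = new ConcurrentHashMap<>();
  private final long timeoutMillis;

  InflightReplicationTracker(long timeoutMillis) {
    this.timeoutMillis = timeoutMillis;
  }

  /** Records that a command for the container was sent at nowMillis. */
  void markSent(long containerId, long nowMillis) {
    sentAtMillis.put(containerId, nowMillis);
  }

  /** Clears the in-flight entry once the replication is known to be done. */
  void markCompleted(long containerId) {
    sentAtMillis.remove(containerId);
  }

  /** True if nothing is in flight or the in-flight command has timed out. */
  boolean shouldSend(long containerId, long nowMillis) {
    Long sentAt = sentAtMillis.get(containerId);
    return sentAt == null || nowMillis - sentAt >= timeoutMillis;
  }
}
```

A periodic replication pass would call {{shouldSend}} before issuing a command and {{markSent}} afterwards, so a lost command is retriggered on a later pass without a separate watcher.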
[jira] [Created] (HDDS-961) Send command execution metrics from Datanode to SCM
Nanda kumar created HDDS-961:

Summary: Send command execution metrics from Datanode to SCM
Key: HDDS-961
URL: https://issues.apache.org/jira/browse/HDDS-961
Project: Hadoop Distributed Data Store
Issue Type: Improvement
Components: Ozone Datanode, SCM
Reporter: Nanda kumar

The CommandHandlers in the datanode calculate and track the time taken to execute each command that is sent by SCM. It would be nice to report these values to SCM so that we can build the average time, standard deviation, etc. for those operations.
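On the SCM side, the average and standard deviation can be maintained incrementally without storing every reported value. The following sketch uses Welford's online algorithm; {{CommandTimeStats}} is a hypothetical class and the reporting path from the datanode is not shown.

```java
/** Hypothetical accumulator: running mean and sample standard deviation. */
final class CommandTimeStats {
  private long count;
  private double mean;
  private double m2; // sum of squared differences from the current mean

  /** Adds one reported execution time (in milliseconds). */
  synchronized void record(double millis) {
    count++;
    double delta = millis - mean;
    mean += delta / count;
    m2 += delta * (millis - mean);
  }

  synchronized double mean() {
    return mean;
  }

  /** Sample standard deviation; 0.0 until at least two values are recorded. */
  synchronized double stdDev() {
    return count < 2 ? 0.0 : Math.sqrt(m2 / (count - 1));
  }
}
```

One instance per command type would be enough to answer "how long does a close or replicate command usually take" from the values the datanodes report.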
[jira] [Created] (HDDS-962) Introduce locking for container operations that are executed via DatanodeCommand
Nanda kumar created HDDS-962:

Summary: Introduce locking for container operations that are executed via DatanodeCommand
Key: HDDS-962
URL: https://issues.apache.org/jira/browse/HDDS-962
Project: Hadoop Distributed Data Store
Issue Type: Improvement
Components: Ozone Datanode
Reporter: Nanda kumar
Assignee: Nanda kumar

When SCM decides to take some action on a container, it sends a DatanodeCommand to the datanodes. These commands are handled by CommandHandlers in the datanode. Without proper locking, we cannot process these commands in parallel. This jira aims to introduce locks on container operations which are performed via ContainerController.
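Per-container locking can be sketched with one lock per container ID, so that commands for different containers run in parallel while commands for the same container are serialized. This is an illustration only; {{ContainerLockManager}} is a hypothetical class, not the locking that ContainerController ends up using.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

/** Hypothetical lock manager: one reentrant lock per container ID. */
final class ContainerLockManager {
  private final ConcurrentHashMap<Long, ReentrantLock> locks = new ConcurrentHashMap<>();

  /** Runs the action while holding the lock of the given container. */
  <T> T withLock(long containerId, Supplier<T> action) {
    ReentrantLock lock = locks.computeIfAbsent(containerId, id -> new ReentrantLock());
    lock.lock();
    try {
      return action.get();
    } finally {
      lock.unlock();
    }
  }

  /** True if some thread currently holds the lock of the container. */
  boolean isLocked(long containerId) {
    ReentrantLock lock = locks.get(containerId);
    return lock != null && lock.isLocked();
  }
}
```

In this sketch the lock entries are never removed, so the map grows with the number of containers seen; a real implementation would need to clean up or use a bounded set of striped locks.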
[jira] [Created] (HDDS-1048) Remove SCMNodeStat from SCMNodeManager and use storage information from DatanodeInfo#StorageReportProto
Nanda kumar created HDDS-1048:

Summary: Remove SCMNodeStat from SCMNodeManager and use storage information from DatanodeInfo#StorageReportProto
Key: HDDS-1048
URL: https://issues.apache.org/jira/browse/HDDS-1048
Project: Hadoop Distributed Data Store
Issue Type: Improvement
Components: SCM
Affects Versions: 0.3.0
Reporter: Nanda kumar
Assignee: Nanda kumar

We don't have to maintain SCMNodeStat in SCMNodeManager anymore. This information can be obtained from {{DatanodeInfo#StorageReportProto}} inside NodeStateMap.
[jira] [Created] (HDDS-1049) TestRatisPipelineProvider#testCreatePipelineWithFactor is failing
Nanda kumar created HDDS-1049:

Summary: TestRatisPipelineProvider#testCreatePipelineWithFactor is failing
Key: HDDS-1049
URL: https://issues.apache.org/jira/browse/HDDS-1049
Project: Hadoop Distributed Data Store
Issue Type: Test
Components: test
Affects Versions: 0.3.0
Reporter: Nanda kumar

{{TestRatisPipelineProvider#testCreatePipelineWithFactor}} is failing with the below exception
{code}
[ERROR] testCreatePipelineWithFactor(org.apache.hadoop.hdds.scm.pipeline.TestRatisPipelineProvider) Time elapsed: 0.927 s <<< FAILURE!
java.lang.AssertionError
    at org.junit.Assert.fail(Assert.java:86)
    at org.junit.Assert.assertTrue(Assert.java:41)
    at org.junit.Assert.assertTrue(Assert.java:52)
    at org.apache.hadoop.hdds.scm.pipeline.TestRatisPipelineProvider.testCreatePipelineWithFactor(TestRatisPipelineProvider.java:68)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
    at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
    at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
    at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
    at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
    at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
    at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
{code}
[jira] [Created] (HDDS-1050) TestSCMRestart#testPipelineWithScmRestart is failing
Nanda kumar created HDDS-1050: - Summary: TestSCMRestart#testPipelineWithScmRestart is failing Key: HDDS-1050 URL: https://issues.apache.org/jira/browse/HDDS-1050 Project: Hadoop Distributed Data Store Issue Type: Test Components: test Affects Versions: 0.3.0 Reporter: Nanda kumar {{TestSCMRestart#testPipelineWithScmRestart}} is failing with the below exception {code} [ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 11.896 s <<< FAILURE! - in org.apache.hadoop.hdds.scm.pipeline.TestSCMRestart [ERROR] testPipelineWithScmRestart(org.apache.hadoop.hdds.scm.pipeline.TestSCMRestart) Time elapsed: 0.047 s <<< FAILURE! java.lang.AssertionError: expected: but was: at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.hdds.scm.pipeline.TestSCMRestart.testPipelineWithScmRestart(TestSCMRestart.java:110) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) at 
org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) {code}
[jira] [Created] (HDDS-1051) TestCloseContainerByPipeline#testIfCloseContainerCommandHandlerIsInvoked is failing
Nanda kumar created HDDS-1051: - Summary: TestCloseContainerByPipeline#testIfCloseContainerCommandHandlerIsInvoked is failing Key: HDDS-1051 URL: https://issues.apache.org/jira/browse/HDDS-1051 Project: Hadoop Distributed Data Store Issue Type: Test Components: test Affects Versions: 0.3.0 Reporter: Nanda kumar {{TestCloseContainerByPipeline#testIfCloseContainerCommandHandlerIsInvoked}} is failing with the following exception {code:java} [ERROR] testIfCloseContainerCommandHandlerIsInvoked(org.apache.hadoop.ozone.container.common.statemachine.commandhandler.TestCloseContainerByPipeline) Time elapsed: 21.943 s <<< ERROR! java.lang.StackOverflowError at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject$ClassSet.populateSet(Subject.java:1399) at javax.security.auth.Subject$ClassSet.<init>(Subject.java:1372) at javax.security.auth.Subject.getPrivateCredentials(Subject.java:767) at org.apache.hadoop.security.UserGroupInformation.getCredentialsInternal(UserGroupInformation.java:1559) at org.apache.hadoop.security.UserGroupInformation.getTokens(UserGroupInformation.java:1524) at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.getEncodedBlockToken(ContainerProtocolCalls.java:580) at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.writeChunkAsync(ContainerProtocolCalls.java:318) at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.writeChunkToContainer(BlockOutputStream.java:602) at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.writeChunk(BlockOutputStream.java:464) at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.close(BlockOutputStream.java:480) at org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry.close(BlockOutputStreamEntry.java:137) at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:489) at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:501) at 
org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:501) at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:501) at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:501) at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:501) at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:501) at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:501) at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:501) at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:501) at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:501) at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:501) at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:501) at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:501) at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:501) at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:501) at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:501) .. .. {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDDS-1052) TestOzoneRpcClient is flaky
Nanda kumar created HDDS-1052: Summary: TestOzoneRpcClient is flaky Key: HDDS-1052 URL: https://issues.apache.org/jira/browse/HDDS-1052 Project: Hadoop Distributed Data Store Issue Type: Test Components: test Affects Versions: 0.3.0 Reporter: Nanda kumar

{{TestOzoneRpcClient}} is flaky. The following test cases fail randomly.
{code}
[ERROR] Errors:
[ERROR] TestOzoneRpcClient>TestOzoneRpcClientAbstract.testListPartsWithPartMarkerGreaterThanPartCount:1932->TestOzoneRpcClientAbstract.uploadPart:2048 » IO
[ERROR] TestOzoneRpcClient>TestOzoneRpcClientAbstract.testMultipartUploadWithPartsMisMatchWithIncorrectPartName:1657->TestOzoneRpcClientAbstract.uploadPart:2048 » IO
[ERROR] TestOzoneRpcClient>TestOzoneRpcClientAbstract.testPutKey:558 » IO Unexpected S...
[ERROR] TestOzoneRpcClient>TestOzoneRpcClientAbstract.testReadKeyWithCorruptedData:884 » IO
[ERROR] TestOzoneRpcClient>TestOzoneRpcClientAbstract.testUploadPartWithNoOverride:1391 » IO
{code}
[jira] [Created] (HDDS-1070) Adding Node and Pipeline related metrics in SCM
Nanda kumar created HDDS-1070: Summary: Adding Node and Pipeline related metrics in SCM Key: HDDS-1070 URL: https://issues.apache.org/jira/browse/HDDS-1070 Project: Hadoop Distributed Data Store Issue Type: Improvement Components: SCM Affects Versions: 0.3.0 Reporter: Nanda kumar Assignee: Nanda kumar

This jira aims to add more Node and Pipeline related metrics to SCM. The following metrics will be added as part of this jira:
* numberOfSuccessfulPipelineCreation
* numberOfFailedPipelineCreation
* numberOfSuccessfulPipelineDestroy
* numberOfFailedPipelineDestroy
* numberOfPipelineReportProcessed
* numberOfNodeReportProcessed
* numberOfHBProcessed
* number of pipelines in different PipelineState
[jira] [Created] (HDDS-1146) Adding container related metrics in SCM
Nanda kumar created HDDS-1146: Summary: Adding container related metrics in SCM Key: HDDS-1146 URL: https://issues.apache.org/jira/browse/HDDS-1146 Project: Hadoop Distributed Data Store Issue Type: Improvement Components: SCM Reporter: Nanda kumar

This jira aims to add more container related metrics to SCM. The following metrics will be added as part of this jira:
* Number of containers
* Number of open containers
* Number of closed containers
* Number of quasi closed containers
* Number of closing containers
* Number of successful create container calls
* Number of failed create container calls
* Number of successful delete container calls
* Number of failed delete container calls
* Number of successful container report processing
* Number of failed container report processing
* Number of successful incremental container report processing
* Number of failed incremental container report processing
[jira] [Created] (HDDS-1166) Fix checkstyle line length issues
Nanda kumar created HDDS-1166: Summary: Fix checkstyle line length issues Key: HDDS-1166 URL: https://issues.apache.org/jira/browse/HDDS-1166 Project: Hadoop Distributed Data Store Issue Type: Improvement Components: SCM Reporter: Nanda kumar Assignee: Nanda kumar

Checkstyle line length issues have to be fixed in the following classes:
* BlockManagerImpl
* CloseContainerCommandHandler
* TestCloseContainerCommandHandler
[jira] [Created] (HDDS-1167) Error in hadoop-ozone/dev-support/checks/checkstyle.sh
Nanda kumar created HDDS-1167: Summary: Error in hadoop-ozone/dev-support/checks/checkstyle.sh Key: HDDS-1167 URL: https://issues.apache.org/jira/browse/HDDS-1167 Project: Hadoop Distributed Data Store Issue Type: Improvement Reporter: Nanda kumar Assignee: Nanda kumar While running {{hadoop-ozone/dev-support/checks/checkstyle.sh}}, the following warning is printed: {code} grep: warning: recursive search of stdin {code}
[jira] [Created] (HDDS-1168) Use random ports in TestBlockManager and TestDeletedBlockLog
Nanda kumar created HDDS-1168: Summary: Use random ports in TestBlockManager and TestDeletedBlockLog Key: HDDS-1168 URL: https://issues.apache.org/jira/browse/HDDS-1168 Project: Hadoop Distributed Data Store Issue Type: Improvement Components: SCM Reporter: Nanda kumar Assignee: Nanda kumar TestBlockManager and TestDeletedBlockLog use default ports, which causes BindException when the tests are executed in parallel. We should start using random ports to avoid this.
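A minimal sketch of the random-port idea, not the actual test change: binding a java.net.ServerSocket to port 0 asks the operating system for an unused ephemeral port. The class name FreePortExample and the helper pickFreePort are hypothetical, and how the chosen port would be handed to the SCM configuration used by TestBlockManager and TestDeletedBlockLog is not shown.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

// Hypothetical helper class, for illustration only.
class FreePortExample {

  /** Asks the OS for an unused ephemeral port by binding to port 0. */
  static int pickFreePort() {
    try (ServerSocket socket = new ServerSocket(0)) {
      // getLocalPort() reports the port the OS actually assigned.
      return socket.getLocalPort();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    int port = pickFreePort();
    System.out.println("A test server could bind to port " + port);
  }
}
```

There is a small window between closing the probe socket and the test server binding the same port, so letting the server itself bind to port 0, where it supports that, is the more robust option.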
[jira] [Resolved] (HDDS-69) Add checkBucketAccess to OzoneManger
[ https://issues.apache.org/jira/browse/HDDS-69?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nanda kumar resolved HDDS-69.
Resolution: Won't Fix

> Add checkBucketAccess to OzoneManger
>
> Key: HDDS-69
> URL: https://issues.apache.org/jira/browse/HDDS-69
> Project: Hadoop Distributed Data Store
> Issue Type: New Feature
> Components: Ozone Manager
> Reporter: Nanda kumar
> Assignee: Nanda kumar
> Priority: Major
> Attachments: HDFS-12147-HDFS-7240.000.patch, HDFS-12147-HDFS-7240.001.patch
>
> Checks if the caller has access to a given bucket.
[jira] [Created] (HDDS-1205) Introduce Replication Manager Thread inside Container Manager
Nanda kumar created HDDS-1205: Summary: Introduce Replication Manager Thread inside Container Manager Key: HDDS-1205 URL: https://issues.apache.org/jira/browse/HDDS-1205 Project: Hadoop Distributed Data Store Issue Type: Improvement Components: SCM Reporter: Nanda kumar Assignee: Nanda kumar This jira introduces a replication manager thread inside the {{ContainerManager}}, which will use the RMT (Replication Manager Thread) Decision Engine to decide the action to be taken on flagged containers. Containers are flagged for the ReplicationManagerThread by the ContainerReportProcessor(s) and the Stale/Dead Node event handlers.
[jira] [Created] (HDDS-1207) Bootstrap flagged container set before starting replication manager thread
Nanda kumar created HDDS-1207: Summary: Bootstrap flagged container set before starting replication manager thread Key: HDDS-1207 URL: https://issues.apache.org/jira/browse/HDDS-1207 Project: Hadoop Distributed Data Store Issue Type: Improvement Components: SCM Reporter: Nanda kumar Assignee: Nanda kumar When SCM starts, before starting the ReplicationManager thread, we have to inspect all the containers and flag the unhealthy ones for RMT to process.
[jira] [Created] (HDDS-1239) Use chillmode state from ChillModeManager in ChillModePrecheck
Nanda kumar created HDDS-1239: Summary: Use chillmode state from ChillModeManager in ChillModePrecheck Key: HDDS-1239 URL: https://issues.apache.org/jira/browse/HDDS-1239 Project: Hadoop Distributed Data Store Issue Type: Improvement Components: SCM Affects Versions: 0.3.0 Reporter: Nanda kumar Assignee: Nanda kumar Currently {{ChillModePrecheck}} is instantiated in multiple places, and each instance maintains its own chillmode state. Because of this, {{BlockManagerImpl}} and {{SCMClientProtocolServer}} listen to the chillmode status event to update the {{ChillModePrecheck}} that they maintain. It will be easier if {{ChillModePrecheck}} queries {{SCMChillModeManager}} to get the current chillmode state. It will also simplify the code if {{SCMChillModeManager}} provides the {{ChillModePrecheck}} instance, instead of every caller creating a new {{ChillModePrecheck}} object.
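The proposal can be sketched roughly as follows. This is an illustration under assumptions, not the SCM code: Manager and Precheck are hypothetical stand-ins for {{SCMChillModeManager}} and {{ChillModePrecheck}}, and the method names getInChillMode, exitChillMode, getPrecheck and isAllowed are invented for the example. The point is that the precheck holds no state of its own and that the manager hands out a single shared instance.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch; class and method names are assumptions.
class ChillModeSketch {

  /** Stand-in for SCMChillModeManager: the single owner of the chillmode state. */
  static class Manager {
    private final AtomicBoolean inChillMode = new AtomicBoolean(true);
    // One shared precheck, created by the manager instead of by each caller.
    private final Precheck precheck = new Precheck(this);

    boolean getInChillMode() {
      return inChillMode.get();
    }

    void exitChillMode() {
      inChillMode.set(false);
    }

    Precheck getPrecheck() {
      return precheck;
    }
  }

  /** Stand-in for ChillModePrecheck: keeps no copy of the state, only queries it. */
  static class Precheck {
    private final Manager manager;

    Precheck(Manager manager) {
      this.manager = manager;
    }

    /** An operation is allowed only once the manager reports chillmode is over. */
    boolean isAllowed() {
      return !manager.getInChillMode();
    }
  }

  public static void main(String[] args) {
    Manager manager = new Manager();
    Precheck precheck = manager.getPrecheck();
    System.out.println("allowed before exit: " + precheck.isAllowed());
    manager.exitChillMode();
    System.out.println("allowed after exit: " + precheck.isAllowed());
  }
}
```

With this shape no event listener is needed to keep the precheck in sync, because there is no second copy of the state to update.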
[jira] [Resolved] (HDDS-1332) Add some logging for flaky test testStartStopDatanodeStateMachine
[ https://issues.apache.org/jira/browse/HDDS-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nanda kumar resolved HDDS-1332.
Resolution: Fixed
Fix Version/s: 0.5.0
Target Version/s: 0.5.0

[~arpitagarwal], thanks for the contribution. Committed this to trunk.

> Add some logging for flaky test testStartStopDatanodeStateMachine
>
> Key: HDDS-1332
> URL: https://issues.apache.org/jira/browse/HDDS-1332
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Reporter: Arpit Agarwal
> Assignee: Arpit Agarwal
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.5.0
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> testStartStopDatanodeStateMachine fails frequently in Jenkins. It also seems
> to have a timing issue which may be different from the Jenkins failure.
> E.g. If I add a 10 second sleep as below I can get the test to fail 100%.
> {code}
> @@ -163,6 +163,7 @@ public void testStartStopDatanodeStateMachine() throws IOException,
>      try (DatanodeStateMachine stateMachine =
>          new DatanodeStateMachine(getNewDatanodeDetails(), conf, null)) {
>        stateMachine.startDaemon();
> +      Thread.sleep(10_000L);
> {code}
[jira] [Created] (HDDS-1353) Metrics scm_pipeline_metrics_num_pipeline_creation_failed keeps increasin
Nanda kumar created HDDS-1353: Summary: Metrics scm_pipeline_metrics_num_pipeline_creation_failed keeps increasin Key: HDDS-1353 URL: https://issues.apache.org/jira/browse/HDDS-1353 Project: Hadoop Distributed Data Store Issue Type: Improvement Components: SCM Reporter: Nanda kumar There is a {{BackgroundPipelineCreator}} thread in SCM which runs at a fixed interval and tries to create pipelines. BackgroundPipelineCreator uses {{IOException}} as its exit criterion (no more pipelines can be created): each run exits when we are not able to create any more pipelines, i.e. when we get an IOException while trying to create a pipeline. This means that the {{scm_pipeline_metrics_num_pipeline_creation_failed}} value gets incremented in every run of BackgroundPipelineCreator.
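The behaviour described above can be reproduced with a small simulation. This is a sketch under assumptions, not the SCM implementation: PipelineCreatorSketch, createPipeline, runOnce and the capacity parameter are hypothetical, and the counter merely stands in for {{scm_pipeline_metrics_num_pipeline_creation_failed}}. Because the loop only stops when creation throws, every run ends with exactly one counted failure even when nothing is wrong.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical simulation; names and the capacity model are assumptions.
class PipelineCreatorSketch {
  private final AtomicLong creationFailed = new AtomicLong();
  private int remainingCapacity;

  PipelineCreatorSketch(int capacity) {
    this.remainingCapacity = capacity;
  }

  /** Creates one pipeline, or counts a failure and throws when none can be created. */
  void createPipeline() throws IOException {
    if (remainingCapacity == 0) {
      creationFailed.incrementAndGet();
      throw new IOException("no more pipelines can be created");
    }
    remainingCapacity--;
  }

  /** One background run: keep creating until IOException signals the end. */
  int runOnce() {
    int created = 0;
    while (true) {
      try {
        createPipeline();
        created++;
      } catch (IOException e) {
        return created; // the exception is the only exit from the loop
      }
    }
  }

  long getCreationFailed() {
    return creationFailed.get();
  }

  public static void main(String[] args) {
    PipelineCreatorSketch creator = new PipelineCreatorSketch(2);
    for (int run = 1; run <= 3; run++) {
      int created = creator.runOnce();
      System.out.println("run " + run + ": created=" + created
          + ", failed=" + creator.getCreationFailed());
    }
  }
}
```

In this simulation the failure counter grows by one per run regardless of how many pipelines were created, which matches the symptom reported in the summary.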
[jira] [Created] (HDDS-1368) Cleanup old ReplicationManager code from SCM
Nanda kumar created HDDS-1368: Summary: Cleanup old ReplicationManager code from SCM Key: HDDS-1368 URL: https://issues.apache.org/jira/browse/HDDS-1368 Project: Hadoop Distributed Data Store Issue Type: Improvement Components: SCM Reporter: Nanda kumar Assignee: Nanda kumar HDDS-1205 brings in the new ReplicationManager and HDDS-1207 plugs in the new code. This jira is for removing the old ReplicationManager and related code.
[jira] [Resolved] (HDDS-1207) Refactor Container Report Processing logic and plugin new Replication Manager
[ https://issues.apache.org/jira/browse/HDDS-1207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nanda kumar resolved HDDS-1207.
Resolution: Fixed
Fix Version/s: 0.5.0

> Refactor Container Report Processing logic and plugin new Replication Manager
>
> Key: HDDS-1207
> URL: https://issues.apache.org/jira/browse/HDDS-1207
> Project: Hadoop Distributed Data Store
> Issue Type: Improvement
> Components: SCM
> Reporter: Nanda kumar
> Assignee: Nanda kumar
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.5.0
>
> Time Spent: 1h 40m
> Remaining Estimate: 0h
>
> HDDS-1205 brings in new ReplicationManager, this Jira is to refactor
> ContainerReportProcessing logic in SCM so that it complements
> ReplicationManager and plugin the new ReplicationManager code.
[jira] [Created] (HDDS-1384) TestBlockOutputStreamWithFailures is failing
Nanda kumar created HDDS-1384: - Summary: TestBlockOutputStreamWithFailures is failing Key: HDDS-1384 URL: https://issues.apache.org/jira/browse/HDDS-1384 Project: Hadoop Distributed Data Store Issue Type: Bug Components: test Reporter: Nanda kumar TestBlockOutputStreamWithFailures is failing with the following error {noformat} 2019-04-04 18:52:43,240 INFO volume.ThrottledAsyncChecker (ThrottledAsyncChecker.java:schedule(140)) - Scheduling a check for org.apache.hadoop.ozone.container.common.volume.HddsVolume@1f6c0e8a 2019-04-04 18:52:43,240 INFO volume.HddsVolumeChecker (HddsVolumeChecker.java:checkAllVolumes(203)) - Scheduled health check for volume org.apache.hadoop.ozone.container.common.volume.HddsVolume@1f6c0e8a 2019-04-04 18:52:43,241 ERROR server.GrpcService (ExitUtils.java:terminate(133)) - Terminating with exit status 1: Failed to start Grpc server java.io.IOException: Failed to bind at org.apache.ratis.thirdparty.io.grpc.netty.NettyServer.start(NettyServer.java:253) at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.start(ServerImpl.java:166) at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl.start(ServerImpl.java:81) at org.apache.ratis.grpc.server.GrpcService.startImpl(GrpcService.java:144) at org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:202) at org.apache.ratis.server.impl.RaftServerRpcWithProxy.start(RaftServerRpcWithProxy.java:69) at org.apache.ratis.server.impl.RaftServerProxy.lambda$start$3(RaftServerProxy.java:300) at org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:202) at org.apache.ratis.server.impl.RaftServerProxy.start(RaftServerProxy.java:298) at org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis.start(XceiverServerRatis.java:419) at org.apache.hadoop.ozone.container.ozoneimpl.OzoneContainer.start(OzoneContainer.java:186) at org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.start(DatanodeStateMachine.java:169) at 
org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.lambda$startDaemon$0(DatanodeStateMachine.java:338) at java.lang.Thread.run(Thread.java:748) Caused by: java.net.BindException: Address already in use at sun.nio.ch.Net.bind0(Native Method) at sun.nio.ch.Net.bind(Net.java:433) at sun.nio.ch.Net.bind(Net.java:425) at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223) at org.apache.ratis.thirdparty.io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:130) at org.apache.ratis.thirdparty.io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:558) at org.apache.ratis.thirdparty.io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1358) at org.apache.ratis.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:501) at org.apache.ratis.thirdparty.io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:486) at org.apache.ratis.thirdparty.io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:1019) at org.apache.ratis.thirdparty.io.netty.channel.AbstractChannel.bind(AbstractChannel.java:254) at org.apache.ratis.thirdparty.io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:366) at org.apache.ratis.thirdparty.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163) at org.apache.ratis.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404) at org.apache.ratis.thirdparty.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:462) at org.apache.ratis.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:897) at org.apache.ratis.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ... 
1 more {noformat}
[jira] [Created] (HDDS-1387) ConcurrentModificationException in TestMiniChaosOzoneCluster
Nanda kumar created HDDS-1387: - Summary: ConcurrentModificationException in TestMiniChaosOzoneCluster Key: HDDS-1387 URL: https://issues.apache.org/jira/browse/HDDS-1387 Project: Hadoop Distributed Data Store Issue Type: Bug Components: test Reporter: Nanda kumar Assignee: Nanda kumar TestMiniChaosOzoneCluster is failing with the below exception {noformat} [ERROR] org.apache.hadoop.ozone.TestMiniChaosOzoneCluster Time elapsed: 265.679 s <<< ERROR! java.util.ConcurrentModificationException at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:909) at java.util.ArrayList$Itr.next(ArrayList.java:859) at org.apache.hadoop.ozone.MiniOzoneClusterImpl.stop(MiniOzoneClusterImpl.java:350) at org.apache.hadoop.ozone.MiniOzoneClusterImpl.shutdown(MiniOzoneClusterImpl.java:325) at org.apache.hadoop.ozone.MiniOzoneChaosCluster.shutdown(MiniOzoneChaosCluster.java:130) at org.apache.hadoop.ozone.TestMiniChaosOzoneCluster.shutdown(TestMiniChaosOzoneCluster.java:92) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) {noformat}
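The trace above only shows that an ArrayList was structurally modified while MiniOzoneClusterImpl.stop was iterating over it; which code modified the list is not stated in the report. As a generic illustration of that failure mode and of one common mitigation (iterating over a snapshot copy), here is a minimal sketch. The class SnapshotIterationSketch, its methods and the "dn1".."dn3" names are hypothetical and are not taken from MiniOzoneClusterImpl.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

// Hypothetical illustration; not the MiniOzoneClusterImpl code.
class SnapshotIterationSketch {

  /** Removes elements while a for-each loop iterates over the same list. */
  static boolean failsWhenModifiedDuringIteration() {
    List<String> datanodes = new ArrayList<>(Arrays.asList("dn1", "dn2", "dn3"));
    try {
      for (String dn : datanodes) {
        datanodes.remove(dn); // structural change invalidates the iterator
      }
      return false;
    } catch (ConcurrentModificationException e) {
      return true;
    }
  }

  /** Iterates over a copy, so changes to the original list do not affect the loop. */
  static int stopAll(List<String> datanodes) {
    int stopped = 0;
    for (String dn : new ArrayList<>(datanodes)) {
      datanodes.remove(dn); // stands in for stopping and deregistering a node
      stopped++;
    }
    return stopped;
  }

  public static void main(String[] args) {
    System.out.println("direct iteration fails: " + failsWhenModifiedDuringIteration());
    List<String> nodes = new ArrayList<>(Arrays.asList("dn1", "dn2", "dn3"));
    System.out.println("stopped via snapshot: " + stopAll(nodes));
  }
}
```

If the list is changed by another thread, as may be the case in a chaos test, a snapshot alone is not sufficient; the copy would also need synchronization, or the list could be a java.util.concurrent.CopyOnWriteArrayList.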
[jira] [Created] (HDDS-1409) TestOzoneClientRetriesOnException is flaky
Nanda kumar created HDDS-1409: - Summary: TestOzoneClientRetriesOnException is flaky Key: HDDS-1409 URL: https://issues.apache.org/jira/browse/HDDS-1409 Project: Hadoop Distributed Data Store Issue Type: Test Reporter: Nanda kumar TestOzoneClientRetriesOnException is flaky, we get the below exception when it fails. {noformat} [ERROR] testMaxRetriesByOzoneClient(org.apache.hadoop.ozone.client.rpc.TestOzoneClientRetriesOnException) Time elapsed: 16.227 s <<< FAILURE! java.lang.AssertionError at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.ozone.client.rpc.TestOzoneClientRetriesOnException.testMaxRetriesByOzoneClient(TestOzoneClientRetriesOnException.java:197) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) 
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) {noformat}
[jira] [Created] (HDDS-1410) TestSCMNodeMetrics is flaky
Nanda kumar created HDDS-1410: - Summary: TestSCMNodeMetrics is flaky Key: HDDS-1410 URL: https://issues.apache.org/jira/browse/HDDS-1410 Project: Hadoop Distributed Data Store Issue Type: Test Components: test Reporter: Nanda kumar TestSCMNodeMetrics is flaky https://ci.anzix.net/job/ozone/16617/testReport/org.apache.hadoop.ozone.scm.node/TestSCMNodeMetrics/testNodeReportProcessing/ {noformat} java.lang.AssertionError: Bad value for metric NumNodeReportProcessed expected:<2> but was:<1> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.apache.hadoop.test.MetricsAsserts.assertCounter(MetricsAsserts.java:227) at org.apache.hadoop.ozone.scm.node.TestSCMNodeMetrics.testNodeReportProcessing(TestSCMNodeMetrics.java:107) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) at 
org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDDS-1411) Add unit test to check if SCM correctly sends close commands for containers in closing state after a restart
Nanda kumar created HDDS-1411:
Summary: Add unit test to check if SCM correctly sends close commands for containers in closing state after a restart
Key: HDDS-1411
URL: https://issues.apache.org/jira/browse/HDDS-1411
Project: Hadoop Distributed Data Store
Issue Type: Test
Components: test
Reporter: Nanda kumar

When a container is in the CLOSING state, SCM keeps sending close commands to the datanode until the container moves to either the QUASI_CLOSED or the CLOSED state. The frequency at which SCM sends the close command depends on the property {{hdds.scm.replication.thread.interval}}.

We have to add a test case to verify that SCM keeps sending close commands for containers in the CLOSING state even after a restart.
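The behaviour such a test would check can be sketched roughly as follows. This is only an illustration, not SCM code: the class {{CloseCommandSketch}}, the method {{closeCommandsFor}} and the {{OPEN}} state are hypothetical stand-ins, and a plain map of container ID to state stands in for whatever SCM persists. One pass of the monitor selects every container still in CLOSING, and a simulated restart simply runs a fresh pass over the reloaded state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; none of these names are taken from the SCM code base.
class CloseCommandSketch {
  // CLOSING, QUASI_CLOSED and CLOSED are named in the issue; OPEN is an assumption.
  enum State { OPEN, CLOSING, QUASI_CLOSED, CLOSED }

  // One monitor pass: returns the IDs of containers that should be sent a close command.
  static List<Long> closeCommandsFor(Map<Long, State> containers) {
    List<Long> targets = new ArrayList<>();
    // TreeMap copy gives a deterministic (sorted by ID) iteration order.
    for (Map.Entry<Long, State> e : new TreeMap<>(containers).entrySet()) {
      if (e.getValue() == State.CLOSING) {
        targets.add(e.getKey());
      }
    }
    return targets;
  }

  public static void main(String[] args) {
    Map<Long, State> persisted = new TreeMap<>();
    persisted.put(1L, State.OPEN);
    persisted.put(2L, State.CLOSING);
    persisted.put(3L, State.CLOSED);

    // Simulated restart: state is reloaded and a fresh pass runs over it.
    Map<Long, State> reloaded = new TreeMap<>(persisted);
    System.out.println(closeCommandsFor(reloaded));
  }
}
```

A real test would restart SCM inside a mini cluster and observe the commands queued for the datanode; the sketch only pins down the condition being asserted.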
[jira] [Created] (HDDS-1416) MiniOzoneCluster should set custom value for hdds.datanode.replication.work.dir
Nanda kumar created HDDS-1416:
Summary: MiniOzoneCluster should set custom value for hdds.datanode.replication.work.dir
Key: HDDS-1416
URL: https://issues.apache.org/jira/browse/HDDS-1416
Project: Hadoop Distributed Data Store
Issue Type: Test
Components: test
Affects Versions: 0.4.0
Reporter: Nanda kumar

The datanode uses a temporary working directory for copying/replicating containers; the default location of this directory is read from the system property {{java.io.tmpdir}}. Since all the datanodes in MiniOzoneCluster run on the same machine and in the same JVM, they all use the same working directory, so we might corrupt data while the datanodes are moving containers.

While configuring the datanodes for MiniOzoneCluster, we should set a custom value for {{hdds.datanode.replication.work.dir}} in each datanode instance.
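A minimal sketch of the idea, under stated assumptions: {{java.util.Properties}} stands in for the real datanode configuration object, and the helper name {{perDatanodeConfigs}} and the {{datanode-N/replication}} directory layout are hypothetical. Only the property key comes from the issue.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper: gives every in-process datanode its own replication work dir.
class ReplicationWorkDirSketch {
  static final String WORK_DIR_KEY = "hdds.datanode.replication.work.dir";

  // Builds one configuration per datanode; Properties is a stand-in for the real config class.
  static List<Properties> perDatanodeConfigs(Path baseDir, int numDatanodes) {
    List<Properties> configs = new ArrayList<>();
    for (int i = 0; i < numDatanodes; i++) {
      Properties conf = new Properties();
      // Assumed layout: <base>/datanode-<i>/replication
      Path workDir = baseDir.resolve("datanode-" + i).resolve("replication");
      conf.setProperty(WORK_DIR_KEY, workDir.toString());
      configs.add(conf);
    }
    return configs;
  }

  public static void main(String[] args) {
    Path base = Paths.get(System.getProperty("java.io.tmpdir"), "mini-ozone-cluster");
    for (Properties conf : perDatanodeConfigs(base, 3)) {
      System.out.println(conf.getProperty(WORK_DIR_KEY));
    }
  }
}
```

Deriving each directory from the datanode index keeps the paths distinct without any coordination between the instances.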
[jira] [Created] (HDDS-1417) After successfully importing a container, datanode should delete the container tar.gz file from working directory
Nanda kumar created HDDS-1417:
Summary: After successfully importing a container, datanode should delete the container tar.gz file from working directory
Key: HDDS-1417
URL: https://issues.apache.org/jira/browse/HDDS-1417
Project: Hadoop Distributed Data Store
Issue Type: Bug
Components: Ozone Datanode
Affects Versions: 0.3.0
Reporter: Nanda kumar
Assignee: Nanda kumar

Whenever we want to replicate or copy a container from one datanode to another, we compress the container data and create a tar.gz file. This tar file is then copied from the source datanode to the destination datanode. On the destination, the tar file is copied into a temporary working directory. Once the copy is complete, we import the container.

After the import, the tar file is no longer needed in the working directory of the destination datanode and has to be deleted.
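The cleanup step could look roughly like the sketch below. The {{ContainerImporter}} interface and the {{importAndCleanUp}} method are hypothetical stand-ins for the datanode's real import path; keeping the tarball when the import fails is an assumption of this sketch, not something the issue specifies.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch; not the datanode's actual import code.
class ContainerImportCleanupSketch {
  // Stand-in for whatever performs the real container import.
  interface ContainerImporter {
    void importContainer(Path tarFile) throws IOException;
  }

  // Imports the container, then deletes the tarball only if the import succeeded.
  static void importAndCleanUp(Path tarFile, ContainerImporter importer) throws IOException {
    // An exception here skips the delete, so a failed import leaves the tarball in place.
    importer.importContainer(tarFile);
    Files.deleteIfExists(tarFile);
  }

  public static void main(String[] args) throws IOException {
    Path workDir = Files.createTempDirectory("replication-work");
    Path tar = Files.createFile(workDir.resolve("container-1.tar.gz"));
    importAndCleanUp(tar, f -> System.out.println("importing " + f.getFileName()));
    System.out.println("tarball still present: " + Files.exists(tar));
    Files.delete(workDir);
  }
}
```

{{Files.deleteIfExists}} is used so that a tarball already removed by some other cleanup does not turn a successful import into an error.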
[jira] [Created] (HDDS-1433) Fix typo in hdds.proto
Nanda kumar created HDDS-1433:
Summary: Fix typo in hdds.proto
Key: HDDS-1433
URL: https://issues.apache.org/jira/browse/HDDS-1433
Project: Hadoop Distributed Data Store
Issue Type: Bug
Affects Versions: 0.3.0
Reporter: Nanda kumar

There is a typo in the hdds.proto file: {{GetScmInfoRespsonseProto}}
[jira] [Created] (HDDS-1434) TestDatanodeStateMachine is flaky
Nanda kumar created HDDS-1434: - Summary: TestDatanodeStateMachine is flaky Key: HDDS-1434 URL: https://issues.apache.org/jira/browse/HDDS-1434 Project: Hadoop Distributed Data Store Issue Type: Bug Components: test Reporter: Nanda kumar TestDatanodeStateMachine is flaky. It has failed in the following build [https://builds.apache.org/job/PreCommit-HDDS-Build/2650/artifact/out/patch-unit-hadoop-hdds.txt] [https://builds.apache.org/job/hadoop-multibranch/job/PR-661/6/artifact/out/patch-unit-hadoop-hdds_container-service.txt] [https://builds.apache.org/job/PreCommit-HDDS-Build/2635/artifact/out/patch-unit-hadoop-hdds.txt] Stack trace: {noformat} java.lang.Thread.State: WAITING (on object monitor) at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) at java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:403) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) at org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:389) at org.apache.hadoop.ozone.container.common.TestDatanodeStateMachine.testStartStopDatanodeStateMachine(TestDatanodeStateMachine.java:166) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) [INFO] [INFO] Results: [INFO] [ERROR] Errors: [ERROR] TestDatanodeStateMachine.testStartStopDatanodeStateMachine:166 ? Timeout Timed... {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDDS-1488) Scm cli command to start/stop replication manager
Nanda kumar created HDDS-1488:
Summary: Scm cli command to start/stop replication manager
Key: HDDS-1488
URL: https://issues.apache.org/jira/browse/HDDS-1488
Project: Hadoop Distributed Data Store
Issue Type: Improvement
Components: SCM
Reporter: Nanda kumar
Assignee: Nanda kumar

It would be nice to have an scmcli command to start/stop the ReplicationManager thread running in SCM.
[jira] [Resolved] (HDDS-1201) Reporting Corruptions in Containers to SCM
[ https://issues.apache.org/jira/browse/HDDS-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nanda kumar resolved HDDS-1201.
Resolution: Fixed
Fix Version/s: 0.4.1, 0.5.0

> Reporting Corruptions in Containers to SCM
>
> Key: HDDS-1201
> URL: https://issues.apache.org/jira/browse/HDDS-1201
> Project: Hadoop Distributed Data Store
> Issue Type: Sub-task
> Components: Ozone Datanode, SCM
> Reporter: Supratim Deka
> Assignee: Shweta
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.5.0, 0.4.1
> Time Spent: 40m
> Remaining Estimate: 0h
>
> Add protocol message and handling to report container corruptions to the SCM.
> Also add basic recovery handling in SCM.
[jira] [Resolved] (HDDS-1647) Recon config tag does not show up on Ozone UI.
[ https://issues.apache.org/jira/browse/HDDS-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nanda kumar resolved HDDS-1647.
Resolution: Fixed

> Recon config tag does not show up on Ozone UI.
>
> Key: HDDS-1647
> URL: https://issues.apache.org/jira/browse/HDDS-1647
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Reporter: Aravindan Vijayan
> Assignee: Aravindan Vijayan
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.5.0
> Attachments: HDDS-1647-000.patch, Screen Shot 2019-06-05 at 10.02.59 AM.png
> Time Spent: 50m
> Remaining Estimate: 0h
>
> Recon tag does not show up on the list of tags on /conf page.
[jira] [Resolved] (HDDS-1652) HddsDispatcher should not shutdown volumeSet
[ https://issues.apache.org/jira/browse/HDDS-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nanda kumar resolved HDDS-1652.
Resolution: Fixed
Fix Version/s: 0.4.1, 0.5.0

> HddsDispatcher should not shutdown volumeSet
>
> Key: HDDS-1652
> URL: https://issues.apache.org/jira/browse/HDDS-1652
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Reporter: Xiaoyu Yao
> Assignee: Xiaoyu Yao
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.5.0, 0.4.1
> Time Spent: 40m
> Remaining Estimate: 0h
>
> Currently, OzoneContainer#stop() and HddsDispatcher#stop() both invoke volumeSet.shutdown() explicitly to shut down the same volume set.
>
> In addition, OzoneContainer#stop() invokes HddsDispatcher#stop(). Since the volume set object is created by the OzoneContainer object, shutting it down should be the responsibility of OzoneContainer. This ticket is opened to remove the volumeSet.shutdown() call from HddsDispatcher#stop().
>
> Some benchmark tools rely on HddsDispatcher#stop() to shut down the volumeSet object; those can be fixed with an explicit volumeSet#shutdown call.
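The ownership rule described in the ticket can be illustrated with a small sketch. The classes below are simplified, hypothetical stand-ins and not the real OzoneContainer, HddsDispatcher or VolumeSet; the counter exists only to make the single shutdown call observable.

```java
// Hypothetical, simplified stand-ins; not the real Ozone classes.
class VolumeSetSketch {
  private int shutdownCalls = 0;

  void shutdown() {
    shutdownCalls++;
  }

  int getShutdownCalls() {
    return shutdownCalls;
  }
}

class DispatcherSketch {
  private final VolumeSetSketch volumeSet; // borrowed from the container, not owned

  DispatcherSketch(VolumeSetSketch volumeSet) {
    this.volumeSet = volumeSet;
  }

  void stop() {
    // Releases only dispatcher-owned resources.
    // Deliberately does NOT call volumeSet.shutdown().
  }
}

class ContainerSketch {
  // Created here, so this class is responsible for shutting it down.
  private final VolumeSetSketch volumeSet = new VolumeSetSketch();
  private final DispatcherSketch dispatcher = new DispatcherSketch(volumeSet);

  void stop() {
    dispatcher.stop();
    volumeSet.shutdown(); // the only shutdown call on the volume set
  }

  VolumeSetSketch getVolumeSet() {
    return volumeSet;
  }

  public static void main(String[] args) {
    ContainerSketch container = new ContainerSketch();
    container.stop();
    System.out.println("shutdown calls: " + container.getVolumeSet().getShutdownCalls());
  }
}
```

A caller that uses the dispatcher on its own, as the benchmark tools mentioned in the ticket do, would have to shut down the volume set itself under this rule.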
[jira] [Resolved] (HDDS-1650) Fix Ozone tests leaking volume checker thread
[ https://issues.apache.org/jira/browse/HDDS-1650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nanda kumar resolved HDDS-1650.
Resolution: Fixed
Fix Version/s: 0.4.1, 0.5.0

> Fix Ozone tests leaking volume checker thread
>
> Key: HDDS-1650
> URL: https://issues.apache.org/jira/browse/HDDS-1650
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Reporter: Xiaoyu Yao
> Assignee: Xiaoyu Yao
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.5.0, 0.4.1
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> A few tests are leaking the hdds volume checker thread. This ticket is opened to fix them.
[jira] [Resolved] (HDDS-1454) GC other system pause events can trigger pipeline destroy for all the nodes in the cluster
[ https://issues.apache.org/jira/browse/HDDS-1454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nanda kumar resolved HDDS-1454.
Resolution: Fixed
Fix Version/s: 0.5.0

> GC other system pause events can trigger pipeline destroy for all the nodes in the cluster
>
> Key: HDDS-1454
> URL: https://issues.apache.org/jira/browse/HDDS-1454
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: SCM
> Reporter: Mukul Kumar Singh
> Assignee: Supratim Deka
> Priority: Major
> Labels: MiniOzoneChaosCluster, pull-request-available
> Fix For: 0.5.0
> Time Spent: 2h
> Remaining Estimate: 0h
>
> In a MiniOzoneChaosCluster run it was observed that events like GC pauses, or any other pauses in SCM, can cause SCM to mark all the datanodes as stale. This triggers multiple pipeline destroys and renders the system unusable.
[jira] [Created] (HDDS-1759) TestWatchForCommit crashes
Nanda kumar created HDDS-1759: - Summary: TestWatchForCommit crashes Key: HDDS-1759 URL: https://issues.apache.org/jira/browse/HDDS-1759 Project: Hadoop Distributed Data Store Issue Type: Test Components: test Reporter: Nanda kumar {{org.apache.hadoop.ozone.client.rpc.TestWatchForCommit}} is crashing with the following exception trace. {noformat} [ERROR] Crashed tests: [ERROR] org.apache.hadoop.ozone.client.rpc.TestWatchForCommit [ERROR] org.apache.maven.surefire.booter.SurefireBooterForkException: ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called? [ERROR] Command was /bin/sh -c cd /Users/nvadivelu/codebase/apache/hadoop/hadoop-ozone/integration-test && /Library/Java/JavaVirtualMachines/jdk1.8.0_152.jdk/Contents/Home/jre/bin/java -Xmx2048m -XX:+HeapDumpOnOutOfMemoryError -jar /Users/nvadivelu/codebase/apache/hadoop/hadoop-ozone/integration-test/target/surefire/surefirebooter6824244130326461346.jar /Users/nvadivelu/codebase/apache/hadoop/hadoop-ozone/integration-test/target/surefire 2019-07-03T10-47-23_862-jvmRun1 surefire1503013258446082728tmp surefire_07547129263746053478tmp [ERROR] Error occurred in starting fork, check output in log [ERROR] Process Exit Code: 1 [ERROR] Crashed tests: [ERROR] org.apache.hadoop.ozone.client.rpc.TestWatchForCommit [ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:511) [ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.runSuitesForkPerTestSet(ForkStarter.java:458) [ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:299) [ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:247) [ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1149) [ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:991) [ERROR] at 
org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:837) [ERROR] at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134) [ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208) [ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:154) [ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:146) [ERROR] at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:117) [ERROR] at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:81) [ERROR] at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51) [ERROR] at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128) [ERROR] at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:309) [ERROR] at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:194) [ERROR] at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:107) [ERROR] at org.apache.maven.cli.MavenCli.execute(MavenCli.java:955) [ERROR] at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:290) [ERROR] at org.apache.maven.cli.MavenCli.main(MavenCli.java:194) [ERROR] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [ERROR] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) [ERROR] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [ERROR] at java.lang.reflect.Method.invoke(Method.java:498) [ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289) [ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229) [ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415) [ERROR] at 
org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356) [ERROR] Caused by: org.apache.maven.surefire.booter.SurefireBooterForkException: The forked VM terminated without properly saying goodbye. VM crash or System.exit called? {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDDS-1770) SCM crashes when ReplicationManager is trying to re-replicate under replicated containers
Nanda kumar created HDDS-1770: - Summary: SCM crashes when ReplicationManager is trying to re-replicate under replicated containers Key: HDDS-1770 URL: https://issues.apache.org/jira/browse/HDDS-1770 Project: Hadoop Distributed Data Store Issue Type: Bug Components: SCM Reporter: Nanda kumar SCM crashes with the following exception when ReplicationManager is trying to re-replicate under replicated containers {noformat} 2019-07-08 12:46:36 ERROR ReplicationManager:215 - Exception in Replication Monitor Thread. java.lang.IllegalArgumentException: Affinity node /default-rack/aab15e2d07cc is not a member of topology at org.apache.hadoop.hdds.scm.net.NetworkTopologyImpl.checkAffinityNode(NetworkTopologyImpl.java:767) at org.apache.hadoop.hdds.scm.net.NetworkTopologyImpl.chooseRandom(NetworkTopologyImpl.java:407) at org.apache.hadoop.hdds.scm.container.placement.algorithms.SCMContainerPlacementRackAware.chooseNode(SCMContainerPlacementRackAware.java:242) at org.apache.hadoop.hdds.scm.container.placement.algorithms.SCMContainerPlacementRackAware.chooseDatanodes(SCMContainerPlacementRackAware.java:168) at org.apache.hadoop.hdds.scm.container.ReplicationManager.handleUnderReplicatedContainer(ReplicationManager.java:487) at org.apache.hadoop.hdds.scm.container.ReplicationManager.processContainer(ReplicationManager.java:293) at java.base/java.util.concurrent.ConcurrentHashMap$KeySetView.forEach(ConcurrentHashMap.java:4698) at java.base/java.util.Collections$UnmodifiableCollection.forEach(Collections.java:1083) at org.apache.hadoop.hdds.scm.container.ReplicationManager.run(ReplicationManager.java:205) at java.base/java.lang.Thread.run(Thread.java:834) 2019-07-08 12:46:36 INFO ExitUtil:210 - Exiting with status 1: java.lang.IllegalArgumentException: Affinity node /default-rack/aab15e2d07cc is not a member of topology 2019-07-08 12:46:36 INFO StorageContainerManagerStarter:51 - SHUTDOWN_MSG: / SHUTDOWN_MSG: Shutting down StorageContainerManager at 8c763563f672/192.168.112.2 / 
{noformat}
[jira] [Created] (HDDS-1778) Fix existing blockade tests
Nanda kumar created HDDS-1778:
Summary: Fix existing blockade tests
Key: HDDS-1778
URL: https://issues.apache.org/jira/browse/HDDS-1778
Project: Hadoop Distributed Data Store
Issue Type: Bug
Components: test
Reporter: Nanda kumar
Assignee: Nanda kumar

This jira is to track and fix existing blockade test cases.
[jira] [Resolved] (HDDS-1201) Reporting Corruptions in Containers to SCM
[ https://issues.apache.org/jira/browse/HDDS-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nanda kumar resolved HDDS-1201. --- Resolution: Fixed > Reporting Corruptions in Containers to SCM > -- > > Key: HDDS-1201 > URL: https://issues.apache.org/jira/browse/HDDS-1201 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task > Components: Ozone Datanode, SCM >Reporter: Supratim Deka >Assignee: Hrishikesh Gadre >Priority: Critical > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > Add protocol message and handling to report container corruptions to the SCM. > Also add basic recovery handling in SCM. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDDS-1784) Missing HostName and IpAddress in the response of register command
Nanda kumar created HDDS-1784:
Summary: Missing HostName and IpAddress in the response of register command
Key: HDDS-1784
URL: https://issues.apache.org/jira/browse/HDDS-1784
Project: Hadoop Distributed Data Store
Issue Type: Bug
Components: SCM
Reporter: Nanda kumar
Assignee: Nanda kumar

{{SCMNodeManager}} sets the HostName and IpAddress in the response of the register command, but they are ignored in {{SCMDatanodeProtocolServer}} when the response is sent back to the datanode.
[jira] [Resolved] (HDDS-1754) getContainerWithPipeline fails with PipelineNotFoundException
[ https://issues.apache.org/jira/browse/HDDS-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nanda kumar resolved HDDS-1754. --- Resolution: Fixed Fix Version/s: 0.5.0 > getContainerWithPipeline fails with PipelineNotFoundException > - > > Key: HDDS-1754 > URL: https://issues.apache.org/jira/browse/HDDS-1754 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: SCM >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Assignee: Supratim Deka >Priority: Major > Labels: MiniOzoneChaosCluster, pull-request-available > Fix For: 0.5.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Once a pipeline is closed or finalized and it was not able to close all the > containers inside the pipeline. > Then getContainerWithPipeline will try to fetch the pipeline state from > pipelineManager after the pipeline has been closed. > {code} > 2019-07-02 20:48:20,370 INFO ipc.Server (Server.java:logException(2726)) - > IPC Server handler 13 on 50130, call Call#17339 Retry#0 > org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocol.getContainerWithPipeline > from 192.168.0.2:51452 > org.apache.hadoop.hdds.scm.pipeline.PipelineNotFoundException: > PipelineID=e1a7b16a-48d9-4194-9774-ad49ec9ad78b not found > at > org.apache.hadoop.hdds.scm.pipeline.PipelineStateMap.getPipeline(PipelineStateMap.java:132) > at > org.apache.hadoop.hdds.scm.pipeline.PipelineStateManager.getPipeline(PipelineStateManager.java:66) > at > org.apache.hadoop.hdds.scm.pipeline.SCMPipelineManager.getPipeline(SCMPipelineManager.java:184) > at > org.apache.hadoop.hdds.scm.server.SCMClientProtocolServer.getContainerWithPipeline(SCMClientProtocolServer.java:244) > at > org.apache.hadoop.ozone.protocolPB.StorageContainerLocationProtocolServerSideTranslatorPB.getContainerWithPipeline(StorageContainerLocationProtocolServerSideTranslatorPB.java:144) > at > 
org.apache.hadoop.hdds.protocol.proto.StorageContainerLocationProtocolProtos$StorageContainerLocationProtocolService$2.callBlockingMethod(StorageContainerLocationProtocolProtos.java:16390) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Resolved] (HDDS-1752) ConcurrentModificationException while handling DeadNodeHandler event
[ https://issues.apache.org/jira/browse/HDDS-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nanda kumar resolved HDDS-1752.
Resolution: Fixed
Fix Version/s: 0.4.1

Thanks [~hgadre] for the contribution and thanks to [~msingh] for reporting it. Committed it to trunk and ozone-0.4.1 branch.

> ConcurrentModificationException while handling DeadNodeHandler event
>
> Key: HDDS-1752
> URL: https://issues.apache.org/jira/browse/HDDS-1752
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: SCM
> Affects Versions: 0.4.0
> Reporter: Mukul Kumar Singh
> Assignee: Hrishikesh Gadre
> Priority: Major
> Labels: MiniOzoneChaosCluster, pull-request-available
> Fix For: 0.4.1
> Time Spent: 20m
> Remaining Estimate: 0h
>
> ConcurrentModificationException while handling DeadNodeHandler event
> {code}
> 2019-07-02 19:29:25,190 ERROR events.SingleThreadExecutor (SingleThreadExecutor.java:lambda$onMessage$1(88)) - Error on execution message 56591ec5-c9e4-416c-9a36-db0507739fe5{ip: 192.168.0.2, host: 192.168.0.2, networkLocation: /default-rack, certSerialId: null}
> java.util.ConcurrentModificationException
>   at java.util.HashMap$HashIterator.nextNode(HashMap.java:1442)
>   at java.util.HashMap$KeyIterator.next(HashMap.java:1466)
>   at java.lang.Iterable.forEach(Iterable.java:74)
>   at java.util.Collections$UnmodifiableCollection.forEach(Collections.java:1080)
>   at org.apache.hadoop.hdds.scm.node.DeadNodeHandler.lambda$destroyPipelines$1(DeadNodeHandler.java:99)
>   at java.util.Optional.ifPresent(Optional.java:159)
>   at org.apache.hadoop.hdds.scm.node.DeadNodeHandler.destroyPipelines(DeadNodeHandler.java:98)
>   at org.apache.hadoop.hdds.scm.node.DeadNodeHandler.onMessage(DeadNodeHandler.java:78)
>   at org.apache.hadoop.hdds.scm.node.DeadNodeHandler.onMessage(DeadNodeHandler.java:44)
>   at org.apache.hadoop.hdds.server.events.SingleThreadExecutor.lambda$onMessage$1(SingleThreadExecutor.java:85)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
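The trace shows a HashMap-backed collection being modified while {{forEach}} is iterating over it through an unmodifiable view. The sketch below reproduces that general pattern in isolation and shows one common way around it, iterating over a snapshot copy. It is an illustration only: the class and method names are hypothetical, the set of strings stands in for the pipeline IDs, and it is not necessarily the fix that was committed. It also covers only modification from the same thread; if another thread mutates the set, the collection itself needs to be thread-safe.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.ConcurrentModificationException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the failure pattern; not the DeadNodeHandler code.
class SnapshotIterationSketch {
  // Returns true if removing elements while iterating the live view fails fast.
  static boolean liveIterationFails(Set<String> pipelines) {
    // The unmodifiable wrapper is only a view; it still iterates the backing set.
    Set<String> view = Collections.unmodifiableSet(pipelines);
    try {
      view.forEach(pipelines::remove); // mutates the backing set mid-iteration
      return false;
    } catch (ConcurrentModificationException e) {
      return true;
    }
  }

  // Iterates over a copy, so removals from the live set cannot disturb the iterator.
  static void destroyAll(Set<String> pipelines) {
    for (String id : new ArrayList<>(pipelines)) {
      pipelines.remove(id);
    }
  }

  public static void main(String[] args) {
    Set<String> live = new HashSet<>();
    Collections.addAll(live, "p1", "p2", "p3");
    System.out.println("live iteration fails: " + liveIterationFails(live));

    Set<String> other = new HashSet<>();
    Collections.addAll(other, "p1", "p2", "p3");
    destroyAll(other);
    System.out.println("remaining after snapshot iteration: " + other.size());
  }
}
```

The copy costs one extra allocation per call, which is usually acceptable for an event handler that runs once per dead node.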
[jira] [Created] (HDDS-1790) Fix checkstyle issues in TestDataScrubber
Nanda kumar created HDDS-1790:
Summary: Fix checkstyle issues in TestDataScrubber
Key: HDDS-1790
URL: https://issues.apache.org/jira/browse/HDDS-1790
Project: Hadoop Distributed Data Store
Issue Type: Bug
Components: test
Reporter: Nanda kumar
Assignee: Nanda kumar

There are 4 Checkstyle issues in TestDataScrubber that have to be fixed:
{noformat}
[ERROR] src/test/java/org/apache/hadoop/ozone/dn/scrubber/TestDataScrubber.java:[157] (sizes) LineLength: Line is longer than 80 characters (found 81).
[ERROR] src/test/java/org/apache/hadoop/ozone/dn/scrubber/TestDataScrubber.java:[161] (sizes) LineLength: Line is longer than 80 characters (found 82).
[ERROR] src/test/java/org/apache/hadoop/ozone/dn/scrubber/TestDataScrubber.java:[167] (sizes) LineLength: Line is longer than 80 characters (found 85).
[ERROR] src/test/java/org/apache/hadoop/ozone/dn/scrubber/TestDataScrubber.java:[187] (sizes) LineLength: Line is longer than 80 characters (found 104).
{noformat}
[jira] [Created] (HDDS-1791) Update network-tests/src/test/blockade/README.md file
Nanda kumar created HDDS-1791:
Summary: Update network-tests/src/test/blockade/README.md file
Key: HDDS-1791
URL: https://issues.apache.org/jira/browse/HDDS-1791
Project: Hadoop Distributed Data Store
Issue Type: Improvement
Components: test
Reporter: Nanda kumar
Assignee: Nanda kumar

{{hadoop-ozone/fault-injection-test/network-tests/src/test/blockade/README.md}} has to be updated after HDDS-1778.
[jira] [Resolved] (HDDS-1759) TestWatchForCommit crashes
[ https://issues.apache.org/jira/browse/HDDS-1759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nanda kumar resolved HDDS-1759. --- Resolution: Duplicate > TestWatchForCommit crashes > -- > > Key: HDDS-1759 > URL: https://issues.apache.org/jira/browse/HDDS-1759 > Project: Hadoop Distributed Data Store > Issue Type: Test > Components: test >Reporter: Nanda kumar >Priority: Major > > {{org.apache.hadoop.ozone.client.rpc.TestWatchForCommit}} is crashing with > the following exception trace. > {noformat} > [ERROR] Crashed tests: > [ERROR] org.apache.hadoop.ozone.client.rpc.TestWatchForCommit > [ERROR] org.apache.maven.surefire.booter.SurefireBooterForkException: > ExecutionException The forked VM terminated without properly saying goodbye. > VM crash or System.exit called? > [ERROR] Command was /bin/sh -c cd > /Users/nvadivelu/codebase/apache/hadoop/hadoop-ozone/integration-test && > /Library/Java/JavaVirtualMachines/jdk1.8.0_152.jdk/Contents/Home/jre/bin/java > -Xmx2048m -XX:+HeapDumpOnOutOfMemoryError -jar > /Users/nvadivelu/codebase/apache/hadoop/hadoop-ozone/integration-test/target/surefire/surefirebooter6824244130326461346.jar > > /Users/nvadivelu/codebase/apache/hadoop/hadoop-ozone/integration-test/target/surefire > 2019-07-03T10-47-23_862-jvmRun1 surefire1503013258446082728tmp > surefire_07547129263746053478tmp > [ERROR] Error occurred in starting fork, check output in log > [ERROR] Process Exit Code: 1 > [ERROR] Crashed tests: > [ERROR] org.apache.hadoop.ozone.client.rpc.TestWatchForCommit > [ERROR] at > org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:511) > [ERROR] at > org.apache.maven.plugin.surefire.booterclient.ForkStarter.runSuitesForkPerTestSet(ForkStarter.java:458) > [ERROR] at > org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:299) > [ERROR] at > org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:247) > [ERROR] at > 
org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1149) > [ERROR] at > org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:991) > [ERROR] at > org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:837) > [ERROR] at > org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134) > [ERROR] at > org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208) > [ERROR] at > org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:154) > [ERROR] at > org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:146) > [ERROR] at > org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:117) > [ERROR] at > org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:81) > [ERROR] at > org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51) > [ERROR] at > org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128) > [ERROR] at > org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:309) > [ERROR] at > org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:194) > [ERROR] at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:107) > [ERROR] at org.apache.maven.cli.MavenCli.execute(MavenCli.java:955) > [ERROR] at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:290) > [ERROR] at org.apache.maven.cli.MavenCli.main(MavenCli.java:194) > [ERROR] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > [ERROR] at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > [ERROR] at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > [ERROR] at java.lang.reflect.Method.invoke(Method.java:498) > [ERROR] at > 
org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289) > [ERROR] at > org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229) > [ERROR] at > org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415) > [ERROR] at > org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356) > [ERROR] Caused by: > org.apache.maven.surefire.booter.SurefireBooterForkException: The forked VM > terminated without properly saying goodbye. VM crash or System.exit called? > {noformat} -- This message was sent by Atlassian JIRA (v7.6.14#76016) -
[jira] [Resolved] (HDDS-1036) container replica state in datanode should be QUASI-CLOSED if the datanode is isolated from other two datanodes in 3 datanode cluster
[ https://issues.apache.org/jira/browse/HDDS-1036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nanda kumar resolved HDDS-1036. --- Resolution: Not A Problem Fixed as part of ReplicationManager refactoring. > container replica state in datanode should be QUASI-CLOSED if the datanode is > isolated from other two datanodes in 3 datanode cluster > - > > Key: HDDS-1036 > URL: https://issues.apache.org/jira/browse/HDDS-1036 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode, SCM >Reporter: Nilotpal Nandi >Assignee: Nanda kumar >Priority: Major > > steps taken : > --- > # created a 3 datanode docker cluster. > # wrote some data to create a pipeline. > # Then, one of the datanodes is isolated from other two datanodes. All > datanodes can communicate with SCM. > # Tried to write new data , write failed. > # Wait for 900 seconds. > Observation: > > container state is CLOSED in all three replicas. > > Expectation: > --- > container state in isolated datanode should be QUASI-CLOSED. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDDS-1810) SCM command to Activate and Deactivate pipelines
Nanda kumar created HDDS-1810: Summary: SCM command to Activate and Deactivate pipelines Key: HDDS-1810 URL: https://issues.apache.org/jira/browse/HDDS-1810 Project: Hadoop Distributed Data Store Issue Type: New Feature Components: SCM, SCM Client Reporter: Nanda kumar Assignee: Nanda kumar It would be useful to have an SCM command to temporarily deactivate and re-activate a pipeline. This would help considerably when debugging a pipeline.
[jira] [Created] (HDDS-1817) GetKey fails with IllegalArgumentException
Nanda kumar created HDDS-1817: Summary: GetKey fails with IllegalArgumentException Key: HDDS-1817 URL: https://issues.apache.org/jira/browse/HDDS-1817 Project: Hadoop Distributed Data Store Issue Type: Bug Components: Ozone Client, SCM Affects Versions: 0.4.0 Reporter: Nanda kumar During a get key call, the client intermittently fails with {{java.lang.IllegalArgumentException}} {noformat} E AssertionError: Ozone get Key failed with output=[java.lang.IllegalArgumentException E at com.google.common.base.Preconditions.checkArgument(Preconditions.java:72) E at org.apache.hadoop.hdds.scm.XceiverClientManager.acquireClient(XceiverClientManager.java:150) E at org.apache.hadoop.hdds.scm.XceiverClientManager.acquireClientForReadData(XceiverClientManager.java:143) E at org.apache.hadoop.hdds.scm.storage.BlockInputStream.getChunkInfos(BlockInputStream.java:154) E at org.apache.hadoop.hdds.scm.storage.BlockInputStream.initialize(BlockInputStream.java:118) E at org.apache.hadoop.hdds.scm.storage.BlockInputStream.read(BlockInputStream.java:222) E at org.apache.hadoop.ozone.client.io.KeyInputStream.read(KeyInputStream.java:171) E at org.apache.hadoop.ozone.client.io.OzoneInputStream.read(OzoneInputStream.java:47) E at java.base/java.io.InputStream.read(InputStream.java:205) E at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:94) E at org.apache.hadoop.ozone.web.ozShell.keys.GetKeyHandler.call(GetKeyHandler.java:98) E at org.apache.hadoop.ozone.web.ozShell.keys.GetKeyHandler.call(GetKeyHandler.java:48) E at picocli.CommandLine.execute(CommandLine.java:1173) E at picocli.CommandLine.access$800(CommandLine.java:141) E at picocli.CommandLine$RunLast.handle(CommandLine.java:1367) E at picocli.CommandLine$RunLast.handle(CommandLine.java:1335) E at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:1243) E at picocli.CommandLine.parseWithHandlers(CommandLine.java:1526) E at picocli.CommandLine.parseWithHandler(CommandLine.java:1465) E at
org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:65) E at org.apache.hadoop.ozone.web.ozShell.OzoneShell.execute(OzoneShell.java:60) E at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:56) E at org.apache.hadoop.ozone.web.ozShell.OzoneShell.main(OzoneShell.java:53)] {noformat} This happens when the pipeline returned by SCM doesn't have any datanode information.
[jira] [Created] (HDDS-1821) BlockOutputStream#watchForCommit fails with UnsupportedOperationException when one DN is down
Nanda kumar created HDDS-1821: Summary: BlockOutputStream#watchForCommit fails with UnsupportedOperationException when one DN is down Key: HDDS-1821 URL: https://issues.apache.org/jira/browse/HDDS-1821 Project: Hadoop Distributed Data Store Issue Type: Bug Components: Ozone Client Reporter: Nanda kumar When one of the datanodes in the Ratis pipeline is excluded by introducing a network failure, the client write fails with the following exception {noformat} 2019-07-18 07:13:33 WARN XceiverClientRatis:262 - 3 way commit failed on pipeline Pipeline[ Id: b338512c-1a3b-4ae6-b89c-7b7737d9bd93, Nodes: ce90cf89-0444-45bf-8c49-a126d8da5a5f{ip: 192.168.240.4, host: ozoneblockade_datanode_2.ozoneblockade_default, networkLocation: /default-rack, certSerialId: null}fa65a457-155d-4bf3-8d1b-b0e11ec157ae{ip: 192.168.240.6, host: ozoneblockade_datanode_3.ozoneblockade_default, networkLocation: /default-rack, certSerialId: null}c5785c99-7dc2-4afc-9054-2efa28a41e7e{ip: 192.168.240.2, host: ozoneblockade_datanode_1.ozoneblockade_default, networkLocation: /default-rack, certSerialId: null}, Type:RATIS, Factor:THREE, State:OPEN] E java.util.concurrent.ExecutionException: org.apache.ratis.protocol.NotReplicatedException: Request with call Id 2 and log index 9 is not yet replicated to ALL_COMMITTED E at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395) E at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2022) E at org.apache.hadoop.hdds.scm.XceiverClientRatis.watchForCommit(XceiverClientRatis.java:259) E at org.apache.hadoop.hdds.scm.storage.CommitWatcher.watchForCommit(CommitWatcher.java:194) E at org.apache.hadoop.hdds.scm.storage.CommitWatcher.watchOnLastIndex(CommitWatcher.java:157) E at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.watchForCommit(BlockOutputStream.java:348) E at org.apache.hadoop.hdds.scm.storage.BlockOutputStream.handleFlush(BlockOutputStream.java:480) E at
org.apache.hadoop.hdds.scm.storage.BlockOutputStream.close(BlockOutputStream.java:494) E at org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry.close(BlockOutputStreamEntry.java:143) E at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:434) E at org.apache.hadoop.ozone.client.io.KeyOutputStream.close(KeyOutputStream.java:472) E at org.apache.hadoop.ozone.client.io.OzoneOutputStream.close(OzoneOutputStream.java:60) E at org.apache.hadoop.ozone.freon.RandomKeyGenerator.createKey(RandomKeyGenerator.java:706) E at org.apache.hadoop.ozone.freon.RandomKeyGenerator.access$1100(RandomKeyGenerator.java:88) E at org.apache.hadoop.ozone.freon.RandomKeyGenerator$ObjectCreator.run(RandomKeyGenerator.java:609) E at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) E at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) E at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) E at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) E at java.base/java.lang.Thread.run(Thread.java:834) E Caused by: org.apache.ratis.protocol.NotReplicatedException: Request with call Id 2 and log index 9 is not yet replicated to ALL_COMMITTED E at org.apache.ratis.client.impl.ClientProtoUtils.toRaftClientReply(ClientProtoUtils.java:245) E at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onNext(GrpcClientProtocolClient.java:254) E at org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onNext(GrpcClientProtocolClient.java:249) E at org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onMessage(ClientCalls.java:421) E at org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onMessage(ForwardingClientCallListener.java:33) E at 
org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onMessage(ForwardingClientCallListener.java:33) E at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInContext(ClientCallImpl.java:519) E at org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) E at org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123) E ... 3 more E 2019-07-18 07:13:33 INFO XceiverClientRatis:280 - Cou
[jira] [Created] (HDDS-1850) ReplicationManager should consider inflight replication and deletion while picking datanode for re-replication
Nanda kumar created HDDS-1850: Summary: ReplicationManager should consider inflight replication and deletion while picking datanode for re-replication Key: HDDS-1850 URL: https://issues.apache.org/jira/browse/HDDS-1850 Project: Hadoop Distributed Data Store Issue Type: Bug Components: SCM Reporter: Nanda kumar Assignee: Nanda kumar When choosing the target datanode for re-replication, {{ReplicationManager}} should take into account the datanodes that already have an inflight replication or deletion for the same container.
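One possible reading of the requirement above, as a minimal sketch: when picking targets, skip every datanode that already holds a replica or that has a pending (inflight) replication or deletion for the same container. The class and method names are hypothetical, datanodes are represented as plain strings, and excluding nodes with an inflight deletion is an assumption; this is not the actual ReplicationManager code.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper for illustration; not part of the real ReplicationManager. */
class TargetSelectionSketch {

  /**
   * Returns the healthy datanodes that can receive a new replica of a container.
   * A node is skipped if it already holds a replica, or if a replication or
   * deletion command for the same container is still inflight on it (assumption).
   */
  static List<String> eligibleTargets(List<String> healthyNodes,
                                      Set<String> replicaHolders,
                                      Set<String> inflightReplication,
                                      Set<String> inflightDeletion) {
    Set<String> excluded = new HashSet<>(replicaHolders);
    excluded.addAll(inflightReplication);
    excluded.addAll(inflightDeletion);

    List<String> targets = new ArrayList<>();
    for (String node : healthyNodes) {
      if (!excluded.contains(node)) {
        targets.add(node);
      }
    }
    return targets;
  }
}
```

Building the exclusion set once per container keeps the selection loop simple and preserves the original ordering of the healthy nodes.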
[jira] [Created] (HDDS-1851) ReplicationManager should not force close a container with one quasi-closed replica
Nanda kumar created HDDS-1851: Summary: ReplicationManager should not force close a container with one quasi-closed replica Key: HDDS-1851 URL: https://issues.apache.org/jira/browse/HDDS-1851 Project: Hadoop Distributed Data Store Issue Type: Bug Components: SCM Reporter: Nanda kumar Assignee: Nanda kumar There is a case in {{ReplicationManager}} where we go ahead and close a quasi-closed container which has only one quasi-closed replica. We should not do this.
[jira] [Created] (HDDS-1853) Fix failing blockade test-cases
Nanda kumar created HDDS-1853: Summary: Fix failing blockade test-cases Key: HDDS-1853 URL: https://issues.apache.org/jira/browse/HDDS-1853 Project: Hadoop Distributed Data Store Issue Type: Bug Components: test Reporter: Nanda kumar Assignee: Nanda kumar This Jira is to fix the failing blockade test-cases and make sure that all of them pass.
[jira] [Created] (HDDS-1854) Print intuitive error message at client when the pipeline returned by SCM has no datanode
Nanda kumar created HDDS-1854: Summary: Print intuitive error message at client when the pipeline returned by SCM has no datanode Key: HDDS-1854 URL: https://issues.apache.org/jira/browse/HDDS-1854 Project: Hadoop Distributed Data Store Issue Type: Improvement Components: Ozone Client Reporter: Nanda kumar We are throwing {{IllegalArgumentException}} in OzoneClient when the pipeline returned by SCM doesn't have any datanode information. Instead of throwing {{IllegalArgumentException}}, we can throw a custom, user-friendly exception that is easy to understand. Existing exception trace: {noformat} AssertionError: Ozone get Key failed with output=[java.lang.IllegalArgumentException at com.google.common.base.Preconditions.checkArgument(Preconditions.java:72) at org.apache.hadoop.hdds.scm.XceiverClientManager.acquireClient(XceiverClientManager.java:150) at org.apache.hadoop.hdds.scm.XceiverClientManager.acquireClientForReadData(XceiverClientManager.java:143) at org.apache.hadoop.hdds.scm.storage.BlockInputStream.getChunkInfos(BlockInputStream.java:154) at org.apache.hadoop.hdds.scm.storage.BlockInputStream.initialize(BlockInputStream.java:118) at org.apache.hadoop.hdds.scm.storage.BlockInputStream.read(BlockInputStream.java:222) at org.apache.hadoop.ozone.client.io.KeyInputStream.read(KeyInputStream.java:171) at org.apache.hadoop.ozone.client.io.OzoneInputStream.read(OzoneInputStream.java:47) at java.base/java.io.InputStream.read(InputStream.java:205) at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:94) at org.apache.hadoop.ozone.web.ozShell.keys.GetKeyHandler.call(GetKeyHandler.java:98) at org.apache.hadoop.ozone.web.ozShell.keys.GetKeyHandler.call(GetKeyHandler.java:48) at picocli.CommandLine.execute(CommandLine.java:1173) at picocli.CommandLine.access$800(CommandLine.java:141) at picocli.CommandLine$RunLast.handle(CommandLine.java:1367) at picocli.CommandLine$RunLast.handle(CommandLine.java:1335) at
picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:1243) at picocli.CommandLine.parseWithHandlers(CommandLine.java:1526) at picocli.CommandLine.parseWithHandler(CommandLine.java:1465) at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:65) at org.apache.hadoop.ozone.web.ozShell.OzoneShell.execute(OzoneShell.java:60) at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:56) at org.apache.hadoop.ozone.web.ozShell.OzoneShell.main(OzoneShell.java:53)] {noformat}
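As a rough illustration of the proposal in HDDS-1854, the bare precondition failure could be replaced by a dedicated exception whose message names the pipeline and the problem. Everything below is an assumption made for the sketch: the exception type, the helper method, and the use of strings for pipeline IDs and datanodes are hypothetical and are not taken from the Ozone code base.

```java
import java.io.IOException;
import java.util.List;

/** Hypothetical sketch; names are illustrative and not from the Ozone client. */
class EmptyPipelineCheckSketch {

  /** Hypothetical exception carrying a message a user can act on. */
  static class NoDatanodeInPipelineException extends IOException {
    NoDatanodeInPipelineException(String pipelineId) {
      super("Pipeline " + pipelineId
          + " returned by SCM has no datanodes; cannot read the requested block.");
    }
  }

  /** Replaces a bare argument check with a descriptive, checked exception. */
  static void requireDatanodes(String pipelineId, List<String> datanodes)
      throws NoDatanodeInPipelineException {
    if (datanodes == null || datanodes.isEmpty()) {
      throw new NoDatanodeInPipelineException(pipelineId);
    }
  }
}
```

Extending `IOException` is a design choice for the sketch: the read path in the trace above already goes through `InputStream.read`, which declares `IOException`, so callers would not need new handling.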
[jira] [Created] (HDDS-1855) TestStorageContainerManager#testScmProcessDatanodeHeartbeat is failing
Nanda kumar created HDDS-1855: Summary: TestStorageContainerManager#testScmProcessDatanodeHeartbeat is failing Key: HDDS-1855 URL: https://issues.apache.org/jira/browse/HDDS-1855 Project: Hadoop Distributed Data Store Issue Type: Bug Components: test Reporter: Nanda kumar Assignee: Nanda kumar {{TestStorageContainerManager#testScmProcessDatanodeHeartbeat}} is failing with the following exception {noformat} [ERROR] Tests run: 9, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 106.315 s <<< FAILURE! - in org.apache.hadoop.ozone.TestStorageContainerManager [ERROR] testScmProcessDatanodeHeartbeat(org.apache.hadoop.ozone.TestStorageContainerManager) Time elapsed: 21.97 s <<< FAILURE! java.lang.AssertionError at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.ozone.TestStorageContainerManager.testScmProcessDatanodeHeartbeat(TestStorageContainerManager.java:531) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74) {noformat}
[jira] [Resolved] (HDDS-1851) ReplicationManager should not force close a container with one quasi-closed replica
[ https://issues.apache.org/jira/browse/HDDS-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nanda kumar resolved HDDS-1851. --- Resolution: Not A Problem > ReplicationManager should not force close a container with one quasi-closed > replica > --- > > Key: HDDS-1851 > URL: https://issues.apache.org/jira/browse/HDDS-1851 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: SCM >Reporter: Nanda kumar >Assignee: Nanda kumar >Priority: Major > > There is a case in {{ReplicationManager}} where we go ahead and close a > quasi-closed container which has only one quasi-closed replica. We should not > do this.
[jira] [Created] (HDDS-1887) Enable all the blockade test-cases
Nanda kumar created HDDS-1887: Summary: Enable all the blockade test-cases Key: HDDS-1887 URL: https://issues.apache.org/jira/browse/HDDS-1887 Project: Hadoop Distributed Data Store Issue Type: Improvement Components: test Reporter: Nanda kumar Assignee: Nanda kumar Some of the blockade tests were {{Ignored}} because of open issues. Since most of those issues are now resolved, we can enable all the ignored blockade test-cases.
[jira] [Created] (HDDS-1888) Add containers to node2container map in SCM as soon as a container is allocated
Nanda kumar created HDDS-1888: Summary: Add containers to node2container map in SCM as soon as a container is allocated Key: HDDS-1888 URL: https://issues.apache.org/jira/browse/HDDS-1888 Project: Hadoop Distributed Data Store Issue Type: Bug Components: SCM Reporter: Nanda kumar Assignee: Nanda kumar In SCM, the node2container and node2pipeline maps are managed by NodeManager, and the pipeline2container map is managed by PipelineManager. Currently, when a container is allocated in SCM, it is added to the pipeline2container map but not to the node2container map. We update the node2container map only when the datanode sends a full container report. When a node is marked as dead, DeadNodeHandler processes the event, gets the list of containers that are hosted by the dead datanode, and updates the respective container replica state in ContainerManager. The list of containers on the datanode is read from the node2container map, so it will be missing containers that were created recently (after the last container report). In such cases we will not be able to remove the container replica information for those containers. In reality, these containers are under-replicated, but SCM will never know. We should add containers to the node2container map in SCM as soon as a container is allocated.
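The fix proposed in HDDS-1888 can be pictured with a small sketch: the map from datanode to container IDs is updated at allocation time for every node in the container's pipeline, so a dead-node handler that reads the map also sees containers that no container report has mentioned yet. The class below is a hypothetical stand-in written for illustration; it does not reproduce the real NodeManager or DeadNodeHandler APIs.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for SCM's node2container map; not the real NodeManager. */
class Node2ContainerSketch {

  // Datanode UUID -> IDs of the containers believed to be hosted on that datanode.
  private final Map<UUID, Set<Long>> node2container = new ConcurrentHashMap<>();

  /** Called at allocation time for every datanode in the container's pipeline. */
  void onContainerAllocated(long containerId, Iterable<UUID> pipelineNodes) {
    for (UUID node : pipelineNodes) {
      node2container
          .computeIfAbsent(node, k -> ConcurrentHashMap.newKeySet())
          .add(containerId);
    }
  }

  /** What a dead-node handler would read; returns a copy to avoid leaking state. */
  Set<Long> containersOn(UUID node) {
    Set<Long> ids = node2container.get(node);
    return ids == null ? Collections.<Long>emptySet() : new HashSet<>(ids);
  }
}
```

With this ordering, a container allocated just before a datanode dies is still listed for that node, so its replica can be removed and the container can be detected as under-replicated.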
[jira] [Resolved] (HDDS-1882) TestReplicationManager failed with NPE in ReplicationManager.java
[ https://issues.apache.org/jira/browse/HDDS-1882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nanda kumar resolved HDDS-1882. --- Resolution: Fixed Fix Version/s: 0.4.1 Target Version/s: 0.4.1 Thanks [~Sammi] for the contribution. Committed this to trunk and ozone-0.4.1 branch. > TestReplicationManager failed with NPE in ReplicationManager.java > -- > > Key: HDDS-1882 > URL: https://issues.apache.org/jira/browse/HDDS-1882 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Sammi Chen >Assignee: Sammi Chen >Priority: Major > Labels: pull-request-available > Fix For: 0.4.1 > > Time Spent: 40m > Remaining Estimate: 0h >
[jira] [Created] (HDDS-1902) Fix checkstyle issues in ContainerStateMachine
Nanda kumar created HDDS-1902: Summary: Fix checkstyle issues in ContainerStateMachine Key: HDDS-1902 URL: https://issues.apache.org/jira/browse/HDDS-1902 Project: Hadoop Distributed Data Store Issue Type: Bug Components: Ozone Datanode Reporter: Nanda kumar Assignee: Nanda kumar Fix checkstyle issues in ContainerStateMachine: Line is longer than 80 characters (found 85).
[jira] [Created] (HDDS-1903) Use dynamic ports for SCM in TestSCMClientProtocolServer and TestSCMSecurityProtocolServer
Nanda kumar created HDDS-1903: Summary: Use dynamic ports for SCM in TestSCMClientProtocolServer and TestSCMSecurityProtocolServer Key: HDDS-1903 URL: https://issues.apache.org/jira/browse/HDDS-1903 Project: Hadoop Distributed Data Store Issue Type: Bug Components: test Reporter: Nanda kumar We should use dynamic ports for SCM in the following test-cases: * TestSCMClientProtocolServer * TestSCMSecurityProtocolServer
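For context on what a dynamic port means here, a common JDK-level technique is to bind to port 0 and let the operating system pick a free port. The sketch below shows only that general technique with the standard `java.net.ServerSocket` API; the helper class is hypothetical, and how the two SCM tests would actually be configured is not shown in the issue.

```java
import java.io.IOException;
import java.net.ServerSocket;

/** Hypothetical test helper; illustrates the general "bind to port 0" technique. */
class FreePortSketch {

  /** Asks the OS for a currently free port by binding to port 0, then releases it. */
  static int findFreePort() throws IOException {
    try (ServerSocket socket = new ServerSocket(0)) {
      return socket.getLocalPort();
    }
  }
}
```

There is a small window between releasing the port and the server under test binding to it, so where the server itself accepts port 0 in its bind address, letting it bind directly avoids that race.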
[jira] [Resolved] (HDDS-1902) Fix checkstyle issues in ContainerStateMachine
[ https://issues.apache.org/jira/browse/HDDS-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nanda kumar resolved HDDS-1902. --- Resolution: Duplicate > Fix checkstyle issues in ContainerStateMachine > -- > > Key: HDDS-1902 > URL: https://issues.apache.org/jira/browse/HDDS-1902 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Nanda kumar >Assignee: Nanda kumar >Priority: Minor > > Fix checkstyle issues in ContainerStateMachine: > Line is longer than 80 characters (found 85).
[jira] [Created] (HDDS-1904) SCM cli: group container and pipeline related commands to separate subcommands
Nanda kumar created HDDS-1904: Summary: SCM cli: group container and pipeline related commands to separate subcommands Key: HDDS-1904 URL: https://issues.apache.org/jira/browse/HDDS-1904 Project: Hadoop Distributed Data Store Issue Type: Improvement Components: SCM Client Reporter: Nanda kumar The SCM CLI has commands for both containers and pipelines. It would be more intuitive to group these commands under separate container and pipeline subcommands.
[jira] [Created] (HDDS-1905) PipelineActionHandler is not closing the pipeline when close action is received
Nanda kumar created HDDS-1905: Summary: PipelineActionHandler is not closing the pipeline when close action is received Key: HDDS-1905 URL: https://issues.apache.org/jira/browse/HDDS-1905 Project: Hadoop Distributed Data Store Issue Type: Bug Components: SCM Reporter: Nanda kumar Assignee: Nanda kumar PipelineActionHandler is not closing the pipeline when a close action is received. The bug was introduced as part of the HDDS-1832 change.
[jira] [Created] (HDDS-1906) TestScmSafeMode#testSCMSafeModeRestrictedOp is failing
Nanda kumar created HDDS-1906: Summary: TestScmSafeMode#testSCMSafeModeRestrictedOp is failing Key: HDDS-1906 URL: https://issues.apache.org/jira/browse/HDDS-1906 Project: Hadoop Distributed Data Store Issue Type: Bug Components: test Reporter: Nanda kumar {noformat} [ERROR] testSCMSafeModeRestrictedOp(org.apache.hadoop.ozone.om.TestScmSafeMode) Time elapsed: 19.316 s <<< FAILURE! java.lang.AssertionError: Expected a org.apache.hadoop.hdds.scm.exceptions.SCMException to be thrown, but got the result: : ContainerInfo{id=1, state=OPEN, pipelineID=PipelineID=100fb566-2cc0-44d6-9897-e688af5c447f, stateEnterTime=137318188, owner=5c69dc7b-2a6b-4650-a625-a63117c11d2d} | Pipeline[ Id: 100fb566-2cc0-44d6-9897-e688af5c447f, Nodes: b91596ea-34ed-4628-a027-a1cdf05095be{ip: 127.0.0.1, host: localhost, networkLocation: /default-rack, certSerialId: null}, Type:STAND_ALONE, Factor:ONE, State:OPEN] at org.apache.hadoop.test.LambdaTestUtils.intercept(LambdaTestUtils.java:492) at org.apache.hadoop.test.LambdaTestUtils.intercept(LambdaTestUtils.java:377) at org.apache.hadoop.test.LambdaTestUtils.intercept(LambdaTestUtils.java:446) at org.apache.hadoop.ozone.om.TestScmSafeMode.testSCMSafeModeRestrictedOp(TestScmSafeMode.java:331) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74) {noformat}
[jira] [Created] (HDDS-1907) TestOzoneRpcClientWithRatis is failing with ACL errors
Nanda kumar created HDDS-1907: - Summary: TestOzoneRpcClientWithRatis is failing with ACL errors Key: HDDS-1907 URL: https://issues.apache.org/jira/browse/HDDS-1907 Project: Hadoop Distributed Data Store Issue Type: Bug Components: test Reporter: Nanda kumar {noformat} [ERROR] testNativeAclsForKey(org.apache.hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis) Time elapsed: 0.176 s <<< FAILURE! java.lang.AssertionError: Current acls:,[user:nvadivelu:a[ACCESS], group:staff:a[ACCESS], group:everyone:a[ACCESS], group:localaccounts:a[ACCESS], group:_appserverusr:a[ACCESS], group:admin:a[ACCESS], group:_appserveradm:a[ACCESS], group:_lpadmin:a[ACCESS], group:com.apple.sharepoint.group.1:a[ACCESS], group:com.apple.sharepoint.group.2:a[ACCESS], group:_appstore:a[ACCESS], group:_lpoperator:a[ACCESS], group:_developer:a[ACCESS], group:_analyticsusers:a[ACCESS], group:com.apple.access_ftp:a[ACCESS], group:com.apple.access_screensharing:a[ACCESS], group:com.apple.access_ssh:a[ACCESS], group:com.apple.sharepoint.group.3:a[ACCESS]] inheritedUserAcl:user:remoteUser:r[ACCESS] [ERROR] testNativeAclsForBucket(org.apache.hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis) Time elapsed: 0.074 s <<< FAILURE! java.lang.AssertionError [ERROR] testNativeAclsForPrefix(org.apache.hadoop.ozone.client.rpc.TestOzoneRpcClientWithRatis) Time elapsed: 0.061 s <<< FAILURE! 
java.lang.AssertionError: Current acls:,[user:nvadivelu:a[ACCESS], group:staff:a[ACCESS], group:everyone:a[ACCESS], group:localaccounts:a[ACCESS], group:_appserverusr:a[ACCESS], group:admin:a[ACCESS], group:_appserveradm:a[ACCESS], group:_lpadmin:a[ACCESS], group:com.apple.sharepoint.group.1:a[ACCESS], group:com.apple.sharepoint.group.2:a[ACCESS], group:_appstore:a[ACCESS], group:_lpoperator:a[ACCESS], group:_developer:a[ACCESS], group:_analyticsusers:a[ACCESS], group:com.apple.access_ftp:a[ACCESS], group:com.apple.access_screensharing:a[ACCESS], group:com.apple.access_ssh:a[ACCESS], group:com.apple.sharepoint.group.3:a[ACCESS]] inheritedUserAcl:user:remoteUser:r[ACCESS] {noformat}
[jira] [Created] (HDDS-1908) TestMultiBlockWritesWithDnFailures is failing
Nanda kumar created HDDS-1908: Summary: TestMultiBlockWritesWithDnFailures is failing Key: HDDS-1908 URL: https://issues.apache.org/jira/browse/HDDS-1908 Project: Hadoop Distributed Data Store Issue Type: Bug Components: test Reporter: Nanda kumar TestMultiBlockWritesWithDnFailures is failing with the following exception
{noformat}
[ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 30.992 s <<< FAILURE! - in org.apache.hadoop.ozone.client.rpc.TestMultiBlockWritesWithDnFailures
[ERROR] testMultiBlockWritesWithDnFailures(org.apache.hadoop.ozone.client.rpc.TestMultiBlockWritesWithDnFailures) Time elapsed: 30.941 s <<< ERROR!
INTERNAL_ERROR org.apache.hadoop.ozone.om.exceptions.OMException: Allocated 0 blocks. Requested 1 blocks
    at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.handleError(OzoneManagerProtocolClientSideTranslatorPB.java:720)
    at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.allocateBlock(OzoneManagerProtocolClientSideTranslatorPB.java:752)
    at org.apache.hadoop.ozone.client.io.BlockOutputStreamEntryPool.allocateNewBlock(BlockOutputStreamEntryPool.java:248)
    at org.apache.hadoop.ozone.client.io.BlockOutputStreamEntryPool.allocateBlockIfNeeded(BlockOutputStreamEntryPool.java:296)
    at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:201)
    at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleRetry(KeyOutputStream.java:376)
    at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleException(KeyOutputStream.java:325)
    at org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:231)
    at org.apache.hadoop.ozone.client.io.KeyOutputStream.write(KeyOutputStream.java:193)
    at org.apache.hadoop.ozone.client.io.OzoneOutputStream.write(OzoneOutputStream.java:49)
    at java.io.OutputStream.write(OutputStream.java:75)
    at org.apache.hadoop.ozone.client.rpc.TestMultiBlockWritesWithDnFailures.testMultiBlockWritesWithDnFailures(TestMultiBlockWritesWithDnFailures.java:144)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
    at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
    at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
    at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
    at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
    at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
    at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
{noformat}
[jira] [Created] (HDDS-1917) Ignore failing test-cases in TestSecureOzoneRpcClient
Nanda kumar created HDDS-1917: Summary: Ignore failing test-cases in TestSecureOzoneRpcClient Key: HDDS-1917 URL: https://issues.apache.org/jira/browse/HDDS-1917 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Nanda kumar Assignee: Nanda kumar Ignore the failing test cases in TestSecureOzoneRpcClient. They will be fixed when HA support is added to ACL operations.
[jira] [Resolved] (HDDS-1952) TestMiniChaosOzoneCluster may run until OOME
[ https://issues.apache.org/jira/browse/HDDS-1952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nanda kumar resolved HDDS-1952. Resolution: Fixed Fix Version/s: 0.5.0 0.4.1 Thanks [~adoroszlai] for the contribution. Committed this to trunk and ozone-0.4.1 branch. > TestMiniChaosOzoneCluster may run until OOME > > > Key: HDDS-1952 > URL: https://issues.apache.org/jira/browse/HDDS-1952 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: test >Reporter: Doroszlai, Attila >Assignee: Doroszlai, Attila >Priority: Critical > Labels: pull-request-available > Fix For: 0.4.1, 0.5.0 > > Time Spent: 1h > Remaining Estimate: 0h > > {{TestMiniChaosOzoneCluster}} runs load generator on a cluster for supposedly > 1 minute, but it may run indefinitely until JVM crashes due to > OutOfMemoryError. > In 0.4.1 nightly build it crashed 29/30 times (and no tests were executed in > the remaining one run due to some other error). > Latest: > https://github.com/elek/ozone-ci/blob/3f553ed6ad358ba61a302967617de737d7fea01a/byscane/byscane-nightly-wggqd/integration/output.log#L5661-L5662 > When it crashes, it leaves GBs of data lying around.
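A load loop that is bounded by a deadline, which is the behaviour HDDS-1952 expects from the one-minute run, can be sketched as follows. This is only an illustration under stated assumptions and not the fix that was committed; the message above does not describe the root cause or the patch. The class `BoundedLoadLoop`, the method `runFor`, and the injectable clock are hypothetical names introduced here.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/** Hypothetical sketch; not code from TestMiniChaosOzoneCluster. */
class BoundedLoadLoop {

  /**
   * Runs the task repeatedly until durationNanos has elapsed on the given
   * clock, then returns the number of completed iterations.
   */
  static long runFor(long durationNanos, LongSupplier nanoClock, Runnable task) {
    long deadline = nanoClock.getAsLong() + durationNanos;
    long iterations = 0;
    // Compare by subtraction so the check stays correct if the clock value wraps.
    while (nanoClock.getAsLong() - deadline < 0) {
      task.run();
      iterations++;
    }
    return iterations;
  }

  public static void main(String[] args) {
    // Run an empty task for roughly 50 ms of wall-clock time.
    long n = runFor(TimeUnit.MILLISECONDS.toNanos(50), System::nanoTime, () -> { });
    System.out.println("iterations: " + n);
  }
}
```

Passing the clock as a parameter is a design choice made for this sketch so that the loop can be exercised with a fake clock. A deadline only bounds the run time; it does not by itself bound memory use.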
[jira] [Created] (HDDS-1961) TestStorageContainerManager#testScmProcessDatanodeHeartbeat is flaky
Nanda kumar created HDDS-1961: Summary: TestStorageContainerManager#testScmProcessDatanodeHeartbeat is flaky Key: HDDS-1961 URL: https://issues.apache.org/jira/browse/HDDS-1961 Project: Hadoop Distributed Data Store Issue Type: Bug Components: test Reporter: Nanda kumar Assignee: Nanda kumar TestStorageContainerManager#testScmProcessDatanodeHeartbeat is flaky
{noformat}
[ERROR] testScmProcessDatanodeHeartbeat(org.apache.hadoop.ozone.TestStorageContainerManager) Time elapsed: 25.057 s <<< FAILURE!
java.lang.AssertionError
    at org.junit.Assert.fail(Assert.java:86)
    at org.junit.Assert.assertTrue(Assert.java:41)
    at org.junit.Assert.assertTrue(Assert.java:52)
    at org.apache.hadoop.ozone.TestStorageContainerManager.testScmProcessDatanodeHeartbeat(TestStorageContainerManager.java:531)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{noformat}
[jira] [Created] (HDDS-1967) TestBlockOutputStreamWithFailures is flaky
Nanda kumar created HDDS-1967: - Summary: TestBlockOutputStreamWithFailures is flaky Key: HDDS-1967 URL: https://issues.apache.org/jira/browse/HDDS-1967 Project: Hadoop Distributed Data Store Issue Type: Bug Components: test Reporter: Nanda kumar {{TestBlockOutputStreamWithFailures}} is flaky. {noformat} [ERROR] test2DatanodesFailure(org.apache.hadoop.ozone.client.rpc.TestBlockOutputStreamWithFailures) Time elapsed: 23.816 s <<< FAILURE! java.lang.AssertionError: expected:<4> but was:<8> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.ozone.client.rpc.TestBlockOutputStreamWithFailures.test2DatanodesFailure(TestBlockOutputStreamWithFailures.java:425) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) at 
org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) {noformat} {noformat} [ERROR] testWatchForCommitDatanodeFailure(org.apache.hadoop.ozone.client.rpc.TestBlockOutputStreamWithFailures) Time elapsed: 30.895 s <<< FAILURE! 
java.lang.AssertionError: expected:<2> but was:<3> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.ozone.client.rpc.TestBlockOutputStreamWithFailures.testWatchForCommitDatanodeFailure(TestBlockOutputStreamWithFailures.java:366) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunne
[jira] [Created] (HDDS-1977) Fix checkstyle issues introduced by HDDS-1894
Nanda kumar created HDDS-1977: Summary: Fix checkstyle issues introduced by HDDS-1894 Key: HDDS-1977 URL: https://issues.apache.org/jira/browse/HDDS-1977 Project: Hadoop Distributed Data Store Issue Type: Bug Components: SCM Client Reporter: Nanda kumar Fix the checkstyle issues introduced by HDDS-1894
{noformat}
[INFO] There are 6 errors reported by Checkstyle 8.8 with checkstyle/checkstyle.xml ruleset.
[ERROR] src/main/java/org/apache/hadoop/hdds/scm/cli/pipeline/ListPipelinesSubcommand.java:[41,23] (whitespace) ParenPad: '(' is followed by whitespace.
[ERROR] src/main/java/org/apache/hadoop/hdds/scm/cli/pipeline/ListPipelinesSubcommand.java:[42] (sizes) LineLength: Line is longer than 80 characters (found 88).
[ERROR] src/main/java/org/apache/hadoop/hdds/scm/cli/pipeline/ListPipelinesSubcommand.java:[46,23] (whitespace) ParenPad: '(' is followed by whitespace.
[ERROR] src/main/java/org/apache/hadoop/hdds/scm/cli/pipeline/ListPipelinesSubcommand.java:[47] (sizes) LineLength: Line is longer than 80 characters (found 90).
[ERROR] src/main/java/org/apache/hadoop/hdds/scm/cli/pipeline/ListPipelinesSubcommand.java:[59] (sizes) LineLength: Line is longer than 80 characters (found 116).
[ERROR] src/main/java/org/apache/hadoop/hdds/scm/cli/pipeline/ListPipelinesSubcommand.java:[60] (sizes) LineLength: Line is longer than 80 characters (found 120).
{noformat}
[jira] [Created] (HDDS-1978) Create helper script to run blockade tests
Nanda kumar created HDDS-1978: Summary: Create helper script to run blockade tests Key: HDDS-1978 URL: https://issues.apache.org/jira/browse/HDDS-1978 Project: Hadoop Distributed Data Store Issue Type: Improvement Components: test Reporter: Nanda kumar Assignee: Nanda kumar To run the blockade tests as part of a Jenkins job, we need a helper script.
[jira] [Resolved] (HDDS-1994) Compilation failure due to missing class ScmBlockLocationTestingClient
[ https://issues.apache.org/jira/browse/HDDS-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nanda kumar resolved HDDS-1994. --- Resolution: Duplicate > Compilation failure due to missing class ScmBlockLocationTestingClient > -- > > Key: HDDS-1994 > URL: https://issues.apache.org/jira/browse/HDDS-1994 > Project: Hadoop Distributed Data Store > Issue Type: Task >Reporter: Hrishikesh Gadre >Assignee: Hrishikesh Gadre >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > The ozone build is failing due to following compilation error, > [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-compiler-plugin:3.1:testCompile > (default-testCompile) on project hadoop-ozone-ozone-manager: Compilation > failure: Compilation failure: > [ERROR] > /Users/hgadre/git-repo/upstream/hadoop/hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/TestKeyDeletingService.java:[94,17] > cannot find symbol > [ERROR] symbol: class ScmBlockLocationTestingClient > [ERROR] location: class org.apache.hadoop.ozone.om.TestKeyDeletingService > [ERROR] > /Users/hgadre/git-repo/upstream/hadoop/hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/TestKeyDeletingService.java:[116,17] > cannot find symbol > [ERROR] symbol: class ScmBlockLocationTestingClient > [ERROR] location: class org.apache.hadoop.ozone.om.TestKeyDeletingService > [ERROR] > /Users/hgadre/git-repo/upstream/hadoop/hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/TestKeyDeletingService.java:[143,17] > cannot find symbol > [ERROR] symbol: class ScmBlockLocationTestingClient > [ERROR] location: class org.apache.hadoop.ozone.om.TestKeyDeletingService > [ERROR] -> [Help 1] -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDDS-1998) TestSecureContainerServer#testClientServerRatisGrpc is failing
Nanda kumar created HDDS-1998: - Summary: TestSecureContainerServer#testClientServerRatisGrpc is failing Key: HDDS-1998 URL: https://issues.apache.org/jira/browse/HDDS-1998 Project: Hadoop Distributed Data Store Issue Type: Bug Components: test Reporter: Nanda kumar {{TestSecureContainerServer#testClientServerRatisGrpc}} is failing on trunk with the following error. {noformat} [ERROR] testClientServerRatisGrpc(org.apache.hadoop.ozone.container.server.TestSecureContainerServer) Time elapsed: 7.544 s <<< ERROR! java.io.IOException: Failed to command cmdType: CreateContainer containerID: 1566379872577 datanodeUuid: "87ebf146-2a8f-4060-8f06-615ed61a9fe0" createContainer { } at org.apache.hadoop.hdds.scm.XceiverClientSpi.sendCommand(XceiverClientSpi.java:113) at org.apache.hadoop.ozone.container.server.TestSecureContainerServer.runTestClientServer(TestSecureContainerServer.java:206) at org.apache.hadoop.ozone.container.server.TestSecureContainerServer.runTestClientServerRatis(TestSecureContainerServer.java:157) at org.apache.hadoop.ozone.container.server.TestSecureContainerServer.testClientServerRatisGrpc(TestSecureContainerServer.java:132) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) Caused by: java.util.concurrent.ExecutionException: org.apache.ratis.protocol.StateMachineException: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: Block token verification failed. Fail to find any token (empty or null.) at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895) at org.apache.hadoop.hdds.scm.XceiverClientSpi.sendCommand(XceiverClientSpi.java:110) ... 29 more Caused by: org.apache.ratis.protocol.StateMachineException: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: Block token verification failed. Fail to find any token (empty or null.) 
at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.lambda$applyTransaction$7(ContainerStateMachine.java:701) at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602) at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577) at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1595) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.conc
[jira] [Resolved] (HDDS-1922) Next button on the bottom of "static/docs/index.html" landing page does not work
[ https://issues.apache.org/jira/browse/HDDS-1922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nanda kumar resolved HDDS-1922. Resolution: Cannot Reproduce > Next button on the bottom of "static/docs/index.html" landing page does not > work > > > Key: HDDS-1922 > URL: https://issues.apache.org/jira/browse/HDDS-1922 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Affects Versions: 0.4.0 >Reporter: Mukul Kumar Singh >Priority: Major > > On Ozone landing doc page, the next link doesn't work.
[jira] [Created] (HDDS-2001) Update Ratis version to 0.4.0
Nanda kumar created HDDS-2001: Summary: Update Ratis version to 0.4.0 Key: HDDS-2001 URL: https://issues.apache.org/jira/browse/HDDS-2001 Project: Hadoop Distributed Data Store Issue Type: Improvement Reporter: Nanda kumar Assignee: Nanda kumar Update Ratis version to 0.4.0
[jira] [Created] (HDDS-2002) Update documentation for 0.4.1 release
Nanda kumar created HDDS-2002: Summary: Update documentation for 0.4.1 release Key: HDDS-2002 URL: https://issues.apache.org/jira/browse/HDDS-2002 Project: Hadoop Distributed Data Store Issue Type: Improvement Components: documentation Reporter: Nanda kumar Assignee: Nanda kumar We have to update the Ozone documentation to reflect the latest changes.
[jira] [Resolved] (HDDS-1303) Support native ACL for Ozone
[ https://issues.apache.org/jira/browse/HDDS-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nanda kumar resolved HDDS-1303. Fix Version/s: 0.4.1 Resolution: Fixed > Support native ACL for Ozone > > > Key: HDDS-1303 > URL: https://issues.apache.org/jira/browse/HDDS-1303 > Project: Hadoop Distributed Data Store > Issue Type: New Feature >Reporter: Ajay Kumar >Assignee: Ajay Kumar >Priority: Blocker > Fix For: 0.4.1 > > > add native acl support for OM operations
[jira] [Resolved] (HDDS-895) Remove command watcher from ReplicationManager
[ https://issues.apache.org/jira/browse/HDDS-895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nanda kumar resolved HDDS-895. Resolution: Implemented Implemented as part of HDDS-1205. > Remove command watcher from ReplicationManager > -- > > Key: HDDS-895 > URL: https://issues.apache.org/jira/browse/HDDS-895 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: SCM >Reporter: Nanda kumar >Assignee: Nanda kumar >Priority: Major > Attachments: HDDS-895.000.patch > > > We can remove the command watcher from {{ReplicationManager}} and use an > internal timeout to retrigger the replication command. > Instead of waiting for every command that has been sent out to datanode, we > can use an internal timer to check if the container replica state has reached > the expected container state.
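The timer-based approach proposed in HDDS-895 can be sketched roughly as below. This is a minimal illustration, not the ReplicationManager code from HDDS-1205: the class `InflightCommandTracker`, its methods, and the use of plain container IDs and millisecond timestamps are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical sketch; names and types are not taken from the Ozone code base. */
class InflightCommandTracker {
  private final long timeoutMillis;
  // Container ID -> time (ms) at which the last replication command was sent.
  private final Map<Long, Long> sentAt = new TreeMap<>();

  InflightCommandTracker(long timeoutMillis) {
    this.timeoutMillis = timeoutMillis;
  }

  /** Records that a command for the container was sent at nowMillis. */
  void commandSent(long containerId, long nowMillis) {
    sentAt.put(containerId, nowMillis);
  }

  /**
   * Called by a periodic timer. Containers whose replicas have reached the
   * expected state are dropped; containers whose command is older than the
   * timeout are returned so that the caller can re-send the command.
   */
  List<Long> check(Set<Long> inExpectedState, long nowMillis) {
    List<Long> resend = new ArrayList<>();
    Iterator<Map.Entry<Long, Long>> it = sentAt.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<Long, Long> entry = it.next();
      if (inExpectedState.contains(entry.getKey())) {
        it.remove();                 // goal reached, stop tracking
      } else if (nowMillis - entry.getValue() >= timeoutMillis) {
        resend.add(entry.getKey());  // timed out, retrigger
        entry.setValue(nowMillis);   // restart the timeout window
      }
    }
    return resend;
  }
}
```

The point of the design is that nothing waits on individual command acknowledgements: the periodic check only compares the observed replica state with the expected one and retriggers after a timeout.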
[jira] [Resolved] (HDDS-1304) Ozone ha breaks service discovery
[ https://issues.apache.org/jira/browse/HDDS-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nanda kumar resolved HDDS-1304. Resolution: Not A Problem > Ozone ha breaks service discovery > - > > Key: HDDS-1304 > URL: https://issues.apache.org/jira/browse/HDDS-1304 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Ajay Kumar >Assignee: Nanda kumar >Priority: Critical > > We need to redefine the semantics of what service discovery means with HA > enabled.
[jira] [Created] (HDDS-2028) Release Ozone 0.4.1
Nanda kumar created HDDS-2028: Summary: Release Ozone 0.4.1 Key: HDDS-2028 URL: https://issues.apache.org/jira/browse/HDDS-2028 Project: Hadoop Distributed Data Store Issue Type: Test Reporter: Nanda kumar Assignee: Nanda kumar This Jira tracks the Ozone 0.4.1 release.
[jira] [Created] (HDDS-2029) Fix license issues on ozone-0.4.1
Nanda kumar created HDDS-2029: Summary: Fix license issues on ozone-0.4.1 Key: HDDS-2029 URL: https://issues.apache.org/jira/browse/HDDS-2029 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Nanda kumar Assignee: Nanda kumar Some files on the ozone-0.4.1 branch do not have the Apache license header; they have to be fixed. {noformat} hadoop/hadoop-ozone/dist/src/main/compose/ozones3-haproxy/haproxy-conf/haproxy.cfg hadoop/hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/client/rpc/TestOzoneRpcClientForAclAuditLog.java hadoop/hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/response/s3/bucket/TestS3BucketDeleteResponse.java hadoop/hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/response/s3/multipart/TestS3MultipartUploadAbortResponse.java hadoop/hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/request/s3/multipart/TestS3MultipartUploadAbortRequest.java hadoop/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/response/key/OMKeyPurgeResponse.java hadoop/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/request/key/OMKeyPurgeRequest.java {noformat}
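A quick way to list Java sources that lack the header reported in HDDS-2029 can be sketched as below. This is an illustrative stand-in and not the project's actual license check; a dedicated audit tool would normally be used for releases. The class `LicenseHeaderCheck`, the 30-line search window, and the restriction to `.java` files are assumptions made for this example. The marker string is the opening phrase of the standard ASF source header.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch; not the license check used by the Ozone build. */
class LicenseHeaderCheck {
  // Opening phrase of the standard ASF source header.
  static final String MARKER = "Licensed to the Apache Software Foundation";
  // Assumption: the header, if present, starts within the first 30 lines.
  static final int MAX_LINES = 30;

  /** Returns true if one of the first MAX_LINES lines contains the marker. */
  static boolean hasHeader(List<String> lines) {
    return lines.stream().limit(MAX_LINES).anyMatch(l -> l.contains(MARKER));
  }

  /** Walks root and returns the .java files without the header, sorted. */
  static List<Path> findMissing(Path root) throws IOException {
    try (Stream<Path> paths = Files.walk(root)) {
      return paths.filter(Files::isRegularFile)
          .filter(p -> p.toString().endsWith(".java"))
          .filter(p -> !hasHeader(readLines(p)))
          .sorted()
          .collect(Collectors.toList());
    }
  }

  private static List<String> readLines(Path p) {
    try {
      return Files.readAllLines(p, StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) throws IOException {
    Path root = Paths.get(args.length > 0 ? args[0] : ".");
    findMissing(root).forEach(System.out::println);
  }
}
```

Non-Java files such as the `haproxy.cfg` listed above would need their own rule, since this sketch only looks at `.java` sources.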
[jira] [Created] (HDDS-2037) Fix hadoop version in pom.ozone.xml
Nanda kumar created HDDS-2037: Summary: Fix hadoop version in pom.ozone.xml Key: HDDS-2037 URL: https://issues.apache.org/jira/browse/HDDS-2037 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Nanda kumar Assignee: Nanda kumar The Hadoop version in pom.ozone.xml points to a SNAPSHOT version; this has to be fixed.