[jira] [Commented] (HDFS-15878) RBF: Flaky test TestRouterWebHDFSContractCreate>AbstractContractCreateTest#testSyncable in Trunk
[ https://issues.apache.org/jira/browse/HDFS-15878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17398444#comment-17398444 ] Akira Ajisaka commented on HDFS-15878: -- The latest qbt log: https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/597/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt > RBF: Flaky test > TestRouterWebHDFSContractCreate>AbstractContractCreateTest#testSyncable in > Trunk > > > Key: HDFS-15878 > URL: https://issues.apache.org/jira/browse/HDFS-15878 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs, rbf >Reporter: Renukaprasad C >Assignee: Fengnan Li >Priority: Major > > ERROR] Tests run: 16, Failures: 0, Errors: 1, Skipped: 2, Time elapsed: > 24.627 s <<< FAILURE! - in > org.apache.hadoop.fs.contract.router.web.TestRouterWebHDFSContractCreate > [ERROR] > testSyncable(org.apache.hadoop.fs.contract.router.web.TestRouterWebHDFSContractCreate) > Time elapsed: 0.222 s <<< ERROR! > java.io.FileNotFoundException: File /test/testSyncable not found. 
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at > org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:121) > at > org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:110) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.toIOException(WebHdfsFileSystem.java:576) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$900(WebHdfsFileSystem.java:146) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.shouldRetry(WebHdfsFileSystem.java:892) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:858) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:652) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:690) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:686) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$ReadRunner.getRedirectedUrl(WebHdfsFileSystem.java:2307) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$ReadRunner.(WebHdfsFileSystem.java:2296) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$WebHdfsInputStream.(WebHdfsFileSystem.java:2176) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.open(WebHdfsFileSystem.java:1610) > at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:975) > at > 
org.apache.hadoop.fs.contract.AbstractContractCreateTest.validateSyncableSemantics(AbstractContractCreateTest.java:556) > at > org.apache.hadoop.fs.contract.AbstractContractCreateTest.testSyncable(AbstractContractCreateTest.java:459) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.lang.Thread.run(Thread.java:748) > Caused by: > org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException):
[jira] [Reopened] (HDFS-15878) RBF: Flaky test TestRouterWebHDFSContractCreate>AbstractContractCreateTest#testSyncable in Trunk
[ https://issues.apache.org/jira/browse/HDFS-15878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira Ajisaka reopened HDFS-15878: -- This test still fails.
[jira] [Resolved] (HDFS-16172) TestRouterWebHDFSContractCreate fails
[ https://issues.apache.org/jira/browse/HDFS-16172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira Ajisaka resolved HDFS-16172. -- Resolution: Duplicate > TestRouterWebHDFSContractCreate fails > - > > Key: HDFS-16172 > URL: https://issues.apache.org/jira/browse/HDFS-16172 > Project: Hadoop HDFS > Issue Type: Test > Components: test >Reporter: Akira Ajisaka >Priority: Major > > {quote} > [INFO] Running > org.apache.hadoop.fs.contract.router.web.TestRouterWebHDFSContractCreate > [ERROR] Tests run: 16, Failures: 0, Errors: 1, Skipped: 2, Time elapsed: > 18.539 s <<< FAILURE! - in > org.apache.hadoop.fs.contract.router.web.TestRouterWebHDFSContractCreate > [ERROR] > testSyncable(org.apache.hadoop.fs.contract.router.web.TestRouterWebHDFSContractCreate) > Time elapsed: 0.51 s <<< ERROR! > java.io.FileNotFoundException: File /test/testSyncable not found. > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at > org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:121) > at > org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:110) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.toIOException(WebHdfsFileSystem.java:576) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$900(WebHdfsFileSystem.java:146) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.shouldRetry(WebHdfsFileSystem.java:892) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:858) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:652) > at > 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:690) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1900) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:686) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$ReadRunner.getRedirectedUrl(WebHdfsFileSystem.java:2307) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$ReadRunner.(WebHdfsFileSystem.java:2296) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$WebHdfsInputStream.(WebHdfsFileSystem.java:2176) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.open(WebHdfsFileSystem.java:1610) > at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:975) > at > org.apache.hadoop.fs.contract.AbstractContractCreateTest.validateSyncableSemantics(AbstractContractCreateTest.java:556) > at > org.apache.hadoop.fs.contract.AbstractContractCreateTest.testSyncable(AbstractContractCreateTest.java:459) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61) > at > 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.lang.Thread.run(Thread.java:748) > Caused by: > org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File > /test/testSyncable not found. > at > org.apache.hadoop.hdfs.web.JsonUtilClient.toRemoteException(JsonUtilClient.java:90) > at >
[jira] [Created] (HDFS-16172) TestRouterWebHDFSContractCreate fails
Akira Ajisaka created HDFS-16172: --- Summary: TestRouterWebHDFSContractCreate fails Key: HDFS-16172 URL: https://issues.apache.org/jira/browse/HDFS-16172 Project: Hadoop HDFS Issue Type: Test Components: test Reporter: Akira Ajisaka
[jira] [Updated] (HDFS-16171) testDecommissionStatus is flaky (for both TestDecommissioningStatus and TestDecommissioningStatusWithBackoffMonitor)
[ https://issues.apache.org/jira/browse/HDFS-16171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Jasani updated HDFS-16171: Target Version/s: 3.4.0, 2.10.2, 3.2.3, 3.3.2 (was: 3.4.0, 3.2.3, 3.3.2) > testDecommissionStatus is flaky (for both TestDecommissioningStatus and > TestDecommissioningStatusWithBackoffMonitor) > > > Key: HDFS-16171 > URL: https://issues.apache.org/jira/browse/HDFS-16171 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > testDecommissionStatus keeps failing intermittently. > {code:java} > [ERROR] > testDecommissionStatus(org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatusWithBackoffMonitor) > Time elapsed: 3.299 s <<< FAILURE! > java.lang.AssertionError: Unexpected num under-replicated blocks expected:<4> > but was:<3> > at org.junit.Assert.fail(Assert.java:89) > at org.junit.Assert.failNotEquals(Assert.java:835) > at org.junit.Assert.assertEquals(Assert.java:647) > at > org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus.checkDecommissionStatus(TestDecommissioningStatus.java:169) > at > org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatusWithBackoffMonitor.testDecommissionStatus(TestDecommissioningStatusWithBackoffMonitor.java:136) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16171) testDecommissionStatus is flaky (for both TestDecommissioningStatus and TestDecommissioningStatusWithBackoffMonitor)
[ https://issues.apache.org/jira/browse/HDFS-16171?focusedWorklogId=637685=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637685 ] ASF GitHub Bot logged work on HDFS-16171: - Author: ASF GitHub Bot Created on: 13/Aug/21 05:03 Start Date: 13/Aug/21 05:03 Worklog Time Spent: 10m Work Description: virajjasani commented on pull request #3280: URL: https://github.com/apache/hadoop/pull/3280#issuecomment-898192874 FYI @ferhui @amahussein filed the Jira. How is the flakiness resolved? The number of under-replicated blocks on DN2 can be either 3 or 4, depending on the blocks actually available in the DataNode storage. Hence, to make sure that once both DN1 and DN2 are decommissioned we have 4 under-replicated blocks, we first need to wait for a total of 8 blocks (including replicas) to be reported by both DNs together. This is the additional check. Once we ensure this, we won't run into flaky failures where, because one replica was not reported before decommissioning started, we cannot assert that all 4 blocks are under-replicated. Hence, I have added this additional validation before we start decommissioning DN1. After the recent changes, I haven't seen the test fail across multiple runs. Could you please take a look? Thanks -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 637685) Time Spent: 20m (was: 10m)
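The fix described in the comment above — wait until both DataNodes together have reported all 8 block replicas before starting decommission — can be sketched as a small polling helper. This is a minimal illustrative sketch, not the actual patch: the `waitFor` helper, the simulated reporter thread, and all names here are stand-ins rather than Hadoop APIs (the real test uses Hadoop's own wait utilities against the NameNode's block manager).

```java
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntSupplier;

public class WaitForBlockReports {

    // Poll a counter until it reaches `expected`, failing after `timeoutMs`.
    static void waitFor(IntSupplier reported, int expected,
                        long intervalMs, long timeoutMs)
            throws TimeoutException, InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (reported.getAsInt() < expected) {
            if (System.currentTimeMillis() > deadline) {
                throw new TimeoutException("only " + reported.getAsInt()
                        + " of " + expected + " block replicas reported");
            }
            Thread.sleep(intervalMs);
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulate replicas trickling in from the two DataNodes over time,
        // which is exactly the race the flaky test was losing.
        AtomicInteger reportedReplicas = new AtomicInteger(0);
        Thread reporter = new Thread(() -> {
            for (int i = 0; i < 8; i++) {
                reportedReplicas.incrementAndGet();
                try { Thread.sleep(10); } catch (InterruptedException e) { return; }
            }
        });
        reporter.start();

        // Only proceed to decommission once all 8 replicas are reported;
        // otherwise the later under-replicated-block count can be 3, not 4.
        waitFor(reportedReplicas::get, 8, 20, 5_000);
        if (reportedReplicas.get() < 8) {
            throw new AssertionError("wait returned before all reports arrived");
        }
        System.out.println("all replicas reported; safe to start decommission");
        reporter.join();
    }
}
```

The point of the design is to move the nondeterminism (block-report timing) out of the assertion path: the assertion on 4 under-replicated blocks only runs after the precondition is known to hold.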
[jira] [Updated] (HDFS-16171) testDecommissionStatus is flaky (for both TestDecommissioningStatus and TestDecommissioningStatusWithBackoffMonitor)
[ https://issues.apache.org/jira/browse/HDFS-16171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDFS-16171: -- Labels: pull-request-available (was: )
[jira] [Work logged] (HDFS-16171) testDecommissionStatus is flaky (for both TestDecommissioningStatus and TestDecommissioningStatusWithBackoffMonitor)
[ https://issues.apache.org/jira/browse/HDFS-16171?focusedWorklogId=637683=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637683 ] ASF GitHub Bot logged work on HDFS-16171: - Author: ASF GitHub Bot Created on: 13/Aug/21 05:02 Start Date: 13/Aug/21 05:02 Worklog Time Spent: 10m Work Description: virajjasani edited a comment on pull request #3280: URL: https://github.com/apache/hadoop/pull/3280#issuecomment-897418170 Thanks @ferhui for the review. > This PR title is different from HDFS-12188 Updated the Jira title because the testDecommissionStatus test is present in both `TestDecommissioningStatus` and `TestDecommissioningStatusWithBackoffMonitor`, so by mentioning just testDecommissionStatus we cover both test failures. > Do you explain why test is flaky and how you fix it? The number of under-replicated blocks on DN2 can be either 3 or 4, depending on the blocks actually available in the DataNode storage. Hence, to make sure that once both DN1 and DN2 are decommissioned we have 4 under-replicated blocks, we first need to wait for a total of 8 blocks (including replicas) to be reported by both DNs together. This is the additional check. Once we ensure this, we won't run into flaky failures where, because one replica was not reported before decommissioning started, we cannot assert that all 4 blocks are under-replicated. Hence, I have added this additional validation before we start decommissioning DN1. > I see you add synchronized to some functions, Does it help to fix flaky problems? Good point, it doesn't solve the flaky problem as of now. I kept it while running the 2 tests in parallel so that config setup would be synchronized, but it is no longer required. I will remove it. Thanks -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 637683) Remaining Estimate: 0h Time Spent: 10m
[jira] [Created] (HDFS-16171) testDecommissionStatus is flaky (for both TestDecommissioningStatus and TestDecommissioningStatusWithBackoffMonitor)
Viraj Jasani created HDFS-16171: --- Summary: testDecommissionStatus is flaky (for both TestDecommissioningStatus and TestDecommissioningStatusWithBackoffMonitor) Key: HDFS-16171 URL: https://issues.apache.org/jira/browse/HDFS-16171 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Viraj Jasani Assignee: Viraj Jasani testDecommissionStatus keeps failing intermittently.
[jira] [Work started] (HDFS-16163) Avoid locking entire blockPinningFailures map
[ https://issues.apache.org/jira/browse/HDFS-16163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-16163 started by Viraj Jasani. --- > Avoid locking entire blockPinningFailures map > - > > Key: HDFS-16163 > URL: https://issues.apache.org/jira/browse/HDFS-16163 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > In order for mover to exclude pinned blocks in subsequent iteration, we try > to put pinned blocks in a map of blockIds to set of Datanode sources. > However, while updating an entry of this map, we don't need to lock the > entire map. We can use fine-grained concurrency here.
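The fine-grained-concurrency idea in the issue description — update one entry of the blockIds-to-sources map without locking the whole map — can be sketched with `ConcurrentHashMap.compute`, which locks only the affected entry. This is an illustrative sketch of the pattern under that assumption, not the actual Mover code; the class and method names are invented for the example.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class BlockPinningFailures {
    // blockId -> set of DataNode sources on which the block is pinned.
    private final Map<Long, Set<String>> failures = new ConcurrentHashMap<>();

    /** Record that a block is pinned on the given DataNode. */
    public void record(long blockId, String datanode) {
        // compute() updates this one entry atomically; all other entries
        // remain available to concurrent readers and writers, unlike a
        // synchronized block around the whole map.
        failures.compute(blockId, (id, sources) -> {
            Set<String> s = (sources == null)
                    ? ConcurrentHashMap.newKeySet() : sources;
            s.add(datanode);
            return s;
        });
    }

    /** True if this (block, DataNode) pair was recorded as pinned. */
    public boolean isPinned(long blockId, String datanode) {
        Set<String> sources = failures.get(blockId);
        return sources != null && sources.contains(datanode);
    }

    public static void main(String[] args) {
        BlockPinningFailures f = new BlockPinningFailures();
        f.record(1L, "dn1:9866");
        f.record(1L, "dn2:9866");
        f.record(2L, "dn1:9866");
        if (!f.isPinned(1L, "dn2:9866")) throw new AssertionError();
        if (f.isPinned(2L, "dn2:9866")) throw new AssertionError();
        System.out.println("ok");
    }
}
```

Using `ConcurrentHashMap.newKeySet()` for the value keeps membership updates inside a set thread-safe as well, so later additions to an existing entry do not need to go through `compute` at all.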
[jira] [Updated] (HDFS-16163) Avoid locking entire blockPinningFailures map
[ https://issues.apache.org/jira/browse/HDFS-16163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Jasani updated HDFS-16163: -- Target Version/s: 3.4.0, 3.3.2 (was: 3.4.0, 3.2.3, 3.3.2)
[jira] [Updated] (HDFS-16163) Avoid locking entire blockPinningFailures map
[ https://issues.apache.org/jira/browse/HDFS-16163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Jasani updated HDFS-16163: Status: Patch Available (was: In Progress) > Avoid locking entire blockPinningFailures map > - > > Key: HDFS-16163 > URL: https://issues.apache.org/jira/browse/HDFS-16163 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > In order for mover to exclude pinned blocks in subsequent iteration, we try > to put pinned blocks in a map of blockIds to set of Datanode sources. > However, while updating an entry of this map, we don't need to lock the > entire map. We can use fine-grained concurrency here.
[jira] [Work logged] (HDFS-16163) Avoid locking entire blockPinningFailures map
[ https://issues.apache.org/jira/browse/HDFS-16163?focusedWorklogId=637677&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637677 ] ASF GitHub Bot logged work on HDFS-16163: - Author: ASF GitHub Bot Created on: 13/Aug/21 04:40 Start Date: 13/Aug/21 04:40 Worklog Time Spent: 10m Work Description: virajjasani commented on pull request #3296: URL: https://github.com/apache/hadoop/pull/3296#issuecomment-898186659 Sure @ferhui, Thank you. Let me retrigger tests. On the side note, I am working on resolving two flakies: #3280 and #3235 Thanks -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 637677) Time Spent: 1h 10m (was: 1h) > Avoid locking entire blockPinningFailures map > - > > Key: HDFS-16163 > URL: https://issues.apache.org/jira/browse/HDFS-16163 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > In order for mover to exclude pinned blocks in subsequent iteration, we try > to put pinned blocks in a map of blockIds to set of Datanode sources. > However, while updating an entry of this map, we don't need to lock the > entire map. We can use fine-grained concurrency here.
[jira] [Work logged] (HDFS-16163) Avoid locking entire blockPinningFailures map
[ https://issues.apache.org/jira/browse/HDFS-16163?focusedWorklogId=637633&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637633 ] ASF GitHub Bot logged work on HDFS-16163: - Author: ASF GitHub Bot Created on: 13/Aug/21 01:44 Start Date: 13/Aug/21 01:44 Worklog Time Spent: 10m Work Description: ferhui commented on pull request #3296: URL: https://github.com/apache/hadoop/pull/3296#issuecomment-898096099 @virajjasani looks good. Close and reopen could not trigger CI, Could you please push an empty commit and trigger CI again. Failed UTs seem unrelated and i have filed new jiras to track them Issue Time Tracking --- Worklog Id: (was: 637633) Time Spent: 1h (was: 50m) > Avoid locking entire blockPinningFailures map > - > > Key: HDFS-16163 > URL: https://issues.apache.org/jira/browse/HDFS-16163 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > In order for mover to exclude pinned blocks in subsequent iteration, we try > to put pinned blocks in a map of blockIds to set of Datanode sources. > However, while updating an entry of this map, we don't need to lock the > entire map. We can use fine-grained concurrency here.
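The retrigger ferhui asks for is simply an empty commit pushed to the PR branch, which gives CI a new revision to build. A sketch, run inside a throwaway repository so it is self-contained; on a real PR you would run only the `git commit`/`git push` step in your branch checkout:

```shell
# Self-contained demo: set up a throwaway repo so the commands run anywhere.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.name=dev -c user.email=dev@example.com \
    commit --allow-empty -m "initial" -q

# The actual retrigger: an empty commit adds a new revision with no file changes.
git -c user.name=dev -c user.email=dev@example.com \
    commit --allow-empty -m "Trigger CI" -q
# On a real PR branch you would follow with:  git push origin HEAD
git rev-list --count HEAD
```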
[jira] [Work logged] (HDFS-16163) Avoid locking entire blockPinningFailures map
[ https://issues.apache.org/jira/browse/HDFS-16163?focusedWorklogId=637632&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637632 ] ASF GitHub Bot logged work on HDFS-16163: - Author: ASF GitHub Bot Created on: 13/Aug/21 01:42 Start Date: 13/Aug/21 01:42 Worklog Time Spent: 10m Work Description: ferhui closed pull request #3296: URL: https://github.com/apache/hadoop/pull/3296 Issue Time Tracking --- Worklog Id: (was: 637632) Time Spent: 50m (was: 40m) > Avoid locking entire blockPinningFailures map > - > > Key: HDFS-16163 > URL: https://issues.apache.org/jira/browse/HDFS-16163 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > In order for mover to exclude pinned blocks in subsequent iteration, we try > to put pinned blocks in a map of blockIds to set of Datanode sources. > However, while updating an entry of this map, we don't need to lock the > entire map. We can use fine-grained concurrency here.
[jira] [Updated] (HDFS-16170) TestFileTruncate#testTruncateWithDataNodesShutdownImmediately fails
[ https://issues.apache.org/jira/browse/HDFS-16170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hui Fei updated HDFS-16170: --- Description: [ERROR] Tests run: 26, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 263.442 s <<< FAILURE! - in org.apache.hadoop.hdfs.server.namenode.TestFileTruncate [ERROR] testTruncateWithDataNodesShutdownImmediately(org.apache.hadoop.hdfs.server.namenode.TestFileTruncate) Time elapsed: 4.291 s <<< FAILURE! java.lang.AssertionError at org.junit.Assert.fail(Assert.java:87) at org.junit.Assert.assertTrue(Assert.java:42) at org.junit.Assert.assertTrue(Assert.java:53) at org.apache.hadoop.hdfs.server.namenode.TestFileTruncate.testTruncateWithDataNodesShutdownImmediately(TestFileTruncate.java:927) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.lang.Thread.run(Thread.java:748) [ERROR] testTruncateWithDataNodesShutdownImmediately(org.apache.hadoop.hdfs.server.namenode.TestFileTruncate) Time elapsed: 3.868 s <<< FAILURE! 
java.lang.AssertionError at org.junit.Assert.fail(Assert.java:87) at org.junit.Assert.assertTrue(Assert.java:42) at org.junit.Assert.assertTrue(Assert.java:53) at org.apache.hadoop.hdfs.server.namenode.TestFileTruncate.testTruncateWithDataNodesShutdownImmediately(TestFileTruncate.java:927) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.lang.Thread.run(Thread.java:748) CI result is https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3296/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt > TestFileTruncate#testTruncateWithDataNodesShutdownImmediately fails > --- > > Key: HDFS-16170 > URL: https://issues.apache.org/jira/browse/HDFS-16170 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: test >Affects Versions: 3.4.0 >Reporter: Hui Fei >Priority: Major > > [ERROR] Tests run: 26, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: > 263.442 s <<< FAILURE! - in > org.apache.hadoop.hdfs.server.namenode.TestFileTruncate [ERROR] > testTruncateWithDataNodesShutdownImmediately(org.apache.hadoop.hdfs.server.namenode.TestFileTruncate) > Time elapsed: 4.291 s <<< FAILURE! 
java.lang.AssertionError at > org.junit.Assert.fail(Assert.java:87) at > org.junit.Assert.assertTrue(Assert.java:42) at > org.junit.Assert.assertTrue(Assert.java:53) at > org.apache.hadoop.hdfs.server.namenode.TestFileTruncate.testTruncateWithDataNodesShutdownImmediately(TestFileTruncate.java:927) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at >
[jira] [Created] (HDFS-16170) TestFileTruncate#testTruncateWithDataNodesShutdownImmediately fails
Hui Fei created HDFS-16170: -- Summary: TestFileTruncate#testTruncateWithDataNodesShutdownImmediately fails Key: HDFS-16170 URL: https://issues.apache.org/jira/browse/HDFS-16170 Project: Hadoop HDFS Issue Type: Sub-task Components: test Affects Versions: 3.4.0 Reporter: Hui Fei
[jira] [Updated] (HDFS-16169) TestBlockTokenWithDFSStriped#testEnd2End fails
[ https://issues.apache.org/jira/browse/HDFS-16169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hui Fei updated HDFS-16169: --- Description: [ERROR] Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 141.936 s <<< FAILURE! - in org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped [ERROR] testEnd2End(org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped) Time elapsed: 28.325 s <<< FAILURE! java.lang.AssertionError: expected:<9> but was:<10> at org.junit.Assert.fail(Assert.java:89) at org.junit.Assert.failNotEquals(Assert.java:835) at org.junit.Assert.assertEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:633) at org.apache.hadoop.hdfs.StripedFileTestUtil.verifyLocatedStripedBlocks(StripedFileTestUtil.java:344) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.doTestBalancerWithStripedFile(TestBalancer.java:1666) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.integrationTestWithStripedFile(TestBalancer.java:1601) at org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped.testEnd2End(TestBlockTokenWithDFSStriped.java:119) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299) at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.lang.Thread.run(Thread.java:748) CI result is https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3296/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt > TestBlockTokenWithDFSStriped#testEnd2End fails > -- > > Key: HDFS-16169 > URL: https://issues.apache.org/jira/browse/HDFS-16169 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: test >Affects Versions: 3.4.0 >Reporter: Hui Fei >Priority: Major > > [ERROR] Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: > 141.936 s <<< FAILURE! - in > org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped > [ERROR] > testEnd2End(org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped) > Time elapsed: 28.325 s <<< FAILURE! java.lang.AssertionError: expected:<9> > but was:<10> at org.junit.Assert.fail(Assert.java:89) at > org.junit.Assert.failNotEquals(Assert.java:835) at > org.junit.Assert.assertEquals(Assert.java:647) at > org.junit.Assert.assertEquals(Assert.java:633) at > org.apache.hadoop.hdfs.StripedFileTestUtil.verifyLocatedStripedBlocks(StripedFileTestUtil.java:344) > at > org.apache.hadoop.hdfs.server.balancer.TestBalancer.doTestBalancerWithStripedFile(TestBalancer.java:1666) > at > org.apache.hadoop.hdfs.server.balancer.TestBalancer.integrationTestWithStripedFile(TestBalancer.java:1601) > at > org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped.testEnd2End(TestBlockTokenWithDFSStriped.java:119) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) at > 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) at > java.lang.Thread.run(Thread.java:748) > > CI result is > https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3296/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
[jira] [Created] (HDFS-16169) TestBlockTokenWithDFSStriped#testEnd2End fails
Hui Fei created HDFS-16169: -- Summary: TestBlockTokenWithDFSStriped#testEnd2End fails Key: HDFS-16169 URL: https://issues.apache.org/jira/browse/HDFS-16169 Project: Hadoop HDFS Issue Type: Sub-task Components: test Affects Versions: 3.4.0 Reporter: Hui Fei
[jira] [Updated] (HDFS-16168) TestHDFSFileSystemContract#testAppend fails
[ https://issues.apache.org/jira/browse/HDFS-16168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hui Fei updated HDFS-16168: --- Description: [ERROR] Tests run: 46, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 225.478 s <<< FAILURE! - in org.apache.hadoop.hdfs.TestHDFSFileSystemContract [ERROR] testAppend(org.apache.hadoop.hdfs.TestHDFSFileSystemContract) Time elapsed: 30.14 s <<< ERROR! org.junit.runners.model.TestTimedOutException: test timed out after 3 milliseconds at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1002) at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:938) at org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:902) at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:850) at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:77) at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106) at org.apache.hadoop.hdfs.AppendTestUtil.testAppend(AppendTestUtil.java:257) at org.apache.hadoop.hdfs.TestHDFSFileSystemContract.testAppend(TestHDFSFileSystemContract.java:68) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.lang.Thread.run(Thread.java:748) [ERROR] testAppend(org.apache.hadoop.hdfs.TestHDFSFileSystemContract) Time elapsed: 30.003 s <<< ERROR! org.junit.runners.model.TestTimedOutException: test timed out after 3 milliseconds at java.lang.Object.wait(Native Method) at java.lang.Object.wait(Object.java:502) at org.apache.hadoop.util.concurrent.AsyncGet$Util.wait(AsyncGet.java:59) at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1567) at org.apache.hadoop.ipc.Client.call(Client.java:1525) at org.apache.hadoop.ipc.Client.call(Client.java:1422) at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:242) at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:129) at com.sun.proxy.$Proxy25.append(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.append(ClientNamenodeProtocolTranslatorPB.java:415) at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:431) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:166) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:158) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:96) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:362) at com.sun.proxy.$Proxy26.append(Unknown Source) at org.apache.hadoop.hdfs.DFSClient.callAppend(DFSClient.java:1385) at 
org.apache.hadoop.hdfs.DFSClient.callAppend(DFSClient.java:1407) at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1476) at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1446) at org.apache.hadoop.hdfs.DistributedFileSystem$5.doCall(DistributedFileSystem.java:450) at org.apache.hadoop.hdfs.DistributedFileSystem$5.doCall(DistributedFileSystem.java:446) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.append(DistributedFileSystem.java:458) at org.apache.hadoop.hdfs.DistributedFileSystem.append(DistributedFileSystem.java:427) at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:1455) at org.apache.hadoop.hdfs.AppendTestUtil.testAppend(AppendTestUtil.java:255) at org.apache.hadoop.hdfs.TestHDFSFileSystemContract.testAppend(TestHDFSFileSystemContract.java:68) at
[jira] [Created] (HDFS-16168) TestHDFSFileSystemContract#testAppend fails
Hui Fei created HDFS-16168: -- Summary: TestHDFSFileSystemContract#testAppend fails Key: HDFS-16168 URL: https://issues.apache.org/jira/browse/HDFS-16168 Project: Hadoop HDFS Issue Type: Sub-task Components: test Affects Versions: 3.4.0 Reporter: Hui Fei
[jira] [Updated] (HDFS-16167) TestDFSInotifyEventInputStreamKerberized#testWithKerberizedCluster fails
[ https://issues.apache.org/jira/browse/HDFS-16167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hui Fei updated HDFS-16167: --- Description: [ERROR] Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 35.878 s <<< FAILURE! - in org.apache.hadoop.hdfs.TestDFSInotifyEventInputStreamKerberized [ERROR] testWithKerberizedCluster(org.apache.hadoop.hdfs.TestDFSInotifyEventInputStreamKerberized) Time elapsed: 20.957 s <<< ERROR! java.io.IOException: DestHost:destPort localhost:12652 , LocalHost:localPort e8a60ac68857/172.17.0.2:0. Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:914) at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:889) at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1583) at org.apache.hadoop.ipc.Client.call(Client.java:1525) at org.apache.hadoop.ipc.Client.call(Client.java:1422) at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:242) at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:129) at com.sun.proxy.$Proxy23.getEditsFromTxid(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getEditsFromTxid(ClientNamenodeProtocolTranslatorPB.java:1881) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:431) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:166) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:158) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:96) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:362) at com.sun.proxy.$Proxy24.getEditsFromTxid(Unknown Source) at org.apache.hadoop.hdfs.DFSInotifyEventInputStream.poll(DFSInotifyEventInputStream.java:105) at org.apache.hadoop.hdfs.TestDFSInotifyEventInputStreamKerberized$1.run(TestDFSInotifyEventInputStreamKerberized.java:145) at org.apache.hadoop.hdfs.TestDFSInotifyEventInputStreamKerberized$1.run(TestDFSInotifyEventInputStreamKerberized.java:116) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1900) at org.apache.hadoop.hdfs.TestDFSInotifyEventInputStreamKerberized.testWithKerberizedCluster(TestDFSInotifyEventInputStreamKerberized.java:116) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.lang.Thread.run(Thread.java:748) Caused by: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:788) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1900)
[jira] [Created] (HDFS-16167) TestDFSInotifyEventInputStreamKerberized#testWithKerberizedCluster fails
Hui Fei created HDFS-16167: -- Summary: TestDFSInotifyEventInputStreamKerberized#testWithKerberizedCluster fails Key: HDFS-16167 URL: https://issues.apache.org/jira/browse/HDFS-16167 Project: Hadoop HDFS Issue Type: Sub-task Components: test Affects Versions: 3.4.0 Reporter: Hui Fei
[jira] [Updated] (HDFS-16166) TestDecommissionWithBackoffMonitor#testDecommissionWithCloseFileAndListOpenFiles fails
[ https://issues.apache.org/jira/browse/HDFS-16166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hui Fei updated HDFS-16166: --- Description: [ERROR] testDecommissionWithCloseFileAndListOpenFiles(org.apache.hadoop.hdfs.TestDecommissionWithBackoffMonitor) Time elapsed: 360.695 s <<< ERROR! org.junit.runners.model.TestTimedOutException: test timed out after 36 milliseconds at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.hdfs.AdminStatesBaseTest.waitNodeState(AdminStatesBaseTest.java:346) at org.apache.hadoop.hdfs.AdminStatesBaseTest.waitNodeState(AdminStatesBaseTest.java:333) at org.apache.hadoop.hdfs.TestDecommission.testDecommissionWithCloseFileAndListOpenFiles(TestDecommission.java:912) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.lang.Thread.run(Thread.java:748) CI result is https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3296/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt > TestDecommissionWithBackoffMonitor#testDecommissionWithCloseFileAndListOpenFiles > fails > -- > > Key: HDFS-16166 > URL: https://issues.apache.org/jira/browse/HDFS-16166 > Project: Hadoop 
HDFS > Issue Type: Sub-task >Affects Versions: 3.4.0 >Reporter: Hui Fei >Priority: Major > > [ERROR] > testDecommissionWithCloseFileAndListOpenFiles(org.apache.hadoop.hdfs.TestDecommissionWithBackoffMonitor) > Time elapsed: 360.695 s <<< ERROR! > org.junit.runners.model.TestTimedOutException: test timed out after 36 > milliseconds at java.lang.Thread.sleep(Native Method) at > org.apache.hadoop.hdfs.AdminStatesBaseTest.waitNodeState(AdminStatesBaseTest.java:346) > at > org.apache.hadoop.hdfs.AdminStatesBaseTest.waitNodeState(AdminStatesBaseTest.java:333) > at > org.apache.hadoop.hdfs.TestDecommission.testDecommissionWithCloseFileAndListOpenFiles(TestDecommission.java:912) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) at > java.lang.Thread.run(Thread.java:748) > > CI result is > https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3296/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
[jira] [Updated] (HDFS-16166) TestDecommissionWithBackoffMonitor#testDecommissionWithCloseFileAndListOpenFiles fails
[ https://issues.apache.org/jira/browse/HDFS-16166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hui Fei updated HDFS-16166: --- Parent: HDFS-15646 Issue Type: Sub-task (was: Task) > TestDecommissionWithBackoffMonitor#testDecommissionWithCloseFileAndListOpenFiles > fails > -- > > Key: HDFS-16166 > URL: https://issues.apache.org/jira/browse/HDFS-16166 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 3.4.0 >Reporter: Hui Fei >Priority: Major
[jira] [Created] (HDFS-16166) TestDecommissionWithBackoffMonitor#testDecommissionWithCloseFileAndListOpenFiles fails
Hui Fei created HDFS-16166: -- Summary: TestDecommissionWithBackoffMonitor#testDecommissionWithCloseFileAndListOpenFiles fails Key: HDFS-16166 URL: https://issues.apache.org/jira/browse/HDFS-16166 Project: Hadoop HDFS Issue Type: Task Affects Versions: 3.4.0 Reporter: Hui Fei
[jira] [Work logged] (HDFS-16141) [FGL] Address permission related issues with File / Directory
[ https://issues.apache.org/jira/browse/HDFS-16141?focusedWorklogId=637568=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637568 ] ASF GitHub Bot logged work on HDFS-16141: - Author: ASF GitHub Bot Created on: 12/Aug/21 22:24 Start Date: 12/Aug/21 22:24 Worklog Time Spent: 10m Work Description: prasad-acit commented on pull request #3232: URL: https://github.com/apache/hadoop/pull/3232#issuecomment-898007634 Thanks @shvachko for the review & feedback. I have addressed the comments; can you please take a look? The failed test is unrelated to the version; I found the issue & handled it. Cause: post restart, the IBR & DN register threads also run with the global write lock. When these threads release the lock, it has a cascading effect on the partition locks. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 637568) Time Spent: 1h 50m (was: 1h 40m) > [FGL] Address permission related issues with File / Directory > - > > Key: HDFS-16141 > URL: https://issues.apache.org/jira/browse/HDFS-16141 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Major > Labels: pull-request-available > Time Spent: 1h 50m > Remaining Estimate: 0h > > Post FGL implementation (MKDIR & Create File), there are existing UTs that got impacted, which need to be addressed. > Failed Tests: > TestDFSPermission > TestPermission > TestFileCreation > TestDFSMkdirs (Added tests)
[jira] [Issue Comment Deleted] (HDFS-16165) Backport the Hadoop 3.x Kerberos synchronization fix to Hadoop 2.x
[ https://issues.apache.org/jira/browse/HDFS-16165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Osvath updated HDFS-16165: - Comment: was deleted (was: This request is on behalf of [Confluent, Inc|http://confluent.io/].) > Backport the Hadoop 3.x Kerberos synchronization fix to Hadoop 2.x > -- > > Key: HDFS-16165 > URL: https://issues.apache.org/jira/browse/HDFS-16165 > Project: Hadoop HDFS > Issue Type: Wish > Environment: Can be reproduced in a docker HDFS environment with Kerberos: > https://github.com/vdesabou/kafka-docker-playground/blob/93a93de293ad2f9bb22afb244f2d8729a178296e/connect/connect-hdfs2-sink/hdfs2-sink-ha-kerberos-repro-gss-exception.sh > Reporter: Daniel Osvath > Priority: Major > Labels: Confluent > > *Problem Description* > For more than a year, Apache Kafka Connect users have been running into a Kerberos renewal issue that causes our HDFS2 connectors to fail. > We have been able to consistently reproduce the issue under high load with 40 connectors (threads) that use the library. When we try an alternate workaround that uses the Kerberos keytab on the system, the connector operates without issues. > We identified the root cause to be a race-condition bug in the Hadoop 2.x library that causes the ticket renewal to fail with the error below: > {code:java} > Caused by: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] > at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211) > {code} > We reached the conclusion about the root cause once we tried the same environment (40 connectors) with Hadoop 3.x and our HDFS3 connectors, and they operated without renewal issues. Additionally, having identified that the synchronization issue has been fixed in the newer Hadoop 3.x releases, we confirmed our hypothesis about the root cause. > There are many changes in Hadoop 3 [UserGroupInformation.java|https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java] related to UGI synchronization, which were done as part of https://issues.apache.org/jira/browse/HADOOP-9747; those changes suggest some race conditions were happening with the older version, i.e. Hadoop 2.x, which would explain why we can reproduce the problem with HDFS2. > For example (among others): > {code:java} > private void relogin(HadoopLoginContext login, boolean ignoreLastLoginTime) > throws IOException { > // ensure the relogin is atomic to avoid leaving credentials in an > // inconsistent state. prevents other ugi instances, SASL, and SPNEGO > // from accessing or altering credentials during the relogin. > synchronized(login.getSubjectLock()) { > // another racing thread may have beat us to the relogin. > if (login == getLogin()) { > unprotectedRelogin(login, ignoreLastLoginTime); > } > } > } > {code} > None of those changes were backported to Hadoop 2.x (our HDFS2 connector uses 2.10.1), on which several CDH distributions are based. > *Request* > We would like to ask for the synchronization fix to be backported to Hadoop 2.x so that our users can operate without issues. > *Impact* > The older 2.x Hadoop version is used by our HDFS connector, which is used in production by our community. Currently, the issue causes our HDFS connector to fail, as it is unable to recover and renew the ticket at a later point. Having the backported fix would allow our users to operate without issues that require manual intervention every week (or every few days in some cases). The only workaround available to the community is to run a command or restart their workers.
[jira] [Updated] (HDFS-16165) Backport the Hadoop 3.x Kerberos synchronization fix to Hadoop 2.x
[ https://issues.apache.org/jira/browse/HDFS-16165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Osvath updated HDFS-16165: - Labels: Confluent (was: ) > Backport the Hadoop 3.x Kerberos synchronization fix to Hadoop 2.x > -- > > Key: HDFS-16165 > URL: https://issues.apache.org/jira/browse/HDFS-16165 > Project: Hadoop HDFS > Issue Type: Wish > Reporter: Daniel Osvath > Priority: Major > Labels: Confluent
[jira] [Comment Edited] (HDFS-16165) Backport the Hadoop 3.x Kerberos synchronization fix to Hadoop 2.x
[ https://issues.apache.org/jira/browse/HDFS-16165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17398208#comment-17398208 ] Daniel Osvath edited comment on HDFS-16165 at 8/12/21, 7:31 PM: This request is on behalf of [Confluent, Inc|http://confluent.io/]. was (Author: dosvath): This request is on behalf [Confluent, Inc|http://confluent.io]. > Backport the Hadoop 3.x Kerberos synchronization fix to Hadoop 2.x > -- > > Key: HDFS-16165 > URL: https://issues.apache.org/jira/browse/HDFS-16165 > Project: Hadoop HDFS > Issue Type: Wish > Reporter: Daniel Osvath > Priority: Major
[jira] [Work logged] (HDFS-16163) Avoid locking entire blockPinningFailures map
[ https://issues.apache.org/jira/browse/HDFS-16163?focusedWorklogId=637481=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637481 ] ASF GitHub Bot logged work on HDFS-16163: - Author: ASF GitHub Bot Created on: 12/Aug/21 18:22 Start Date: 12/Aug/21 18:22 Worklog Time Spent: 10m Work Description: virajjasani commented on pull request #3296: URL: https://github.com/apache/hadoop/pull/3296#issuecomment-897868687 @ferhui Does this sound good? Only a single map entry (not multiple) is updated at a time by any thread, and hence CHM is a much better candidate than synchronizing the entire map. Issue Time Tracking --- Worklog Id: (was: 637481) Time Spent: 40m (was: 0.5h) > Avoid locking entire blockPinningFailures map > - > > Key: HDFS-16163 > URL: https://issues.apache.org/jira/browse/HDFS-16163 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > In order for the mover to exclude pinned blocks in subsequent iterations, we try to put pinned blocks in a map of block IDs to sets of Datanode sources. However, while updating an entry of this map, we don't need to lock the entire map; we can use fine-grained concurrency here.
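The fine-grained approach discussed above can be sketched as follows (hypothetical class and method names, not the actual Mover/Dispatcher code): `ConcurrentHashMap.computeIfAbsent` is atomic per key and locks only the bin holding that key, so threads recording failures for different block IDs never contend on a whole-map monitor.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of per-entry updates to a blockId -> datanode-sources map,
// replacing synchronized(map) { ... } around every update.
public class BlockPinningFailures {
    private final Map<Long, Set<String>> failures = new ConcurrentHashMap<>();

    // Record that 'source' holds a pinned replica of 'blockId'.
    // computeIfAbsent is atomic for this key; the inner set is itself concurrent.
    public void record(long blockId, String source) {
        failures.computeIfAbsent(blockId, k -> ConcurrentHashMap.newKeySet())
                .add(source);
    }

    // True if this (blockId, source) pair was previously recorded,
    // so the mover can skip it in the next iteration.
    public boolean isPinned(long blockId, String source) {
        Set<String> sources = failures.get(blockId);
        return sources != null && sources.contains(source);
    }
}
```

The design choice here is that both the outer map and the inner sets are concurrent collections, so no caller ever needs an external lock; the trade-off is that a reader may briefly see an entry mid-update, which is acceptable for an advisory skip-list like this.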
[jira] [Comment Edited] (HDFS-16165) Backport the Hadoop 3.x Kerberos synchronization fix to Hadoop 2.x
[ https://issues.apache.org/jira/browse/HDFS-16165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17398208#comment-17398208 ] Daniel Osvath edited comment on HDFS-16165 at 8/12/21, 5:57 PM: This request is on behalf [Confluent, Inc|http://confluent.io]. was (Author: dosvath): This request is on behalf [Confluent, Inc|confluent.io]. > Backport the Hadoop 3.x Kerberos synchronization fix to Hadoop 2.x > -- > > Key: HDFS-16165 > URL: https://issues.apache.org/jira/browse/HDFS-16165 > Project: Hadoop HDFS > Issue Type: Wish > Reporter: Daniel Osvath > Priority: Major
[jira] [Commented] (HDFS-16165) Backport the Hadoop 3.x Kerberos synchronization fix to Hadoop 2.x
[ https://issues.apache.org/jira/browse/HDFS-16165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17398208#comment-17398208 ] Daniel Osvath commented on HDFS-16165: -- This request is on behalf [Confluent, Inc|confluent.io]. > Backport the Hadoop 3.x Kerberos synchronization fix to Hadoop 2.x > -- > > Key: HDFS-16165 > URL: https://issues.apache.org/jira/browse/HDFS-16165 > Project: Hadoop HDFS > Issue Type: Wish > Reporter: Daniel Osvath > Priority: Major
[jira] [Created] (HDFS-16165) Backport the Hadoop 3.x Kerberos synchronization fix to Hadoop 2.x
Daniel Osvath created HDFS-16165: Summary: Backport the Hadoop 3.x Kerberos synchronization fix to Hadoop 2.x Key: HDFS-16165 URL: https://issues.apache.org/jira/browse/HDFS-16165 Project: Hadoop HDFS Issue Type: Wish Environment: Can be reproduced in a docker HDFS environment with Kerberos: https://github.com/vdesabou/kafka-docker-playground/blob/93a93de293ad2f9bb22afb244f2d8729a178296e/connect/connect-hdfs2-sink/hdfs2-sink-ha-kerberos-repro-gss-exception.sh Reporter: Daniel Osvath *Problem Description* For more than a year, Apache Kafka Connect users have been running into a Kerberos renewal issue that causes our HDFS2 connectors to fail. We have been able to consistently reproduce the issue under high load with 40 connectors (threads) that use the library. When we try an alternate workaround that uses the Kerberos keytab on the system, the connector operates without issues. We identified the root cause to be a race-condition bug in the Hadoop 2.x library that causes the ticket renewal to fail with the error below: {code:java} Caused by: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211) {code} We reached the conclusion about the root cause once we tried the same environment (40 connectors) with Hadoop 3.x and our HDFS3 connectors, and they operated without renewal issues. Additionally, having identified that the synchronization issue has been fixed in the newer Hadoop 3.x releases, we confirmed our hypothesis about the root cause. There are many changes in Hadoop 3 [UserGroupInformation.java|https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java] related to UGI synchronization, which were done as part of https://issues.apache.org/jira/browse/HADOOP-9747; those changes suggest some race conditions were happening with the older version, i.e. Hadoop 2.x, which would explain why we can reproduce the problem with HDFS2. For example (among others): {code:java} private void relogin(HadoopLoginContext login, boolean ignoreLastLoginTime) throws IOException { // ensure the relogin is atomic to avoid leaving credentials in an // inconsistent state. prevents other ugi instances, SASL, and SPNEGO // from accessing or altering credentials during the relogin. synchronized(login.getSubjectLock()) { // another racing thread may have beat us to the relogin. if (login == getLogin()) { unprotectedRelogin(login, ignoreLastLoginTime); } } } {code} None of those changes were backported to Hadoop 2.x (our HDFS2 connector uses 2.10.1), on which several CDH distributions are based. *Request* We would like to ask for the synchronization fix to be backported to Hadoop 2.x so that our users can operate without issues. *Impact* The older 2.x Hadoop version is used by our HDFS connector, which is used in production by our community. Currently, the issue causes our HDFS connector to fail, as it is unable to recover and renew the ticket at a later point. Having the backported fix would allow our users to operate without issues that require manual intervention every week (or every few days in some cases). The only workaround available to the community is to run a command or restart their workers.
[jira] [Commented] (HDFS-12188) TestDecommissioningStatus#testDecommissionStatus fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-12188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17398191#comment-17398191 ] Ahmed Hussein commented on HDFS-12188: -- Thanks [~vjasani]! I am looking forward to seeing the new jira. Just a brief description of how the refactored code brings more stability to the unit test would be good enough. > TestDecommissioningStatus#testDecommissionStatus fails intermittently > - > > Key: HDFS-12188 > URL: https://issues.apache.org/jira/browse/HDFS-12188 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: test >Reporter: Brahma Reddy Battula >Assignee: Ajay Kumar >Priority: Major > Labels: pull-request-available > Attachments: TestFailure_Log.txt > > Time Spent: 40m > Remaining Estimate: 0h > > {noformat} > java.lang.AssertionError: Unexpected num under-replicated blocks expected:<3> > but was:<4> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at > org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus.checkDecommissionStatus(TestDecommissioningStatus.java:144) > at > org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus.testDecommissionStatus(TestDecommissioningStatus.java:240) > {noformat}
[jira] [Work logged] (HDFS-16157) Support configuring DNS record to get list of journal nodes.
[ https://issues.apache.org/jira/browse/HDFS-16157?focusedWorklogId=637400=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637400 ] ASF GitHub Bot logged work on HDFS-16157: - Author: ASF GitHub Bot Created on: 12/Aug/21 16:07 Start Date: 12/Aug/21 16:07 Worklog Time Spent: 10m Work Description: fengnanli commented on pull request #3284: URL: https://github.com/apache/hadoop/pull/3284#issuecomment-897765432 LGTM. Let's wait for some time in case others have comments. @Hexiaoqiao FYI. We are doing this in a series to help reduce the dependency on a single host name. Issue Time Tracking --- Worklog Id: (was: 637400) Time Spent: 20m (was: 10m) > Support configuring DNS record to get list of journal nodes. > > > Key: HDFS-16157 > URL: https://issues.apache.org/jira/browse/HDFS-16157 > Project: Hadoop HDFS > Issue Type: Improvement > Components: journal-node >Reporter: Leon Gao >Assignee: Leon Gao >Priority: Minor > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > We can use a DNS round-robin record to configure the list of journal nodes, so we don't have to reconfigure everything when a journal node hostname is changed. For example, in some containerized environments the hostnames of journal nodes can change pretty often.
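A DNS round-robin record maps a single stable name to several A/AAAA records, so client configuration can stay fixed while the member hosts behind the name change. A minimal sketch of expanding such a record into a journal-node address list (a hypothetical `JournalNodeDnsResolver` helper, not the code from this patch) could look like:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: expand one round-robin DNS name into host:port pairs.
public class JournalNodeDnsResolver {
    public static List<String> resolve(String dnsName, int port)
            throws UnknownHostException {
        List<String> nodes = new ArrayList<>();
        // getAllByName returns every address record behind the name,
        // one per journal node in a round-robin setup.
        for (InetAddress addr : InetAddress.getAllByName(dnsName)) {
            nodes.add(addr.getHostAddress() + ":" + port);
        }
        return nodes;
    }
}
```

In a containerized deployment the resolver would be re-invoked periodically (or on connection failure), since the set of records behind the name is exactly what changes when journal-node pods are rescheduled.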
[jira] [Comment Edited] (HDFS-12188) TestDecommissioningStatus#testDecommissionStatus fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-12188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17398102#comment-17398102 ] Viraj Jasani edited comment on HDFS-12188 at 8/12/21, 2:43 PM: --- Sure [~ahussein], I think it makes sense to not lose the history. Although the exception stacktrace matches exactly, this Jira was filed long back and the same test may well have had a different issue then. Let me detach the PR from this one and create a new Jira. Thanks. Sorry for the noise, everyone. Let me get back to this Jira once the new Jira is filed and has more test results to rely on; then we can decide whether we would like to attach this Jira to the new one and close this. was (Author: vjasani): Sure [~ahussein], I think it makes sense to not lose the history. Although the exception stacktrace matches exactly, this Jira was filed long back and the same test may well have had a different issue then. Let me detach the PR from this one and create a new Jira. Thanks. > TestDecommissioningStatus#testDecommissionStatus fails intermittently > - > > Key: HDFS-12188 > URL: https://issues.apache.org/jira/browse/HDFS-12188 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: test >Reporter: Brahma Reddy Battula >Assignee: Ajay Kumar >Priority: Major > Labels: pull-request-available > Attachments: TestFailure_Log.txt > > Time Spent: 40m > Remaining Estimate: 0h > > {noformat} > java.lang.AssertionError: Unexpected num under-replicated blocks expected:<3> > but was:<4> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at > org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus.checkDecommissionStatus(TestDecommissioningStatus.java:144) > at > org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus.testDecommissionStatus(TestDecommissioningStatus.java:240) > {noformat} -- This message was 
sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12188) TestDecommissioningStatus#testDecommissionStatus fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-12188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Jasani updated HDFS-12188: Summary: TestDecommissioningStatus#testDecommissionStatus fails intermittently (was: De-flake testDecommissionStatus) > TestDecommissioningStatus#testDecommissionStatus fails intermittently > - > > Key: HDFS-12188 > URL: https://issues.apache.org/jira/browse/HDFS-12188 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: test >Reporter: Brahma Reddy Battula >Assignee: Ajay Kumar >Priority: Major > Labels: pull-request-available > Attachments: TestFailure_Log.txt > > Time Spent: 40m > Remaining Estimate: 0h > > {noformat} > java.lang.AssertionError: Unexpected num under-replicated blocks expected:<3> > but was:<4> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at > org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus.checkDecommissionStatus(TestDecommissioningStatus.java:144) > at > org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus.testDecommissionStatus(TestDecommissioningStatus.java:240) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12188) De-flake testDecommissionStatus
[ https://issues.apache.org/jira/browse/HDFS-12188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17398102#comment-17398102 ] Viraj Jasani commented on HDFS-12188: - Sure [~ahussein], I think it makes sense to not lose the history. Although the exception stacktrace matches exactly, this Jira was filed long back and the same test may well have had a different issue then. Let me detach the PR from this one and create a new Jira. Thanks. > De-flake testDecommissionStatus > --- > > Key: HDFS-12188 > URL: https://issues.apache.org/jira/browse/HDFS-12188 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: test >Reporter: Brahma Reddy Battula >Assignee: Ajay Kumar >Priority: Major > Labels: pull-request-available > Attachments: TestFailure_Log.txt > > Time Spent: 40m > Remaining Estimate: 0h > > {noformat} > java.lang.AssertionError: Unexpected num under-replicated blocks expected:<3> > but was:<4> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at > org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus.checkDecommissionStatus(TestDecommissioningStatus.java:144) > at > org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus.testDecommissionStatus(TestDecommissioningStatus.java:240) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12188) De-flake testDecommissionStatus
[ https://issues.apache.org/jira/browse/HDFS-12188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17398097#comment-17398097 ] Ahmed Hussein commented on HDFS-12188: -- Hi [~vjasani] Thanks for taking a look at this jira. I have a few comments: * can you please describe the purpose of the change and how it resolves the issue? * if it is not evident that the changes resolve the original issue this ticket was filed for, then it would be better to open a new jira; later, this very jira can be resolved. You can also link this jira to the new one. In that way, anyone who was aware of the problem would be able to understand that it has been resolved. * Changing the title/description of an old jira is not a very good idea (unless there was an error or typo) because it makes it difficult for other developers to see that what they filed or contributed to has been resolved. They would need to guess and then look through the transitions/history of the jiras to find what they are looking for. * For future jiras, I personally prefer that a new jira is filed, leaving the existing ones as they are (until they are marked as resolved). 
> De-flake testDecommissionStatus > --- > > Key: HDFS-12188 > URL: https://issues.apache.org/jira/browse/HDFS-12188 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: test >Reporter: Brahma Reddy Battula >Assignee: Ajay Kumar >Priority: Major > Labels: pull-request-available > Attachments: TestFailure_Log.txt > > Time Spent: 40m > Remaining Estimate: 0h > > {noformat} > java.lang.AssertionError: Unexpected num under-replicated blocks expected:<3> > but was:<4> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at > org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus.checkDecommissionStatus(TestDecommissioningStatus.java:144) > at > org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus.testDecommissionStatus(TestDecommissioningStatus.java:240) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16162) Improve DFSUtil#checkProtectedDescendants() related parameter comments
[ https://issues.apache.org/jira/browse/HDFS-16162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] JiangHua Zhu updated HDFS-16162: Component/s: documentation > Improve DFSUtil#checkProtectedDescendants() related parameter comments > -- > > Key: HDFS-16162 > URL: https://issues.apache.org/jira/browse/HDFS-16162 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation >Reporter: JiangHua Zhu >Assignee: JiangHua Zhu >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Some parameter comments related to DFSUtil#checkProtectedDescendants() are > missing, for example: > /** > * If the given directory has any non-empty protected descendants, throw > * (Including itself). > * > * @param iip directory, to check its descendants. > * @throws AccessControlException if it is a non-empty protected > descendant > *found. > * @throws ParentNotDirectoryException > * @throws UnresolvedLinkException > */ > public static void checkProtectedDescendants( > FSDirectory fsd, INodesInPath iip) > Throw AccessControlException, UnresolvedLinkException, > ParentNotDirectoryException { > The description of fsd is missing here. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
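The fix for HDFS-16162 amounts to completing the parameter documentation. A possible shape for the corrected doc comment (a sketch, not necessarily the wording of the actual patch; the `@param fsd` description is my assumption) might be:

```java
/**
 * If the given directory has any non-empty protected descendants
 * (including itself), throw.
 *
 * @param fsd the FSDirectory used to resolve the path and walk descendants
 * @param iip directory, to check its descendants.
 * @throws AccessControlException if a non-empty protected descendant
 *     is found.
 * @throws ParentNotDirectoryException
 * @throws UnresolvedLinkException
 */
```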
[jira] [Work logged] (HDFS-12188) De-flake testDecommissionStatus
[ https://issues.apache.org/jira/browse/HDFS-12188?focusedWorklogId=637274=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637274 ] ASF GitHub Bot logged work on HDFS-12188: - Author: ASF GitHub Bot Created on: 12/Aug/21 07:39 Start Date: 12/Aug/21 07:39 Worklog Time Spent: 10m Work Description: virajjasani commented on pull request #3280: URL: https://github.com/apache/hadoop/pull/3280#issuecomment-897418170 Thanks @ferhui for the review. > This PR title is different from HDFS-12188 Updated the Jira title because the testDecommissionStatus test is present in both `TestDecommissioningStatus` and `TestDecommissioningStatusWithBackoffMonitor`; hence, by just mentioning testDecommissionStatus, we are taking care of both test failures. > Do you explain why test is flaky and how you fix it? The number of under-replicated blocks on Datanode2 can be either 3 or 4, depending on the actual blocks available in the datanode storage. This is the only reason behind the flakiness, hence our logic should check for the count being 3 or 4. If the under-replicated block count is anything other than 3 or 4, then this test has some other genuine failure. > I see you add synchronized to some functions, Does it help to fix flaky problems? Good point, it doesn't solve the flakiness as such. I just kept it while running the 2 tests in parallel so that the config setup is synchronized, but now it is not required. I will remove it. Thanks -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 637274) Time Spent: 40m (was: 0.5h) > De-flake testDecommissionStatus > --- > > Key: HDFS-12188 > URL: https://issues.apache.org/jira/browse/HDFS-12188 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: test >Reporter: Brahma Reddy Battula >Assignee: Ajay Kumar >Priority: Major > Labels: pull-request-available > Attachments: TestFailure_Log.txt > > Time Spent: 40m > Remaining Estimate: 0h > > {noformat} > java.lang.AssertionError: Unexpected num under-replicated blocks expected:<3> > but was:<4> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at > org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus.checkDecommissionStatus(TestDecommissioningStatus.java:144) > at > org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus.testDecommissionStatus(TestDecommissioningStatus.java:240) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
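Viraj's explanation above boils down to replacing an exact-equality assertion (the `expected:<3> but was:<4>` failure in the stack trace) with one that accepts both legitimate outcomes. A minimal sketch of that idea (illustrative only; the class and method names here are not the actual Hadoop patch):

```java
// Sketch of a tolerant check: instead of asserting an exact
// under-replicated block count, accept the small range of values the
// test can legitimately observe.
public class DecommissionCheckSketch {
    static void checkUnderReplicated(int actual) {
        // Depending on which blocks happen to live on the decommissioning
        // datanode, either 3 or 4 under-replicated blocks is a valid outcome.
        if (actual < 3 || actual > 4) {
            throw new AssertionError(
                "Unexpected num under-replicated blocks: " + actual);
        }
    }
}
```

Any value outside the 3-4 range still fails loudly, so a genuine regression would not be masked by the relaxed assertion.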
[jira] [Updated] (HDFS-12188) De-flake testDecommissionStatus
[ https://issues.apache.org/jira/browse/HDFS-12188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Jasani updated HDFS-12188: Summary: De-flake testDecommissionStatus (was: TestDecommissioningStatus#testDecommissionStatus fails intermittently) > De-flake testDecommissionStatus > --- > > Key: HDFS-12188 > URL: https://issues.apache.org/jira/browse/HDFS-12188 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: test >Reporter: Brahma Reddy Battula >Assignee: Ajay Kumar >Priority: Major > Labels: pull-request-available > Attachments: TestFailure_Log.txt > > Time Spent: 0.5h > Remaining Estimate: 0h > > {noformat} > java.lang.AssertionError: Unexpected num under-replicated blocks expected:<3> > but was:<4> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at > org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus.checkDecommissionStatus(TestDecommissioningStatus.java:144) > at > org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus.testDecommissionStatus(TestDecommissioningStatus.java:240) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16163) Avoid locking entire blockPinningFailures map
[ https://issues.apache.org/jira/browse/HDFS-16163?focusedWorklogId=637254=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637254 ] ASF GitHub Bot logged work on HDFS-16163: - Author: ASF GitHub Bot Created on: 12/Aug/21 06:28 Start Date: 12/Aug/21 06:28 Worklog Time Spent: 10m Work Description: virajjasani commented on pull request #3296: URL: https://github.com/apache/hadoop/pull/3296#issuecomment-897382966 Thanks for taking a look @ferhui. Yes, this is a perf optimization; however, I came across it while going through an unrelated issue in the mover, for which I was comparing all the diffs between Hadoop 2.10 and the latest 3.3 release. That original issue is still under investigation, but while looking into all the differences, I came across HDFS-11164 and realized that just to update/add one single key->value pair, we are locking the entire map, and hence I thought of fixing this. I tested this locally for its sanity and correctness, but unfortunately I don't have perf results because it was a simple test. The other way to look at this is simplicity: unless we are updating multiple entries in a single batch, we don't need to lock the entire map; for a single-entry update, we can instead use the fine-grained utilities that ConcurrentHashMap provides. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 637254) Time Spent: 0.5h (was: 20m) > Avoid locking entire blockPinningFailures map > - > > Key: HDFS-16163 > URL: https://issues.apache.org/jira/browse/HDFS-16163 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > In order for mover to exclude pinned blocks in subsequent iteration, we try > to put pinned blocks in a map of blockIds to set of Datanode sources. > However, while updating an entry of this map, we don't need to lock the > entire map. We can use fine-grained concurrency here. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
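The fine-grained update the comment describes can be sketched with `ConcurrentHashMap.computeIfAbsent`, which locks only the affected hash bin rather than the whole map. This is an illustrative sketch under stated assumptions (the class and method names are hypothetical, not the exact Mover code):

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: record a pinned block's source datanode without synchronizing
// on the entire blockPinningFailures map.
public class BlockPinningTracker {
    private final Map<Long, Set<String>> blockPinningFailures =
        new ConcurrentHashMap<>();

    void recordPinnedBlock(long blockId, String datanode) {
        // computeIfAbsent atomically creates the per-block set on first use;
        // concurrent updates to different blockIds do not contend with each
        // other, unlike a synchronized (map) { ... } block.
        blockPinningFailures
            .computeIfAbsent(blockId, k -> ConcurrentHashMap.newKeySet())
            .add(datanode);
    }

    Set<String> sourcesFor(long blockId) {
        return blockPinningFailures.getOrDefault(blockId, Set.of());
    }
}
```

Locking the whole map is only needed when several entries must change atomically as a batch; for the single-entry updates described here, the per-bin locking inside `ConcurrentHashMap` is sufficient.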