[jira] [Commented] (HDFS-15878) RBF: Flaky test TestRouterWebHDFSContractCreate>AbstractContractCreateTest#testSyncable in Trunk

2021-08-12 Thread Akira Ajisaka (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17398444#comment-17398444
 ] 

Akira Ajisaka commented on HDFS-15878:
--

The latest qbt log: 
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/597/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt

> RBF: Flaky test 
> TestRouterWebHDFSContractCreate>AbstractContractCreateTest#testSyncable in 
> Trunk
> 
>
> Key: HDFS-15878
> URL: https://issues.apache.org/jira/browse/HDFS-15878
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs, rbf
>Reporter: Renukaprasad C
>Assignee: Fengnan Li
>Priority: Major
>
> [ERROR] Tests run: 16, Failures: 0, Errors: 1, Skipped: 2, Time elapsed: 
> 24.627 s <<< FAILURE! - in 
> org.apache.hadoop.fs.contract.router.web.TestRouterWebHDFSContractCreate
> [ERROR] 
> testSyncable(org.apache.hadoop.fs.contract.router.web.TestRouterWebHDFSContractCreate)
>   Time elapsed: 0.222 s  <<< ERROR!
> java.io.FileNotFoundException: File /test/testSyncable not found.
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:121)
>   at 
> org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:110)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.toIOException(WebHdfsFileSystem.java:576)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$900(WebHdfsFileSystem.java:146)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.shouldRetry(WebHdfsFileSystem.java:892)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:858)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:652)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:690)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:686)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$ReadRunner.getRedirectedUrl(WebHdfsFileSystem.java:2307)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$ReadRunner.<init>(WebHdfsFileSystem.java:2296)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$WebHdfsInputStream.<init>(WebHdfsFileSystem.java:2176)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.open(WebHdfsFileSystem.java:1610)
>   at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:975)
>   at 
> org.apache.hadoop.fs.contract.AbstractContractCreateTest.validateSyncableSemantics(AbstractContractCreateTest.java:556)
>   at 
> org.apache.hadoop.fs.contract.AbstractContractCreateTest.testSyncable(AbstractContractCreateTest.java:459)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): 

[jira] [Reopened] (HDFS-15878) RBF: Flaky test TestRouterWebHDFSContractCreate>AbstractContractCreateTest#testSyncable in Trunk

2021-08-12 Thread Akira Ajisaka (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka reopened HDFS-15878:
--

This test still fails.

> RBF: Flaky test 
> TestRouterWebHDFSContractCreate>AbstractContractCreateTest#testSyncable in 
> Trunk
> 
>
> Key: HDFS-15878
> URL: https://issues.apache.org/jira/browse/HDFS-15878
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs, rbf
>Reporter: Renukaprasad C
>Assignee: Fengnan Li
>Priority: Major
>
> [ERROR] Tests run: 16, Failures: 0, Errors: 1, Skipped: 2, Time elapsed: 
> 24.627 s <<< FAILURE! - in 
> org.apache.hadoop.fs.contract.router.web.TestRouterWebHDFSContractCreate
> [ERROR] 
> testSyncable(org.apache.hadoop.fs.contract.router.web.TestRouterWebHDFSContractCreate)
>   Time elapsed: 0.222 s  <<< ERROR!
> java.io.FileNotFoundException: File /test/testSyncable not found.
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:121)
>   at 
> org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:110)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.toIOException(WebHdfsFileSystem.java:576)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$900(WebHdfsFileSystem.java:146)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.shouldRetry(WebHdfsFileSystem.java:892)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:858)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:652)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:690)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:686)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$ReadRunner.getRedirectedUrl(WebHdfsFileSystem.java:2307)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$ReadRunner.<init>(WebHdfsFileSystem.java:2296)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$WebHdfsInputStream.<init>(WebHdfsFileSystem.java:2176)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.open(WebHdfsFileSystem.java:1610)
>   at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:975)
>   at 
> org.apache.hadoop.fs.contract.AbstractContractCreateTest.validateSyncableSemantics(AbstractContractCreateTest.java:556)
>   at 
> org.apache.hadoop.fs.contract.AbstractContractCreateTest.testSyncable(AbstractContractCreateTest.java:459)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File 
> /test/testSyncable not found.
>   at 
> org.apache.hadoop.hdfs.web.JsonUtilClient.toRemoteException(JsonUtilClient.java:90)
>   at 
> 

[jira] [Resolved] (HDFS-16172) TestRouterWebHDFSContractCreate fails

2021-08-12 Thread Akira Ajisaka (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka resolved HDFS-16172.
--
Resolution: Duplicate

> TestRouterWebHDFSContractCreate fails
> -
>
> Key: HDFS-16172
> URL: https://issues.apache.org/jira/browse/HDFS-16172
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: test
>Reporter: Akira Ajisaka
>Priority: Major
>
> {quote}
> [INFO] Running 
> org.apache.hadoop.fs.contract.router.web.TestRouterWebHDFSContractCreate
> [ERROR] Tests run: 16, Failures: 0, Errors: 1, Skipped: 2, Time elapsed: 
> 18.539 s <<< FAILURE! - in 
> org.apache.hadoop.fs.contract.router.web.TestRouterWebHDFSContractCreate
> [ERROR] 
> testSyncable(org.apache.hadoop.fs.contract.router.web.TestRouterWebHDFSContractCreate)
>   Time elapsed: 0.51 s  <<< ERROR!
> java.io.FileNotFoundException: File /test/testSyncable not found.
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:121)
>   at 
> org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:110)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.toIOException(WebHdfsFileSystem.java:576)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$900(WebHdfsFileSystem.java:146)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.shouldRetry(WebHdfsFileSystem.java:892)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:858)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:652)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:690)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1900)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:686)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$ReadRunner.getRedirectedUrl(WebHdfsFileSystem.java:2307)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$ReadRunner.<init>(WebHdfsFileSystem.java:2296)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$WebHdfsInputStream.<init>(WebHdfsFileSystem.java:2176)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.open(WebHdfsFileSystem.java:1610)
>   at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:975)
>   at 
> org.apache.hadoop.fs.contract.AbstractContractCreateTest.validateSyncableSemantics(AbstractContractCreateTest.java:556)
>   at 
> org.apache.hadoop.fs.contract.AbstractContractCreateTest.testSyncable(AbstractContractCreateTest.java:459)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File 
> /test/testSyncable not found.
>   at 
> org.apache.hadoop.hdfs.web.JsonUtilClient.toRemoteException(JsonUtilClient.java:90)
>   at 
> 

[jira] [Created] (HDFS-16172) TestRouterWebHDFSContractCreate fails

2021-08-12 Thread Akira Ajisaka (Jira)
Akira Ajisaka created HDFS-16172:


 Summary: TestRouterWebHDFSContractCreate fails
 Key: HDFS-16172
 URL: https://issues.apache.org/jira/browse/HDFS-16172
 Project: Hadoop HDFS
  Issue Type: Test
  Components: test
Reporter: Akira Ajisaka


{quote}
[INFO] Running 
org.apache.hadoop.fs.contract.router.web.TestRouterWebHDFSContractCreate
[ERROR] Tests run: 16, Failures: 0, Errors: 1, Skipped: 2, Time elapsed: 18.539 
s <<< FAILURE! - in 
org.apache.hadoop.fs.contract.router.web.TestRouterWebHDFSContractCreate
[ERROR] 
testSyncable(org.apache.hadoop.fs.contract.router.web.TestRouterWebHDFSContractCreate)
  Time elapsed: 0.51 s  <<< ERROR!
java.io.FileNotFoundException: File /test/testSyncable not found.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at 
org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:121)
at 
org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:110)
at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem.toIOException(WebHdfsFileSystem.java:576)
at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$900(WebHdfsFileSystem.java:146)
at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.shouldRetry(WebHdfsFileSystem.java:892)
at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:858)
at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:652)
at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:690)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1900)
at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:686)
at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem$ReadRunner.getRedirectedUrl(WebHdfsFileSystem.java:2307)
at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem$ReadRunner.<init>(WebHdfsFileSystem.java:2296)
at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem$WebHdfsInputStream.<init>(WebHdfsFileSystem.java:2176)
at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem.open(WebHdfsFileSystem.java:1610)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:975)
at 
org.apache.hadoop.fs.contract.AbstractContractCreateTest.validateSyncableSemantics(AbstractContractCreateTest.java:556)
at 
org.apache.hadoop.fs.contract.AbstractContractCreateTest.testSyncable(AbstractContractCreateTest.java:459)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.lang.Thread.run(Thread.java:748)
Caused by: 
org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File 
/test/testSyncable not found.
at 
org.apache.hadoop.hdfs.web.JsonUtilClient.toRemoteException(JsonUtilClient.java:90)
at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:537)
at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$300(WebHdfsFileSystem.java:146)
at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.connect(WebHdfsFileSystem.java:738)
at 

[jira] [Updated] (HDFS-16171) testDecommissionStatus is flaky (for both TestDecommissioningStatus and TestDecommissioningStatusWithBackoffMonitor)

2021-08-12 Thread Viraj Jasani (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viraj Jasani updated HDFS-16171:

Target Version/s: 3.4.0, 2.10.2, 3.2.3, 3.3.2  (was: 3.4.0, 3.2.3, 3.3.2)

> testDecommissionStatus is flaky (for both TestDecommissioningStatus and 
> TestDecommissioningStatusWithBackoffMonitor)
> 
>
> Key: HDFS-16171
> URL: https://issues.apache.org/jira/browse/HDFS-16171
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> testDecommissionStatus keeps failing intermittently.
> {code:java}
> [ERROR] 
> testDecommissionStatus(org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatusWithBackoffMonitor)
>   Time elapsed: 3.299 s  <<< FAILURE!
> java.lang.AssertionError: Unexpected num under-replicated blocks expected:<4> 
> but was:<3>
>   at org.junit.Assert.fail(Assert.java:89)
>   at org.junit.Assert.failNotEquals(Assert.java:835)
>   at org.junit.Assert.assertEquals(Assert.java:647)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus.checkDecommissionStatus(TestDecommissioningStatus.java:169)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatusWithBackoffMonitor.testDecommissionStatus(TestDecommissioningStatusWithBackoffMonitor.java:136)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16171) testDecommissionStatus is flaky (for both TestDecommissioningStatus and TestDecommissioningStatusWithBackoffMonitor)

2021-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16171?focusedWorklogId=637685&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637685
 ]

ASF GitHub Bot logged work on HDFS-16171:
-

Author: ASF GitHub Bot
Created on: 13/Aug/21 05:03
Start Date: 13/Aug/21 05:03
Worklog Time Spent: 10m 
  Work Description: virajjasani commented on pull request #3280:
URL: https://github.com/apache/hadoop/pull/3280#issuecomment-898192874


   FYI @ferhui @amahussein filed the Jira.
   
   How is the flakiness resolved?
   
   The number of under-replicated blocks on DN2 can be either 3 or 4, 
depending on the actual blocks available in datanode storage. Hence, to make 
sure that once both DN1 and DN2 are decommissioned we have 4 under-replicated 
blocks, we first need to wait for a total of 8 blocks (including replicas) to 
be reported by both DNs together. This is the additional check. Once we 
ensure this, we won't run into flaky failures where, because 1 replica is not 
yet reported when we start decommissioning, we cannot assert that all 4 
blocks are under-replicated.
   Hence, I have added this additional validation before we start 
decommissioning DN1; a rough sketch of the extra wait follows.
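   A minimal sketch of that extra wait (the helper name, the 
`blockManager` handle, and the replica-counting call are assumptions for 
illustration, not the exact patch):
   
{code:java}
import java.util.concurrent.TimeoutException;

import org.apache.hadoop.hdfs.protocol.HdfsConstants.DatanodeReportType;
import org.apache.hadoop.hdfs.server.blockmanagement.BlockManager;
import org.apache.hadoop.hdfs.server.blockmanagement.DatanodeDescriptor;
import org.apache.hadoop.test.GenericTestUtils;

// Sketch: block until DN1 and DN2 together have reported all 8 expected
// replicas, so the later "4 under-replicated blocks" assertion is stable.
void waitForAllBlockReports(BlockManager blockManager)
    throws TimeoutException, InterruptedException {
  GenericTestUtils.waitFor(() -> {
    int reported = 0;
    for (DatanodeDescriptor dn : blockManager.getDatanodeManager()
        .getDatanodeListForReport(DatanodeReportType.LIVE)) {
      reported += dn.numBlocks(); // replicas this DN has reported so far
    }
    return reported == 8;
  }, 100, 30_000); // poll every 100 ms, give up after 30 s
}
{code}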
   
   After the recent changes, I haven't seen the test fail across multiple 
runs. Could you please take a look?
   Thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 637685)
Time Spent: 20m  (was: 10m)

> testDecommissionStatus is flaky (for both TestDecommissioningStatus and 
> TestDecommissioningStatusWithBackoffMonitor)
> 
>
> Key: HDFS-16171
> URL: https://issues.apache.org/jira/browse/HDFS-16171
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> testDecommissionStatus keeps failing intermittently.
> {code:java}
> [ERROR] 
> testDecommissionStatus(org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatusWithBackoffMonitor)
>   Time elapsed: 3.299 s  <<< FAILURE!
> java.lang.AssertionError: Unexpected num under-replicated blocks expected:<4> 
> but was:<3>
>   at org.junit.Assert.fail(Assert.java:89)
>   at org.junit.Assert.failNotEquals(Assert.java:835)
>   at org.junit.Assert.assertEquals(Assert.java:647)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus.checkDecommissionStatus(TestDecommissioningStatus.java:169)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatusWithBackoffMonitor.testDecommissionStatus(TestDecommissioningStatusWithBackoffMonitor.java:136)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16171) testDecommissionStatus is flaky (for both TestDecommissioningStatus and TestDecommissioningStatusWithBackoffMonitor)

2021-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDFS-16171:
--
Labels: pull-request-available  (was: )

> testDecommissionStatus is flaky (for both TestDecommissioningStatus and 
> TestDecommissioningStatusWithBackoffMonitor)
> 
>
> Key: HDFS-16171
> URL: https://issues.apache.org/jira/browse/HDFS-16171
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> testDecommissionStatus keeps failing intermittently.
> {code:java}
> [ERROR] 
> testDecommissionStatus(org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatusWithBackoffMonitor)
>   Time elapsed: 3.299 s  <<< FAILURE!
> java.lang.AssertionError: Unexpected num under-replicated blocks expected:<4> 
> but was:<3>
>   at org.junit.Assert.fail(Assert.java:89)
>   at org.junit.Assert.failNotEquals(Assert.java:835)
>   at org.junit.Assert.assertEquals(Assert.java:647)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus.checkDecommissionStatus(TestDecommissioningStatus.java:169)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatusWithBackoffMonitor.testDecommissionStatus(TestDecommissioningStatusWithBackoffMonitor.java:136)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16171) testDecommissionStatus is flaky (for both TestDecommissioningStatus and TestDecommissioningStatusWithBackoffMonitor)

2021-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16171?focusedWorklogId=637683&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637683
 ]

ASF GitHub Bot logged work on HDFS-16171:
-

Author: ASF GitHub Bot
Created on: 13/Aug/21 05:02
Start Date: 13/Aug/21 05:02
Worklog Time Spent: 10m 
  Work Description: virajjasani edited a comment on pull request #3280:
URL: https://github.com/apache/hadoop/pull/3280#issuecomment-897418170


   Thanks @ferhui for the review.
   
   > This PR title is different from HDFS-12188
   
   I updated the Jira title because the testDecommissionStatus test is 
present in both `TestDecommissioningStatus` and 
`TestDecommissioningStatusWithBackoffMonitor`, so by mentioning just 
testDecommissionStatus we cover the failures in both tests.
   
   > Could you explain why the test is flaky and how you fixed it?
   
   The number of under-replicated blocks on DN2 can be either 3 or 4, 
depending on the actual blocks available in datanode storage. Hence, to make 
sure that once both DN1 and DN2 are decommissioned we have 4 under-replicated 
blocks, we first need to wait for a total of 8 blocks (including replicas) to 
be reported by both DNs together. This is the additional check. Once we 
ensure this, we won't run into flaky failures where, because 1 replica is not 
yet reported when we start decommissioning, we cannot assert that all 4 
blocks are under-replicated.
   Hence, I have added this additional validation before we start 
decommissioning DN1.
   
   > I see you added synchronized to some functions. Does it help to fix the 
flaky problems?
   
   Good point; it doesn't solve the flakiness as such. I kept it while 
running the 2 tests in parallel so that the config setup was synchronized, 
but it is no longer required. I will remove it. Thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 637683)
Remaining Estimate: 0h
Time Spent: 10m

> testDecommissionStatus is flaky (for both TestDecommissioningStatus and 
> TestDecommissioningStatusWithBackoffMonitor)
> 
>
> Key: HDFS-16171
> URL: https://issues.apache.org/jira/browse/HDFS-16171
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> testDecommissionStatus keeps failing intermittently.
> {code:java}
> [ERROR] 
> testDecommissionStatus(org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatusWithBackoffMonitor)
>   Time elapsed: 3.299 s  <<< FAILURE!
> java.lang.AssertionError: Unexpected num under-replicated blocks expected:<4> 
> but was:<3>
>   at org.junit.Assert.fail(Assert.java:89)
>   at org.junit.Assert.failNotEquals(Assert.java:835)
>   at org.junit.Assert.assertEquals(Assert.java:647)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus.checkDecommissionStatus(TestDecommissioningStatus.java:169)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatusWithBackoffMonitor.testDecommissionStatus(TestDecommissioningStatusWithBackoffMonitor.java:136)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-16171) testDecommissionStatus is flaky (for both TestDecommissioningStatus and TestDecommissioningStatusWithBackoffMonitor)

2021-08-12 Thread Viraj Jasani (Jira)
Viraj Jasani created HDFS-16171:
---

 Summary: testDecommissionStatus is flaky (for both 
TestDecommissioningStatus and TestDecommissioningStatusWithBackoffMonitor)
 Key: HDFS-16171
 URL: https://issues.apache.org/jira/browse/HDFS-16171
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Viraj Jasani
Assignee: Viraj Jasani


testDecommissionStatus keeps failing intermittently.
{code:java}
[ERROR] 
testDecommissionStatus(org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatusWithBackoffMonitor)
  Time elapsed: 3.299 s  <<< FAILURE!
java.lang.AssertionError: Unexpected num under-replicated blocks expected:<4> 
but was:<3>
at org.junit.Assert.fail(Assert.java:89)
at org.junit.Assert.failNotEquals(Assert.java:835)
at org.junit.Assert.assertEquals(Assert.java:647)
at 
org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus.checkDecommissionStatus(TestDecommissioningStatus.java:169)
at 
org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatusWithBackoffMonitor.testDecommissionStatus(TestDecommissioningStatusWithBackoffMonitor.java:136)

{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work started] (HDFS-16163) Avoid locking entire blockPinningFailures map

2021-08-12 Thread Viraj Jasani (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-16163 started by Viraj Jasani.
---
> Avoid locking entire blockPinningFailures map
> -
>
> Key: HDFS-16163
> URL: https://issues.apache.org/jira/browse/HDFS-16163
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> In order for mover to exclude pinned blocks in subsequent iteration, we try 
> to put pinned blocks in a map of blockIds to set of Datanode sources. 
> However, while updating an entry of this map, we don't need to lock the 
> entire map. We can use fine-grained concurrency here.
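A fine-grained version of that per-entry update, sketched with 
ConcurrentHashMap (the class and field names here are illustrative, not the 
actual Dispatcher code):

{code:java}
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch: per-key atomic updates instead of locking the whole
// map. ConcurrentHashMap serializes updates to a single blockId's entry
// while updates for other blockIds proceed concurrently.
class BlockPinningFailures {
  private final Map<Long, Set<String>> failures = new ConcurrentHashMap<>();

  void record(long blockId, String datanodeUuid) {
    // computeIfAbsent creates the per-block set atomically on first use;
    // the set itself is concurrent, so add() needs no extra locking.
    failures.computeIfAbsent(blockId, id -> ConcurrentHashMap.newKeySet())
        .add(datanodeUuid);
  }

  boolean isPinned(long blockId, String datanodeUuid) {
    Set<String> sources = failures.get(blockId);
    return sources != null && sources.contains(datanodeUuid);
  }
}
{code}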



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16163) Avoid locking entire blockPinningFailures map

2021-08-12 Thread Viraj Jasani (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viraj Jasani updated HDFS-16163:

Target Version/s: 3.4.0, 3.3.2  (was: 3.4.0, 3.2.3, 3.3.2)

> Avoid locking entire blockPinningFailures map
> -
>
> Key: HDFS-16163
> URL: https://issues.apache.org/jira/browse/HDFS-16163
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> In order for mover to exclude pinned blocks in subsequent iteration, we try 
> to put pinned blocks in a map of blockIds to set of Datanode sources. 
> However, while updating an entry of this map, we don't need to lock the 
> entire map. We can use fine-grained concurrency here.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16163) Avoid locking entire blockPinningFailures map

2021-08-12 Thread Viraj Jasani (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viraj Jasani updated HDFS-16163:

Status: Patch Available  (was: In Progress)

> Avoid locking entire blockPinningFailures map
> -
>
> Key: HDFS-16163
> URL: https://issues.apache.org/jira/browse/HDFS-16163
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> In order for mover to exclude pinned blocks in subsequent iteration, we try 
> to put pinned blocks in a map of blockIds to set of Datanode sources. 
> However, while updating an entry of this map, we don't need to lock the 
> entire map. We can use fine-grained concurrency here.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16163) Avoid locking entire blockPinningFailures map

2021-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16163?focusedWorklogId=637677&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637677
 ]

ASF GitHub Bot logged work on HDFS-16163:
-

Author: ASF GitHub Bot
Created on: 13/Aug/21 04:40
Start Date: 13/Aug/21 04:40
Worklog Time Spent: 10m 
  Work Description: virajjasani commented on pull request #3296:
URL: https://github.com/apache/hadoop/pull/3296#issuecomment-898186659


   Sure @ferhui, thank you. Let me retrigger the tests. On a side note, I am 
working on resolving two flaky tests: #3280 and #3235.
   Thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 637677)
Time Spent: 1h 10m  (was: 1h)

> Avoid locking entire blockPinningFailures map
> -
>
> Key: HDFS-16163
> URL: https://issues.apache.org/jira/browse/HDFS-16163
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> In order for mover to exclude pinned blocks in subsequent iteration, we try 
> to put pinned blocks in a map of blockIds to set of Datanode sources. 
> However, while updating an entry of this map, we don't need to lock the 
> entire map. We can use fine-grained concurrency here.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16163) Avoid locking entire blockPinningFailures map

2021-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16163?focusedWorklogId=637633&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637633
 ]

ASF GitHub Bot logged work on HDFS-16163:
-

Author: ASF GitHub Bot
Created on: 13/Aug/21 01:44
Start Date: 13/Aug/21 01:44
Worklog Time Spent: 10m 
  Work Description: ferhui commented on pull request #3296:
URL: https://github.com/apache/hadoop/pull/3296#issuecomment-898096099


   @virajjasani looks good.
   Closing and reopening could not trigger CI; could you please push an 
empty commit to trigger CI again?
   The failed UTs seem unrelated, and I have filed new Jiras to track them.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 637633)
Time Spent: 1h  (was: 50m)

> Avoid locking entire blockPinningFailures map
> -
>
> Key: HDFS-16163
> URL: https://issues.apache.org/jira/browse/HDFS-16163
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> In order for mover to exclude pinned blocks in subsequent iteration, we try 
> to put pinned blocks in a map of blockIds to set of Datanode sources. 
> However, while updating an entry of this map, we don't need to lock the 
> entire map. We can use fine-grained concurrency here.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16163) Avoid locking entire blockPinningFailures map

2021-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16163?focusedWorklogId=637632&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637632
 ]

ASF GitHub Bot logged work on HDFS-16163:
-

Author: ASF GitHub Bot
Created on: 13/Aug/21 01:42
Start Date: 13/Aug/21 01:42
Worklog Time Spent: 10m 
  Work Description: ferhui closed pull request #3296:
URL: https://github.com/apache/hadoop/pull/3296


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 637632)
Time Spent: 50m  (was: 40m)

> Avoid locking entire blockPinningFailures map
> -
>
> Key: HDFS-16163
> URL: https://issues.apache.org/jira/browse/HDFS-16163
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> In order for mover to exclude pinned blocks in subsequent iteration, we try 
> to put pinned blocks in a map of blockIds to set of Datanode sources. 
> However, while updating an entry of this map, we don't need to lock the 
> entire map. We can use fine-grained concurrency here.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16170) TestFileTruncate#testTruncateWithDataNodesShutdownImmediately fails

2021-08-12 Thread Hui Fei (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui Fei updated HDFS-16170:
---
Description: 
[ERROR] Tests run: 26, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 263.442 s <<< FAILURE! - in org.apache.hadoop.hdfs.server.namenode.TestFileTruncate
[ERROR] testTruncateWithDataNodesShutdownImmediately(org.apache.hadoop.hdfs.server.namenode.TestFileTruncate)  Time elapsed: 4.291 s  <<< FAILURE!
java.lang.AssertionError
at org.junit.Assert.fail(Assert.java:87)
at org.junit.Assert.assertTrue(Assert.java:42)
at org.junit.Assert.assertTrue(Assert.java:53)
at org.apache.hadoop.hdfs.server.namenode.TestFileTruncate.testTruncateWithDataNodesShutdownImmediately(TestFileTruncate.java:927)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.lang.Thread.run(Thread.java:748)
[ERROR] testTruncateWithDataNodesShutdownImmediately(org.apache.hadoop.hdfs.server.namenode.TestFileTruncate)  Time elapsed: 3.868 s  <<< FAILURE!
java.lang.AssertionError
at org.junit.Assert.fail(Assert.java:87)
at org.junit.Assert.assertTrue(Assert.java:42)
at org.junit.Assert.assertTrue(Assert.java:53)
at org.apache.hadoop.hdfs.server.namenode.TestFileTruncate.testTruncateWithDataNodesShutdownImmediately(TestFileTruncate.java:927)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.lang.Thread.run(Thread.java:748)

 

CI result is 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3296/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt

> TestFileTruncate#testTruncateWithDataNodesShutdownImmediately fails
> ---
>
> Key: HDFS-16170
> URL: https://issues.apache.org/jira/browse/HDFS-16170
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: 3.4.0
>Reporter: Hui Fei
>Priority: Major
>
> [ERROR] Tests run: 26, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 263.442 s <<< FAILURE! - in org.apache.hadoop.hdfs.server.namenode.TestFileTruncate
> [ERROR] testTruncateWithDataNodesShutdownImmediately(org.apache.hadoop.hdfs.server.namenode.TestFileTruncate)  Time elapsed: 4.291 s  <<< FAILURE!
> java.lang.AssertionError
>   at org.junit.Assert.fail(Assert.java:87)
>   at org.junit.Assert.assertTrue(Assert.java:42)
>   at org.junit.Assert.assertTrue(Assert.java:53)
>   at org.apache.hadoop.hdfs.server.namenode.TestFileTruncate.testTruncateWithDataNodesShutdownImmediately(TestFileTruncate.java:927)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at 
> 

[jira] [Created] (HDFS-16170) TestFileTruncate#testTruncateWithDataNodesShutdownImmediately fails

2021-08-12 Thread Hui Fei (Jira)
Hui Fei created HDFS-16170:
--

 Summary: 
TestFileTruncate#testTruncateWithDataNodesShutdownImmediately fails
 Key: HDFS-16170
 URL: https://issues.apache.org/jira/browse/HDFS-16170
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: test
Affects Versions: 3.4.0
Reporter: Hui Fei






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16169) TestBlockTokenWithDFSStriped#testEnd2End fails

2021-08-12 Thread Hui Fei (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui Fei updated HDFS-16169:
---
Description: 
[ERROR] Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 141.936 s <<< FAILURE! - in org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped
[ERROR] testEnd2End(org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped)  Time elapsed: 28.325 s  <<< FAILURE!
java.lang.AssertionError: expected:<9> but was:<10>
at org.junit.Assert.fail(Assert.java:89)
at org.junit.Assert.failNotEquals(Assert.java:835)
at org.junit.Assert.assertEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:633)
at org.apache.hadoop.hdfs.StripedFileTestUtil.verifyLocatedStripedBlocks(StripedFileTestUtil.java:344)
at org.apache.hadoop.hdfs.server.balancer.TestBalancer.doTestBalancerWithStripedFile(TestBalancer.java:1666)
at org.apache.hadoop.hdfs.server.balancer.TestBalancer.integrationTestWithStripedFile(TestBalancer.java:1601)
at org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped.testEnd2End(TestBlockTokenWithDFSStriped.java:119)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.lang.Thread.run(Thread.java:748)

 

CI result is 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3296/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt

> TestBlockTokenWithDFSStriped#testEnd2End fails
> --
>
> Key: HDFS-16169
> URL: https://issues.apache.org/jira/browse/HDFS-16169
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: 3.4.0
>Reporter: Hui Fei
>Priority: Major
>
> [ERROR] Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 141.936 s <<< FAILURE! - in org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped
> [ERROR] testEnd2End(org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped)  Time elapsed: 28.325 s  <<< FAILURE!
> java.lang.AssertionError: expected:<9> but was:<10>
>   at org.junit.Assert.fail(Assert.java:89)
>   at org.junit.Assert.failNotEquals(Assert.java:835)
>   at org.junit.Assert.assertEquals(Assert.java:647)
>   at org.junit.Assert.assertEquals(Assert.java:633)
>   at org.apache.hadoop.hdfs.StripedFileTestUtil.verifyLocatedStripedBlocks(StripedFileTestUtil.java:344)
>   at org.apache.hadoop.hdfs.server.balancer.TestBalancer.doTestBalancerWithStripedFile(TestBalancer.java:1666)
>   at org.apache.hadoop.hdfs.server.balancer.TestBalancer.integrationTestWithStripedFile(TestBalancer.java:1601)
>   at org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped.testEnd2End(TestBlockTokenWithDFSStriped.java:119)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
>   at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
>  
> CI result is 
> https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3296/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HDFS-16169) TestBlockTokenWithDFSStriped#testEnd2End fails

2021-08-12 Thread Hui Fei (Jira)
Hui Fei created HDFS-16169:
--

 Summary: TestBlockTokenWithDFSStriped#testEnd2End fails
 Key: HDFS-16169
 URL: https://issues.apache.org/jira/browse/HDFS-16169
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: test
Affects Versions: 3.4.0
Reporter: Hui Fei






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16168) TestHDFSFileSystemContract#testAppend fails

2021-08-12 Thread Hui Fei (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui Fei updated HDFS-16168:
---
Description: 
[ERROR] Tests run: 46, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 225.478 s <<< FAILURE! - in org.apache.hadoop.hdfs.TestHDFSFileSystemContract
[ERROR] testAppend(org.apache.hadoop.hdfs.TestHDFSFileSystemContract)  Time elapsed: 30.14 s  <<< ERROR!
org.junit.runners.model.TestTimedOutException: test timed out after 30000 milliseconds
at java.lang.Thread.sleep(Native Method)
at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1002)
at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:938)
at org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:902)
at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:850)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:77)
at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
at org.apache.hadoop.hdfs.AppendTestUtil.testAppend(AppendTestUtil.java:257)
at org.apache.hadoop.hdfs.TestHDFSFileSystemContract.testAppend(TestHDFSFileSystemContract.java:68)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.lang.Thread.run(Thread.java:748)
[ERROR] testAppend(org.apache.hadoop.hdfs.TestHDFSFileSystemContract)  Time elapsed: 30.003 s  <<< ERROR!
org.junit.runners.model.TestTimedOutException: test timed out after 30000 milliseconds
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:502)
at org.apache.hadoop.util.concurrent.AsyncGet$Util.wait(AsyncGet.java:59)
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1567)
at org.apache.hadoop.ipc.Client.call(Client.java:1525)
at org.apache.hadoop.ipc.Client.call(Client.java:1422)
at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:242)
at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:129)
at com.sun.proxy.$Proxy25.append(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.append(ClientNamenodeProtocolTranslatorPB.java:415)
at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:431)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:166)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:158)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:96)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:362)
at com.sun.proxy.$Proxy26.append(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.callAppend(DFSClient.java:1385)
at org.apache.hadoop.hdfs.DFSClient.callAppend(DFSClient.java:1407)
at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1476)
at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1446)
at org.apache.hadoop.hdfs.DistributedFileSystem$5.doCall(DistributedFileSystem.java:450)
at org.apache.hadoop.hdfs.DistributedFileSystem$5.doCall(DistributedFileSystem.java:446)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.append(DistributedFileSystem.java:458)
at org.apache.hadoop.hdfs.DistributedFileSystem.append(DistributedFileSystem.java:427)
at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:1455)
at org.apache.hadoop.hdfs.AppendTestUtil.testAppend(AppendTestUtil.java:255)
at org.apache.hadoop.hdfs.TestHDFSFileSystemContract.testAppend(TestHDFSFileSystemContract.java:68)
at 

[jira] [Created] (HDFS-16168) TestHDFSFileSystemContract#testAppend fails

2021-08-12 Thread Hui Fei (Jira)
Hui Fei created HDFS-16168:
--

 Summary: TestHDFSFileSystemContract#testAppend fails
 Key: HDFS-16168
 URL: https://issues.apache.org/jira/browse/HDFS-16168
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: test
Affects Versions: 3.4.0
Reporter: Hui Fei






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16167) TestDFSInotifyEventInputStreamKerberized#testWithKerberizedCluster fails

2021-08-12 Thread Hui Fei (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui Fei updated HDFS-16167:
---
Description: 
[ERROR] Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 35.878 s <<< FAILURE! - in org.apache.hadoop.hdfs.TestDFSInotifyEventInputStreamKerberized
[ERROR] testWithKerberizedCluster(org.apache.hadoop.hdfs.TestDFSInotifyEventInputStreamKerberized)  Time elapsed: 20.957 s  <<< ERROR!
java.io.IOException: DestHost:destPort localhost:12652 , LocalHost:localPort e8a60ac68857/172.17.0.2:0. Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
  at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:914)
  at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:889)
  at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1583)
  at org.apache.hadoop.ipc.Client.call(Client.java:1525)
  at org.apache.hadoop.ipc.Client.call(Client.java:1422)
  at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:242)
  at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:129)
  at com.sun.proxy.$Proxy23.getEditsFromTxid(Unknown Source)
  at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getEditsFromTxid(ClientNamenodeProtocolTranslatorPB.java:1881)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:431)
  at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:166)
  at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:158)
  at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:96)
  at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:362)
  at com.sun.proxy.$Proxy24.getEditsFromTxid(Unknown Source)
  at org.apache.hadoop.hdfs.DFSInotifyEventInputStream.poll(DFSInotifyEventInputStream.java:105)
  at org.apache.hadoop.hdfs.TestDFSInotifyEventInputStreamKerberized$1.run(TestDFSInotifyEventInputStreamKerberized.java:145)
  at org.apache.hadoop.hdfs.TestDFSInotifyEventInputStreamKerberized$1.run(TestDFSInotifyEventInputStreamKerberized.java:116)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:422)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1900)
  at org.apache.hadoop.hdfs.TestDFSInotifyEventInputStreamKerberized.testWithKerberizedCluster(TestDFSInotifyEventInputStreamKerberized.java:116)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
  at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
  at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
  at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
  at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
  at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
  at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
  at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
  at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:788)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:422)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1900)

[jira] [Created] (HDFS-16167) TestDFSInotifyEventInputStreamKerberized#testWithKerberizedCluster fails

2021-08-12 Thread Hui Fei (Jira)
Hui Fei created HDFS-16167:
--

 Summary: 
TestDFSInotifyEventInputStreamKerberized#testWithKerberizedCluster fails
 Key: HDFS-16167
 URL: https://issues.apache.org/jira/browse/HDFS-16167
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: test
Affects Versions: 3.4.0
Reporter: Hui Fei






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16166) TestDecommissionWithBackoffMonitor#testDecommissionWithCloseFileAndListOpenFiles fails

2021-08-12 Thread Hui Fei (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui Fei updated HDFS-16166:
---
Description: 
[ERROR] testDecommissionWithCloseFileAndListOpenFiles(org.apache.hadoop.hdfs.TestDecommissionWithBackoffMonitor)  Time elapsed: 360.695 s  <<< ERROR!
org.junit.runners.model.TestTimedOutException: test timed out after 360000 milliseconds
  at java.lang.Thread.sleep(Native Method)
  at org.apache.hadoop.hdfs.AdminStatesBaseTest.waitNodeState(AdminStatesBaseTest.java:346)
  at org.apache.hadoop.hdfs.AdminStatesBaseTest.waitNodeState(AdminStatesBaseTest.java:333)
  at org.apache.hadoop.hdfs.TestDecommission.testDecommissionWithCloseFileAndListOpenFiles(TestDecommission.java:912)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
  at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
  at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
  at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
  at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
  at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  at java.lang.Thread.run(Thread.java:748)

 

CI result is 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3296/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt

> TestDecommissionWithBackoffMonitor#testDecommissionWithCloseFileAndListOpenFiles
>  fails
> --
>
> Key: HDFS-16166
> URL: https://issues.apache.org/jira/browse/HDFS-16166
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 3.4.0
>Reporter: Hui Fei
>Priority: Major
>
> [ERROR] testDecommissionWithCloseFileAndListOpenFiles(org.apache.hadoop.hdfs.TestDecommissionWithBackoffMonitor)  Time elapsed: 360.695 s  <<< ERROR!
> org.junit.runners.model.TestTimedOutException: test timed out after 360000 milliseconds
>   at java.lang.Thread.sleep(Native Method)
>   at org.apache.hadoop.hdfs.AdminStatesBaseTest.waitNodeState(AdminStatesBaseTest.java:346)
>   at org.apache.hadoop.hdfs.AdminStatesBaseTest.waitNodeState(AdminStatesBaseTest.java:333)
>   at org.apache.hadoop.hdfs.TestDecommission.testDecommissionWithCloseFileAndListOpenFiles(TestDecommission.java:912)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
>   at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
>  
> CI result is 
> https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3296/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16166) TestDecommissionWithBackoffMonitor#testDecommissionWithCloseFileAndListOpenFiles fails

2021-08-12 Thread Hui Fei (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui Fei updated HDFS-16166:
---
Parent: HDFS-15646
Issue Type: Sub-task  (was: Task)

> TestDecommissionWithBackoffMonitor#testDecommissionWithCloseFileAndListOpenFiles
>  fails
> --
>
> Key: HDFS-16166
> URL: https://issues.apache.org/jira/browse/HDFS-16166
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 3.4.0
>Reporter: Hui Fei
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-16166) TestDecommissionWithBackoffMonitor#testDecommissionWithCloseFileAndListOpenFiles fails

2021-08-12 Thread Hui Fei (Jira)
Hui Fei created HDFS-16166:
--

 Summary: 
TestDecommissionWithBackoffMonitor#testDecommissionWithCloseFileAndListOpenFiles
 fails
 Key: HDFS-16166
 URL: https://issues.apache.org/jira/browse/HDFS-16166
 Project: Hadoop HDFS
  Issue Type: Task
Affects Versions: 3.4.0
Reporter: Hui Fei






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16141) [FGL] Address permission related issues with File / Directory

2021-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16141?focusedWorklogId=637568&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637568
 ]

ASF GitHub Bot logged work on HDFS-16141:
-

Author: ASF GitHub Bot
Created on: 12/Aug/21 22:24
Start Date: 12/Aug/21 22:24
Worklog Time Spent: 10m 
  Work Description: prasad-acit commented on pull request #3232:
URL: https://github.com/apache/hadoop/pull/3232#issuecomment-898007634


   Thanks @shvachko for the review and feedback. I have addressed the comments; 
can you please take a look?
   The failed test is unrelated to the version; I found the issue and handled it.
   Cause: after a restart, the IBR and DN registration threads also run under the 
global write lock. When these threads release the lock, it has a cascading effect 
on the partition locks. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 637568)
Time Spent: 1h 50m  (was: 1h 40m)

> [FGL] Address permission related issues with File / Directory
> -
>
> Key: HDFS-16141
> URL: https://issues.apache.org/jira/browse/HDFS-16141
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Renukaprasad C
>Assignee: Renukaprasad C
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Post FGL implementation (MKDIR & Create File), there are existing UTs that got 
> impacted and need to be addressed.
> Failed Tests:
> TestDFSPermission
> TestPermission
> TestFileCreation
> TestDFSMkdirs (Added tests)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Issue Comment Deleted] (HDFS-16165) Backport the Hadoop 3.x Kerberos synchronization fix to Hadoop 2.x

2021-08-12 Thread Daniel Osvath (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Osvath updated HDFS-16165:
-
Comment: was deleted

(was: This request is on behalf of [Confluent, Inc|http://confluent.io/].)

> Backport the Hadoop 3.x Kerberos synchronization fix to Hadoop 2.x
> --
>
> Key: HDFS-16165
> URL: https://issues.apache.org/jira/browse/HDFS-16165
> Project: Hadoop HDFS
>  Issue Type: Wish
> Environment: Can be reproduced in docker HDFS environment with 
> Kerberos 
> https://github.com/vdesabou/kafka-docker-playground/blob/93a93de293ad2f9bb22afb244f2d8729a178296e/connect/connect-hdfs2-sink/hdfs2-sink-ha-kerberos-repro-gss-exception.sh
>Reporter: Daniel Osvath
>Priority: Major
>  Labels: Confluent
>
> *Problem Description*
> For more than a year, Apache Kafka Connect users have been running into a 
> Kerberos renewal issue that causes our HDFS2 connectors to fail. 
> We have been able to consistently reproduce the issue under high load with 40 
> connectors (threads) that use the library. When we try an alternate 
> workaround that uses the Kerberos keytab on the system, the connector 
> operates without issues.
> We identified the root cause to be a race condition bug in the Hadoop 2.x 
> library that causes the ticket renewal to fail with the error below: 
> {code:java}
> Caused by: javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)]
>  at 
> com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
> {code}
> We reached this conclusion about the root cause once we tried the same 
> environment (40 connectors) with Hadoop 3.x and our HDFS3 connectors, which 
> operated without renewal issues. Additionally, having identified that the 
> synchronization issue has been fixed in the newer Hadoop 3.x releases, we 
> confirmed our hypothesis about the root cause.
> There are many changes in HDFS 3 
> [UserGroupInformation.java|https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java]
>  related to UGI synchronization, which were done as part of 
> https://issues.apache.org/jira/browse/HADOOP-9747. Those changes suggest some 
> race conditions were happening with the older version, i.e. HDFS 2.x, which 
> would explain why we can reproduce the problem with HDFS2.
> For example (among others):
> {code:java}
>   private void relogin(HadoopLoginContext login, boolean ignoreLastLoginTime)
>       throws IOException {
>     // ensure the relogin is atomic to avoid leaving credentials in an
>     // inconsistent state.  prevents other ugi instances, SASL, and SPNEGO
>     // from accessing or altering credentials during the relogin.
>     synchronized(login.getSubjectLock()) {
>       // another racing thread may have beat us to the relogin.
>       if (login == getLogin()) {
>         unprotectedRelogin(login, ignoreLastLoginTime);
>       }
>     }
>   }
> {code}
> None of those changes were backported to Hadoop 2.x (our HDFS2 connector uses 
> 2.10.1), on which several CDH distributions are based. 
> *Request*
> We would like to ask for the synchronization fix to be backported to Hadoop 
> 2.x so that our users can operate without issues. 
> *Impact*
> The older 2.x Hadoop version is used by our HDFS connector, which is used in 
> production by our community. Currently, the issue causes our HDFS connector 
> to fail, as it is unable to recover and renew the ticket at a later point. 
> Having the backported fix would allow our users to operate without issues 
> that currently require manual intervention every week (or every few days in 
> some cases). The only workaround available to the community is to run a 
> command or restart their workers. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16165) Backport the Hadoop 3.x Kerberos synchronization fix to Hadoop 2.x

2021-08-12 Thread Daniel Osvath (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Osvath updated HDFS-16165:
-
Labels: Confluent  (was: )

> Backport the Hadoop 3.x Kerberos synchronization fix to Hadoop 2.x
> --
>
> Key: HDFS-16165
> URL: https://issues.apache.org/jira/browse/HDFS-16165
> Project: Hadoop HDFS
>  Issue Type: Wish
> Environment: Can be reproduced in docker HDFS environment with 
> Kerberos 
> https://github.com/vdesabou/kafka-docker-playground/blob/93a93de293ad2f9bb22afb244f2d8729a178296e/connect/connect-hdfs2-sink/hdfs2-sink-ha-kerberos-repro-gss-exception.sh
>Reporter: Daniel Osvath
>Priority: Major
>  Labels: Confluent
>
> *Problem Description*
> For more than a year, Apache Kafka Connect users have been running into a 
> Kerberos renewal issue that causes our HDFS2 connectors to fail. 
> We have been able to consistently reproduce the issue under high load with 40 
> connectors (threads) that use the library. When we try an alternate 
> workaround that uses the Kerberos keytab on the system, the connector 
> operates without issues.
> We identified the root cause to be a race condition bug in the Hadoop 2.x 
> library that causes the ticket renewal to fail with the error below: 
> {code:java}
> Caused by: javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)]
>  at 
> com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
> {code}
> We reached this conclusion about the root cause once we tried the same 
> environment (40 connectors) with Hadoop 3.x and our HDFS3 connectors, which 
> operated without renewal issues. Additionally, having identified that the 
> synchronization issue has been fixed in the newer Hadoop 3.x releases, we 
> confirmed our hypothesis about the root cause.
> There are many changes in HDFS 3 
> [UserGroupInformation.java|https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java]
>  related to UGI synchronization, which were done as part of 
> https://issues.apache.org/jira/browse/HADOOP-9747. Those changes suggest some 
> race conditions were happening with the older version, i.e. HDFS 2.x, which 
> would explain why we can reproduce the problem with HDFS2.
> For example (among others):
> {code:java}
>   private void relogin(HadoopLoginContext login, boolean ignoreLastLoginTime)
>       throws IOException {
>     // ensure the relogin is atomic to avoid leaving credentials in an
>     // inconsistent state.  prevents other ugi instances, SASL, and SPNEGO
>     // from accessing or altering credentials during the relogin.
>     synchronized(login.getSubjectLock()) {
>       // another racing thread may have beat us to the relogin.
>       if (login == getLogin()) {
>         unprotectedRelogin(login, ignoreLastLoginTime);
>       }
>     }
>   }
> {code}
> None of those changes were backported to Hadoop 2.x (our HDFS2 connector uses 
> 2.10.1), on which several CDH distributions are based. 
> *Request*
> We would like to ask for the synchronization fix to be backported to Hadoop 
> 2.x so that our users can operate without issues. 
> *Impact*
> The older 2.x Hadoop version is used by our HDFS connector, which is used in 
> production by our community. Currently, the issue causes our HDFS connector 
> to fail, as it is unable to recover and renew the ticket at a later point. 
> Having the backported fix would allow our users to operate without issues 
> that currently require manual intervention every week (or every few days in 
> some cases). The only workaround available to the community is to run a 
> command or restart their workers. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-16165) Backport the Hadoop 3.x Kerberos synchronization fix to Hadoop 2.x

2021-08-12 Thread Daniel Osvath (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17398208#comment-17398208
 ] 

Daniel Osvath edited comment on HDFS-16165 at 8/12/21, 7:31 PM:


This request is on behalf of [Confluent, Inc|http://confluent.io/].


was (Author: dosvath):
This request is on behalf [Confluent, Inc|http://confluent.io].

> Backport the Hadoop 3.x Kerberos synchronization fix to Hadoop 2.x
> --
>
> Key: HDFS-16165
> URL: https://issues.apache.org/jira/browse/HDFS-16165
> Project: Hadoop HDFS
>  Issue Type: Wish
> Environment: Can be reproduced in docker HDFS environment with 
> Kerberos 
> https://github.com/vdesabou/kafka-docker-playground/blob/93a93de293ad2f9bb22afb244f2d8729a178296e/connect/connect-hdfs2-sink/hdfs2-sink-ha-kerberos-repro-gss-exception.sh
>Reporter: Daniel Osvath
>Priority: Major
>
> *Problem Description*
> For more than a year, Apache Kafka Connect users have been running into a 
> Kerberos renewal issue that causes our HDFS2 connectors to fail. 
> We have been able to consistently reproduce the issue under high load with 40 
> connectors (threads) that use the library. When we try an alternate 
> workaround that uses the Kerberos keytab on the system, the connector 
> operates without issues.
> We identified the root cause to be a race condition bug in the Hadoop 2.x 
> library that causes the ticket renewal to fail with the error below: 
> {code:java}
> Caused by: javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)]
>  at 
> com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
> {code}
> We reached this conclusion about the root cause once we tried the same 
> environment (40 connectors) with Hadoop 3.x and our HDFS3 connectors, which 
> operated without renewal issues. Additionally, having identified that the 
> synchronization issue has been fixed in the newer Hadoop 3.x releases, we 
> confirmed our hypothesis about the root cause.
> There are many changes in HDFS 3 
> [UserGroupInformation.java|https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java]
>  related to UGI synchronization, which were done as part of 
> https://issues.apache.org/jira/browse/HADOOP-9747. Those changes suggest some 
> race conditions were happening with the older version, i.e. HDFS 2.x, which 
> would explain why we can reproduce the problem with HDFS2.
> For example (among others):
> {code:java}
>   private void relogin(HadoopLoginContext login, boolean ignoreLastLoginTime)
>       throws IOException {
>     // ensure the relogin is atomic to avoid leaving credentials in an
>     // inconsistent state.  prevents other ugi instances, SASL, and SPNEGO
>     // from accessing or altering credentials during the relogin.
>     synchronized(login.getSubjectLock()) {
>       // another racing thread may have beat us to the relogin.
>       if (login == getLogin()) {
>         unprotectedRelogin(login, ignoreLastLoginTime);
>       }
>     }
>   }
> {code}
> None of those changes were backported to Hadoop 2.x (our HDFS2 connector uses 
> 2.10.1), on which several CDH distributions are based. 
> *Request*
> We would like to ask for the synchronization fix to be backported to Hadoop 
> 2.x so that our users can operate without issues. 
> *Impact*
> The older 2.x Hadoop version is used by our HDFS connector, which is used in 
> production by our community. Currently, the issue causes our HDFS connector 
> to fail, as it is unable to recover and renew the ticket at a later point. 
> Having the backported fix would allow our users to operate without issues 
> that currently require manual intervention every week (or every few days in 
> some cases). The only workaround available to the community is to run a 
> command or restart their workers. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16163) Avoid locking entire blockPinningFailures map

2021-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16163?focusedWorklogId=637481&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637481
 ]

ASF GitHub Bot logged work on HDFS-16163:
-

Author: ASF GitHub Bot
Created on: 12/Aug/21 18:22
Start Date: 12/Aug/21 18:22
Worklog Time Spent: 10m 
  Work Description: virajjasani commented on pull request #3296:
URL: https://github.com/apache/hadoop/pull/3296#issuecomment-897868687


   @ferhui Does this sound good? Only a single map entry (not multiple) is 
updated at a time by any thread, so a ConcurrentHashMap (CHM) is a much better 
candidate than synchronizing on the entire map.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 637481)
Time Spent: 40m  (was: 0.5h)

> Avoid locking entire blockPinningFailures map
> -
>
> Key: HDFS-16163
> URL: https://issues.apache.org/jira/browse/HDFS-16163
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> In order for the mover to exclude pinned blocks in subsequent iterations, we 
> put pinned blocks in a map of block IDs to sets of Datanode sources. 
> However, while updating an entry of this map, we don't need to lock the 
> entire map; we can use fine-grained concurrency here.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-16165) Backport the Hadoop 3.x Kerberos synchronization fix to Hadoop 2.x

2021-08-12 Thread Daniel Osvath (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17398208#comment-17398208
 ] 

Daniel Osvath edited comment on HDFS-16165 at 8/12/21, 5:57 PM:


This request is on behalf [Confluent, Inc|http://confluent.io].


was (Author: dosvath):
This request is on behalf [Confluent, Inc|confluent.io]. 

> Backport the Hadoop 3.x Kerberos synchronization fix to Hadoop 2.x
> --
>
> Key: HDFS-16165
> URL: https://issues.apache.org/jira/browse/HDFS-16165
> Project: Hadoop HDFS
>  Issue Type: Wish
> Environment: Can be reproduced in docker HDFS environment with 
> Kerberos 
> https://github.com/vdesabou/kafka-docker-playground/blob/93a93de293ad2f9bb22afb244f2d8729a178296e/connect/connect-hdfs2-sink/hdfs2-sink-ha-kerberos-repro-gss-exception.sh
>Reporter: Daniel Osvath
>Priority: Major
>
> *Problem Description*
> For more than a year, Apache Kafka Connect users have been running into a 
> Kerberos renewal issue that causes our HDFS2 connectors to fail. 
> We have been able to consistently reproduce the issue under high load with 40 
> connectors (threads) that use the library. When we try an alternate 
> workaround that uses the Kerberos keytab on the system, the connector 
> operates without issues.
> We identified the root cause to be a race condition bug in the Hadoop 2.x 
> library that causes the ticket renewal to fail with the error below: 
> {code:java}
> Caused by: javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)]
>  at 
> com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
> {code}
> We reached this conclusion about the root cause once we tried the same 
> environment (40 connectors) with Hadoop 3.x and our HDFS3 connectors, which 
> operated without renewal issues. Additionally, having identified that the 
> synchronization issue has been fixed in the newer Hadoop 3.x releases, we 
> confirmed our hypothesis about the root cause.
> There are many changes in HDFS 3 
> [UserGroupInformation.java|https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java]
>  related to UGI synchronization, which were done as part of 
> https://issues.apache.org/jira/browse/HADOOP-9747. Those changes suggest some 
> race conditions were happening with the older version, i.e. HDFS 2.x, which 
> would explain why we can reproduce the problem with HDFS2.
> For example (among others):
> {code:java}
>   private void relogin(HadoopLoginContext login, boolean ignoreLastLoginTime)
>       throws IOException {
>     // ensure the relogin is atomic to avoid leaving credentials in an
>     // inconsistent state.  prevents other ugi instances, SASL, and SPNEGO
>     // from accessing or altering credentials during the relogin.
>     synchronized(login.getSubjectLock()) {
>       // another racing thread may have beat us to the relogin.
>       if (login == getLogin()) {
>         unprotectedRelogin(login, ignoreLastLoginTime);
>       }
>     }
>   }
> {code}
> None of those changes were backported to Hadoop 2.x (our HDFS2 connector uses 
> 2.10.1), on which several CDH distributions are based. 
> *Request*
> We would like to ask for the synchronization fix to be backported to Hadoop 
> 2.x so that our users can operate without issues. 
> *Impact*
> The older 2.x Hadoop version is used by our HDFS connector, which is used in 
> production by our community. Currently, the issue causes our HDFS connector 
> to fail, as it is unable to recover and renew the ticket at a later point. 
> Having the backported fix would allow our users to operate without issues 
> that currently require manual intervention every week (or every few days in 
> some cases). The only workaround available to the community is to run a 
> command or restart their workers. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16165) Backport the Hadoop 3.x Kerberos synchronization fix to Hadoop 2.x

2021-08-12 Thread Daniel Osvath (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17398208#comment-17398208
 ] 

Daniel Osvath commented on HDFS-16165:
--

This request is on behalf [Confluent, Inc|confluent.io]. 

> Backport the Hadoop 3.x Kerberos synchronization fix to Hadoop 2.x
> --
>
> Key: HDFS-16165
> URL: https://issues.apache.org/jira/browse/HDFS-16165
> Project: Hadoop HDFS
>  Issue Type: Wish
> Environment: Can be reproduced in docker HDFS environment with 
> Kerberos 
> https://github.com/vdesabou/kafka-docker-playground/blob/93a93de293ad2f9bb22afb244f2d8729a178296e/connect/connect-hdfs2-sink/hdfs2-sink-ha-kerberos-repro-gss-exception.sh
>Reporter: Daniel Osvath
>Priority: Major
>
> *Problem Description*
> For more than a year, Apache Kafka Connect users have been running into a 
> Kerberos renewal issue that causes our HDFS2 connectors to fail. 
> We have been able to consistently reproduce the issue under high load with 40 
> connectors (threads) that use the library. When we try an alternate 
> workaround that uses the Kerberos keytab on the system, the connector 
> operates without issues.
> We identified the root cause to be a race condition bug in the Hadoop 2.x 
> library that causes the ticket renewal to fail with the error below: 
> {code:java}
> Caused by: javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)]
>  at 
> com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
> {code}
> We reached this conclusion about the root cause once we tried the same 
> environment (40 connectors) with Hadoop 3.x and our HDFS3 connectors, which 
> operated without renewal issues. Additionally, having identified that the 
> synchronization issue has been fixed in the newer Hadoop 3.x releases, we 
> confirmed our hypothesis about the root cause.
> There are many changes in HDFS 3 
> [UserGroupInformation.java|https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java]
>  related to UGI synchronization, which were done as part of 
> https://issues.apache.org/jira/browse/HADOOP-9747. Those changes suggest some 
> race conditions were happening with the older version, i.e. HDFS 2.x, which 
> would explain why we can reproduce the problem with HDFS2.
> For example (among others):
> {code:java}
>   private void relogin(HadoopLoginContext login, boolean ignoreLastLoginTime)
>       throws IOException {
>     // ensure the relogin is atomic to avoid leaving credentials in an
>     // inconsistent state.  prevents other ugi instances, SASL, and SPNEGO
>     // from accessing or altering credentials during the relogin.
>     synchronized(login.getSubjectLock()) {
>       // another racing thread may have beat us to the relogin.
>       if (login == getLogin()) {
>         unprotectedRelogin(login, ignoreLastLoginTime);
>       }
>     }
>   }
> {code}
> None of those changes were backported to Hadoop 2.x (our HDFS2 connector uses 
> 2.10.1), on which several CDH distributions are based. 
> *Request*
> We would like to ask for the synchronization fix to be backported to Hadoop 
> 2.x so that our users can operate without issues. 
> *Impact*
> The older 2.x Hadoop version is used by our HDFS connector, which is used in 
> production by our community. Currently, the issue causes our HDFS connector 
> to fail, as it is unable to recover and renew the ticket at a later point. 
> Having the backported fix would allow our users to operate without issues 
> that currently require manual intervention every week (or every few days in 
> some cases). The only workaround available to the community is to run a 
> command or restart their workers. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-16165) Backport the Hadoop 3.x Kerberos synchronization fix to Hadoop 2.x

2021-08-12 Thread Daniel Osvath (Jira)
Daniel Osvath created HDFS-16165:


 Summary: Backport the Hadoop 3.x Kerberos synchronization fix to 
Hadoop 2.x
 Key: HDFS-16165
 URL: https://issues.apache.org/jira/browse/HDFS-16165
 Project: Hadoop HDFS
  Issue Type: Wish
 Environment: Can be reproduced in docker HDFS environment with 
Kerberos 
https://github.com/vdesabou/kafka-docker-playground/blob/93a93de293ad2f9bb22afb244f2d8729a178296e/connect/connect-hdfs2-sink/hdfs2-sink-ha-kerberos-repro-gss-exception.sh
Reporter: Daniel Osvath


*Problem Description*

For more than a year, Apache Kafka Connect users have been running into a 
Kerberos renewal issue that causes our HDFS2 connectors to fail. 

We have been able to consistently reproduce the issue under high load with 40 
connectors (threads) that use the library. When we try an alternate workaround 
that uses the Kerberos keytab on the system, the connector operates without 
issues.

We identified the root cause to be a race condition bug in the Hadoop 2.x 
library that causes the ticket renewal to fail with the error below: 

{code:java}
Caused by: javax.security.sasl.SaslException: GSS initiate failed [Caused by 
GSSException: No valid credentials provided (Mechanism level: Failed to find 
any Kerberos tgt)]
 at 
com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
{code}
We reached this conclusion about the root cause once we tried the same 
environment (40 connectors) with Hadoop 3.x and our HDFS3 connectors, which 
operated without renewal issues. Additionally, having identified that the 
synchronization issue has been fixed in the newer Hadoop 3.x releases, we 
confirmed our hypothesis about the root cause.

There are many changes in HDFS 3 
[UserGroupInformation.java|https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java]
 related to UGI synchronization, which were done as part of 
https://issues.apache.org/jira/browse/HADOOP-9747. Those changes suggest some 
race conditions were happening with the older version, i.e. HDFS 2.x, which 
would explain why we can reproduce the problem with HDFS2.
For example (among others):
{code:java}
  private void relogin(HadoopLoginContext login, boolean ignoreLastLoginTime)
      throws IOException {
    // ensure the relogin is atomic to avoid leaving credentials in an
    // inconsistent state.  prevents other ugi instances, SASL, and SPNEGO
    // from accessing or altering credentials during the relogin.
    synchronized(login.getSubjectLock()) {
      // another racing thread may have beat us to the relogin.
      if (login == getLogin()) {
        unprotectedRelogin(login, ignoreLastLoginTime);
      }
    }
  }
{code}
None of those changes were backported to Hadoop 2.x (our HDFS2 connector uses 
2.10.1), on which several CDH distributions are based. 

*Request*
We would like to ask for the synchronization fix to be backported to Hadoop 2.x 
so that our users can operate without issues. 

*Impact*
The older 2.x Hadoop version is used by our HDFS connector, which is used in 
production by our community. Currently, the issue causes our HDFS connector to 
fail, as it is unable to recover and renew the ticket at a later point. Having 
the backported fix would allow our users to operate without issues that 
currently require manual intervention every week (or every few days in some 
cases). The only workaround available to the community is to run a command or 
restart their workers.
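
A minimal sketch of the keytab-based workaround mentioned above, using Hadoop's 
public UserGroupInformation API; the principal and keytab path are hypothetical 
placeholders, not values from this report. Logging in from a keytab lets UGI 
re-login from the keytab rather than depending on ticket-cache renewal:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

public class KeytabLoginExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("hadoop.security.authentication", "kerberos");
    UserGroupInformation.setConfiguration(conf);
    // Log in from a keytab so UGI can re-login from it later instead of
    // relying on renewal of an externally managed ticket cache.
    UserGroupInformation.loginUserFromKeytab(
        "connect/host@EXAMPLE.COM", "/etc/security/keytabs/connect.keytab");
    // Long-running workers can refresh before issuing RPCs if needed.
    UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab();
  }
}
{code}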



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12188) TestDecommissioningStatus#testDecommissionStatus fails intermittently

2021-08-12 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-12188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17398191#comment-17398191
 ] 

Ahmed Hussein commented on HDFS-12188:
--

Thanks [~vjasani]! I am looking forward to seeing the new jira.
Just a brief description of how the refactored code brings more stability to 
the unit test would be good enough.

> TestDecommissioningStatus#testDecommissionStatus fails intermittently
> -
>
> Key: HDFS-12188
> URL: https://issues.apache.org/jira/browse/HDFS-12188
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Reporter: Brahma Reddy Battula
>Assignee: Ajay Kumar
>Priority: Major
>  Labels: pull-request-available
> Attachments: TestFailure_Log.txt
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> {noformat}
> java.lang.AssertionError: Unexpected num under-replicated blocks expected:<3> 
> but was:<4>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus.checkDecommissionStatus(TestDecommissioningStatus.java:144)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus.testDecommissionStatus(TestDecommissioningStatus.java:240)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16157) Support configuring DNS record to get list of journal nodes.

2021-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16157?focusedWorklogId=637400&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637400
 ]

ASF GitHub Bot logged work on HDFS-16157:
-

Author: ASF GitHub Bot
Created on: 12/Aug/21 16:07
Start Date: 12/Aug/21 16:07
Worklog Time Spent: 10m 
  Work Description: fengnanli commented on pull request #3284:
URL: https://github.com/apache/hadoop/pull/3284#issuecomment-897765432


   LGTM. Let's wait a while in case others have comments. @Hexiaoqiao FYI. We are 
doing this as a series of changes to help reduce the dependency on a single 
hostname.
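
As a rough illustration of the DNS-based configuration this series is aiming at 
(not the actual patch), a single round-robin record can be resolved to the 
current set of journal node addresses; the record name and port below are 
hypothetical placeholders:

{code:java}
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

public class JournalNodeDnsResolver {
  // Resolve one DNS round-robin record (one A/AAAA entry per journal node)
  // to a list of journal node socket addresses.
  public static List<InetSocketAddress> resolveJournalNodes(
      String dnsRecord, int port) throws Exception {
    List<InetSocketAddress> addrs = new ArrayList<>();
    for (InetAddress a : InetAddress.getAllByName(dnsRecord)) {
      addrs.add(new InetSocketAddress(a.getHostAddress(), port));
    }
    return addrs;
  }

  public static void main(String[] args) throws Exception {
    // Hypothetical record name; 8485 is the usual journal node RPC port.
    System.out.println(resolveJournalNodes("jn.example.com", 8485));
  }
}
{code}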


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 637400)
Time Spent: 20m  (was: 10m)

> Support configuring DNS record to get list of journal nodes.
> 
>
> Key: HDFS-16157
> URL: https://issues.apache.org/jira/browse/HDFS-16157
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: journal-node
>Reporter: Leon Gao
>Assignee: Leon Gao
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We can use a DNS round-robin record to configure the list of journal nodes, so 
> we don't have to reconfigure everything when a journal node hostname is 
> changed. For example, in some containerized environments the hostnames of 
> journal nodes can change pretty often.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-12188) TestDecommissioningStatus#testDecommissionStatus fails intermittently

2021-08-12 Thread Viraj Jasani (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-12188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17398102#comment-17398102
 ] 

Viraj Jasani edited comment on HDFS-12188 at 8/12/21, 2:43 PM:
---

Sure [~ahussein], I think it makes sense not to lose the history. Although the 
exception stacktrace matches exactly, this Jira was filed long ago and the same 
test may well have had a different issue back then. Let me detach the PR from 
this one and create a new Jira. Thanks.

Sorry for the noise, everyone. Let me get back to this Jira once the new Jira is 
filed and has more test results to rely on; then we can decide whether we would 
like to link this Jira to the new one and close this.


was (Author: vjasani):
Sure [~ahussein], I think it makes sense not to lose the history. Although the 
exception stacktrace matches exactly, this Jira was filed long ago and the same 
test may well have had a different issue back then. Let me detach the PR from 
this one and create a new Jira. Thanks.

> TestDecommissioningStatus#testDecommissionStatus fails intermittently
> -
>
> Key: HDFS-12188
> URL: https://issues.apache.org/jira/browse/HDFS-12188
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Reporter: Brahma Reddy Battula
>Assignee: Ajay Kumar
>Priority: Major
>  Labels: pull-request-available
> Attachments: TestFailure_Log.txt
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> {noformat}
> java.lang.AssertionError: Unexpected num under-replicated blocks expected:<3> 
> but was:<4>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus.checkDecommissionStatus(TestDecommissioningStatus.java:144)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus.testDecommissionStatus(TestDecommissioningStatus.java:240)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12188) TestDecommissioningStatus#testDecommissionStatus fails intermittently

2021-08-12 Thread Viraj Jasani (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-12188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viraj Jasani updated HDFS-12188:

Summary: TestDecommissioningStatus#testDecommissionStatus fails 
intermittently  (was: De-flake testDecommissionStatus)

> TestDecommissioningStatus#testDecommissionStatus fails intermittently
> -
>
> Key: HDFS-12188
> URL: https://issues.apache.org/jira/browse/HDFS-12188
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Reporter: Brahma Reddy Battula
>Assignee: Ajay Kumar
>Priority: Major
>  Labels: pull-request-available
> Attachments: TestFailure_Log.txt
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> {noformat}
> java.lang.AssertionError: Unexpected num under-replicated blocks expected:<3> 
> but was:<4>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus.checkDecommissionStatus(TestDecommissioningStatus.java:144)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus.testDecommissionStatus(TestDecommissioningStatus.java:240)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12188) De-flake testDecommissionStatus

2021-08-12 Thread Viraj Jasani (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-12188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17398102#comment-17398102
 ] 

Viraj Jasani commented on HDFS-12188:
-

Sure [~ahussein], I think it makes sense not to lose the history. Although the 
exception stacktrace matches exactly, this Jira was filed long ago and the same 
test may well have had a different issue back then. Let me detach the PR from 
this one and create a new Jira. Thanks.

> De-flake testDecommissionStatus
> ---
>
> Key: HDFS-12188
> URL: https://issues.apache.org/jira/browse/HDFS-12188
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Reporter: Brahma Reddy Battula
>Assignee: Ajay Kumar
>Priority: Major
>  Labels: pull-request-available
> Attachments: TestFailure_Log.txt
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> {noformat}
> java.lang.AssertionError: Unexpected num under-replicated blocks expected:<3> 
> but was:<4>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus.checkDecommissionStatus(TestDecommissioningStatus.java:144)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus.testDecommissionStatus(TestDecommissioningStatus.java:240)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12188) De-flake testDecommissionStatus

2021-08-12 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-12188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17398097#comment-17398097
 ] 

Ahmed Hussein commented on HDFS-12188:
--

Hi [~vjasani]
Thanks for taking a look at this jira. I have a few comments:
* Can you please describe the purpose of the change and how it resolves the 
issue?
* If it is not evident that the changes resolve the original issue this ticket 
was filed for, it would be better to open a new jira, and this very jira can be 
resolved later. You can also link this jira to the new one. That way, anyone who 
was aware of the problem would be able to understand that it has been resolved.
* Changing the title/description of an old jira is not a very good idea (unless 
there was an error or typo), because it makes it difficult for other developers 
to see that what they filed or contributed to has been resolved. They would need 
to guess and then look through the transitions/history of the jiras to find what 
they are looking for.
* For future jiras, I personally prefer that a new jira is filed, leaving the 
existing ones as they are (until they are marked as resolved).

> De-flake testDecommissionStatus
> ---
>
> Key: HDFS-12188
> URL: https://issues.apache.org/jira/browse/HDFS-12188
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Reporter: Brahma Reddy Battula
>Assignee: Ajay Kumar
>Priority: Major
>  Labels: pull-request-available
> Attachments: TestFailure_Log.txt
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> {noformat}
> java.lang.AssertionError: Unexpected num under-replicated blocks expected:<3> 
> but was:<4>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus.checkDecommissionStatus(TestDecommissioningStatus.java:144)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus.testDecommissionStatus(TestDecommissioningStatus.java:240)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16162) Improve DFSUtil#checkProtectedDescendants() related parameter comments

2021-08-12 Thread JiangHua Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

JiangHua Zhu updated HDFS-16162:

Component/s: documentation

> Improve DFSUtil#checkProtectedDescendants() related parameter comments
> --
>
> Key: HDFS-16162
> URL: https://issues.apache.org/jira/browse/HDFS-16162
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Some parameter comments related to DFSUtil#checkProtectedDescendants() are 
> missing, for example:
> /**
>  * If the given directory has any non-empty protected descendants, throw
>  * (Including itself).
>  *
>  * @param iip directory, to check its descendants.
>  * @throws AccessControlException if it is a non-empty protected 
> descendant
>  *found.
>  * @throws ParentNotDirectoryException
>  * @throws UnresolvedLinkException
>  */
> public static void checkProtectedDescendants(
> FSDirectory fsd, INodesInPath iip)
> throws AccessControlException, UnresolvedLinkException,
> ParentNotDirectoryException {
> The description of fsd is missing here.
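
For illustration, the completed Javadoc being requested might look like the 
following sketch; the wording of the fsd description is an assumption, not the 
committed patch:

{code:java}
/**
 * If the given directory has any non-empty protected descendants, throw
 * (including itself).
 *
 * @param fsd the FSDirectory used to resolve and inspect the descendants
 *            (hypothetical wording).
 * @param iip directory, to check its descendants.
 * @throws AccessControlException if a non-empty protected descendant is found.
 * @throws ParentNotDirectoryException
 * @throws UnresolvedLinkException
 */
public static void checkProtectedDescendants(
    FSDirectory fsd, INodesInPath iip)
    throws AccessControlException, UnresolvedLinkException,
    ParentNotDirectoryException {
  // ...
}
{code}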



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-12188) De-flake testDecommissionStatus

2021-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-12188?focusedWorklogId=637274&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637274
 ]

ASF GitHub Bot logged work on HDFS-12188:
-

Author: ASF GitHub Bot
Created on: 12/Aug/21 07:39
Start Date: 12/Aug/21 07:39
Worklog Time Spent: 10m 
  Work Description: virajjasani commented on pull request #3280:
URL: https://github.com/apache/hadoop/pull/3280#issuecomment-897418170


   Thanks @ferhui for the review.
   
   > This PR title is different from HDFS-12188
   
   I updated the Jira title because the testDecommissionStatus test is present 
in both `TestDecommissioningStatus` and 
`TestDecommissioningStatusWithBackoffMonitor`, so by mentioning just 
testDecommissionStatus we are taking care of both test failures.
   
   > Do you explain why test is flaky and how you fix it?
   
   The number of under-replicated blocks on Datanode2 can be either 3 or 4, 
depending on the actual blocks available in the Datanode storage. This is the 
only reason for the flakiness, hence our logic should check for the count being 
3 or 4. If the under-replicated block count is anything other than 3 or 4, then 
the test has some other genuine failure.
   
   > I see you add synchronized to some functions, Does it help to fix flaky 
problems?
   
   Good point, it doesn't solve the flakiness as such. I kept it while running 
the two tests in parallel so that the config setup was synchronized, but it is 
no longer required. I will remove it. Thanks
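
A minimal sketch of the tolerant check described above (the class, method, and 
variable names are hypothetical, not the exact patch):

{code:java}
import static org.junit.Assert.assertTrue;

public class DecommissionAssertions {
  // Hypothetical sketch: accept either 3 or 4 under-replicated blocks,
  // since the exact count depends on which blocks the datanode holds.
  static void assertUnderReplicatedCount(int underReplicated) {
    assertTrue("Unexpected num under-replicated blocks: " + underReplicated,
        underReplicated == 3 || underReplicated == 4);
  }
}
{code}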


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 637274)
Time Spent: 40m  (was: 0.5h)

> De-flake testDecommissionStatus
> ---
>
> Key: HDFS-12188
> URL: https://issues.apache.org/jira/browse/HDFS-12188
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Reporter: Brahma Reddy Battula
>Assignee: Ajay Kumar
>Priority: Major
>  Labels: pull-request-available
> Attachments: TestFailure_Log.txt
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> {noformat}
> java.lang.AssertionError: Unexpected num under-replicated blocks expected:<3> 
> but was:<4>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus.checkDecommissionStatus(TestDecommissioningStatus.java:144)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus.testDecommissionStatus(TestDecommissioningStatus.java:240)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12188) De-flake testDecommissionStatus

2021-08-12 Thread Viraj Jasani (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-12188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viraj Jasani updated HDFS-12188:

Summary: De-flake testDecommissionStatus  (was: 
TestDecommissioningStatus#testDecommissionStatus fails intermittently)

> De-flake testDecommissionStatus
> ---
>
> Key: HDFS-12188
> URL: https://issues.apache.org/jira/browse/HDFS-12188
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Reporter: Brahma Reddy Battula
>Assignee: Ajay Kumar
>Priority: Major
>  Labels: pull-request-available
> Attachments: TestFailure_Log.txt
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> {noformat}
> java.lang.AssertionError: Unexpected num under-replicated blocks expected:<3> 
> but was:<4>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus.checkDecommissionStatus(TestDecommissioningStatus.java:144)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus.testDecommissionStatus(TestDecommissioningStatus.java:240)
> {noformat}






[jira] [Work logged] (HDFS-16163) Avoid locking entire blockPinningFailures map

2021-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16163?focusedWorklogId=637254&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-637254
 ]

ASF GitHub Bot logged work on HDFS-16163:
-

Author: ASF GitHub Bot
Created on: 12/Aug/21 06:28
Start Date: 12/Aug/21 06:28
Worklog Time Spent: 10m 
  Work Description: virajjasani commented on pull request #3296:
URL: https://github.com/apache/hadoop/pull/3296#issuecomment-897382966


   Thanks for taking a look @ferhui. Yes, this is a perf optimization, but I came 
across it while investigating an unrelated issue in the mover, for which I was 
comparing the diff between Hadoop 2.10 and the latest 3.3 release. That original 
issue is still under investigation, but while going through the differences I came 
across HDFS-11164 and realized that we were locking the entire map just to update 
or add a single key->value pair, so I thought of fixing this. I have tested this 
locally for sanity and correctness, but unfortunately I don't have perf results 
because it was a simple test.
   
   The other way to look at this is simplicity: unless we are updating multiple 
entries in a single batch, we don't need to lock the entire map; for a single-entry 
update we can instead use the fine-grained utilities that ConcurrentHashMap 
provides, as in the sketch below.
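   
   A minimal sketch of the fine-grained approach (class and field names here are 
illustrative, not the actual Dispatcher code):
   
   ```java
   import java.util.Map;
   import java.util.Set;
   import java.util.concurrent.ConcurrentHashMap;
   
   class PinnedBlockTracker {
     // blockId -> DataNode sources that reported the block as pinned
     private final Map<Long, Set<String>> blockPinningFailures =
         new ConcurrentHashMap<>();
   
     // computeIfAbsent locks only the bin holding this key, so updates to
     // different block IDs proceed concurrently. The coarse pattern,
     // synchronized (map) { get; if absent, put; add; }, serialized them all.
     void recordPinnedBlock(long blockId, String datanodeSource) {
       blockPinningFailures
           .computeIfAbsent(blockId, k -> ConcurrentHashMap.newKeySet())
           .add(datanodeSource);
     }
   }
   ```
   
   `ConcurrentHashMap.newKeySet()` returns a thread-safe set, so the subsequent 
`add` is also safe without any external lock.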




Issue Time Tracking
---

Worklog Id: (was: 637254)
Time Spent: 0.5h  (was: 20m)

> Avoid locking entire blockPinningFailures map
> -
>
> Key: HDFS-16163
> URL: https://issues.apache.org/jira/browse/HDFS-16163
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> In order for the mover to exclude pinned blocks in subsequent iterations, we 
> put pinned blocks in a map of block IDs to sets of DataNode sources. 
> However, updating an entry of this map does not require locking the 
> entire map; we can use fine-grained concurrency here.


