[ https://issues.apache.org/jira/browse/HDFS-11674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16005659#comment-16005659 ]
Arpit Agarwal commented on HDFS-11674:
--------------------------------------

I ran this test 5 times and it timed out once waiting for the file to be closed. I didn't debug it further though.

{code}
"Thread-254" prio=5 tid=465 runnable
java.lang.Thread.State: RUNNABLE
        at java.lang.Thread.dumpThreads(Native Method)
        at java.lang.Thread.getAllStackTraces(Thread.java:1607)
        at org.apache.hadoop.test.TimedOutTestsListener.buildThreadDump(TimedOutTestsListener.java:87)
        at org.apache.hadoop.test.TimedOutTestsListener.buildThreadDiagnosticString(TimedOutTestsListener.java:73)
        at org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:277)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestSpaceReservation.testReservedSpaceForLeaseRecovery(TestSpaceReservation.java:730)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
        at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
        at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
        at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
        at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
"pool-56-thread-1" daemon prio=5 tid=596 timed_waiting
{code}

> reserveSpaceForReplicas is not released if append request failed due to
> mirror down and replica recovered
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-11674
>                 URL: https://issues.apache.org/jira/browse/HDFS-11674
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>            Reporter: Vinayakumar B
>            Assignee: Vinayakumar B
>            Priority: Critical
>              Labels: release-blocker
>         Attachments: HDFS-11674-01.patch, HDFS-11674-02.patch
>
>
> Scenario:
> 1. 3-node cluster with
> "dfs.client.block.write.replace-datanode-on-failure.policy" set to DEFAULT.
> A block is written with x data.
> 2. One of the DataNodes, NOT the first DN, is down.
> 3. The client tries to append data to the block and fails since one DN is down.
> 4. The client calls recoverLease() on the file.
> 5. Recovery succeeds.
> Issue:
> 1. DNs that the client had connected to before encountering the mirror down
> will have reservedSpaceForReplicas incremented, BUT never decremented.
> 2. So in the long run all of the DN's space will be held in
> reservedSpaceForReplicas, resulting in OutOfSpace errors.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
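
For readers unfamiliar with the accounting problem described in the issue, here is a minimal sketch of a reservation counter that leaks when a failed append does not release what it reserved. Every name in it (ReservedSpaceSketch, failedAppend, connectToMirror) is hypothetical and chosen for illustration. It is not Hadoop code and does not reflect how the attached patches fix the DataNode.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Toy model of a DataNode volume's reservation counter.
 * Hypothetical names throughout; this is not Hadoop code.
 */
class ReservedSpaceSketch {
    // Stands in for the per-volume reservedSpaceForReplicas counter.
    private final AtomicLong reservedForReplicas = new AtomicLong();

    void reserve(long bytes) { reservedForReplicas.addAndGet(bytes); }

    void release(long bytes) { reservedForReplicas.addAndGet(-bytes); }

    long reserved() { return reservedForReplicas.get(); }

    /**
     * Simulates an append that reserves space and then fails because the
     * downstream mirror is unreachable. Returns the bytes still reserved.
     */
    long failedAppend(long bytes, boolean releaseOnFailure) {
        reserve(bytes);
        try {
            connectToMirror();
        } catch (IOException e) {
            // Without this release the reservation outlives the failed write.
            if (releaseOnFailure) {
                release(bytes);
            }
        }
        return reserved();
    }

    // Assumption for the sketch: the mirror is always down.
    private void connectToMirror() throws IOException {
        throw new IOException("mirror down");
    }

    public static void main(String[] args) {
        long reservation = 64L * 1024 * 1024; // arbitrary size for the demo

        ReservedSpaceSketch leaky = new ReservedSpaceSketch();
        ReservedSpaceSketch fixed = new ReservedSpaceSketch();
        for (int i = 0; i < 3; i++) {
            leaky.failedAppend(reservation, false);
            fixed.failedAppend(reservation, true);
        }
        System.out.println("leaky reserved bytes: " + leaky.reserved());
        System.out.println("fixed reserved bytes: " + fixed.reserved());
    }
}
```

In the leaky variant the counter grows with each failed append, which mirrors the "incremented, BUT never decremented" behaviour in the report. Pairing every reserve with a release on the failure path keeps the counter at zero.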