[jira] [Updated] (HDFS-17105) mistakenly purge editLogs even after it is empty in NNStorageRetentionManager
[ https://issues.apache.org/jira/browse/HDFS-17105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaoqiao He updated HDFS-17105:
-------------------------------
    Priority: Minor  (was: Critical)

> mistakenly purge editLogs even after it is empty in NNStorageRetentionManager
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-17105
>                 URL: https://issues.apache.org/jira/browse/HDFS-17105
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: ConfX
>            Assignee: ConfX
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 3.4.0
>
>         Attachments: reproduce.sh
>
>
> h2. What happened:
> Got an {{IndexOutOfBoundsException}} after setting {{dfs.namenode.max.extra.edits.segments.retained}} to a negative value and purging old records with {{NNStorageRetentionManager}}.
> h2. Where's the bug:
> In line 156 of {{NNStorageRetentionManager}}, the manager trims {{editLogs}} until its size is no larger than {{maxExtraEditsSegmentsToRetain}}:
> {noformat}
> while (editLogs.size() > maxExtraEditsSegmentsToRetain) {
>   purgeLogsFrom = editLogs.get(0).getLastTxId() + 1;
>   editLogs.remove(0);
> }{noformat}
> However, if {{dfs.namenode.max.extra.edits.segments.retained}} is set below 0, the size of {{editLogs}} can never drop below it; the loop runs until {{editLogs.size()==0}}, at which point {{editLogs.get(0)}} is out of range.
> h2. How to reproduce:
> (1) Set {{dfs.namenode.max.extra.edits.segments.retained}} to -1974676133
> (2) Run test: {{org.apache.hadoop.hdfs.server.namenode.TestNNStorageRetentionManager#testNoLogs}}
> h2. Stacktrace:
> {noformat}
> java.lang.IndexOutOfBoundsException: Index 0 out of bounds for length 0
>     at java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:64)
>     at java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:70)
>     at java.base/jdk.internal.util.Preconditions.checkIndex(Preconditions.java:248)
>     at java.base/java.util.Objects.checkIndex(Objects.java:372)
>     at java.base/java.util.ArrayList.get(ArrayList.java:459)
>     at org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager.purgeOldStorage(NNStorageRetentionManager.java:157)
>     at org.apache.hadoop.hdfs.server.namenode.TestNNStorageRetentionManager.runTest(TestNNStorageRetentionManager.java:299)
>     at org.apache.hadoop.hdfs.server.namenode.TestNNStorageRetentionManager.testNoLogs(TestNNStorageRetentionManager.java:143)
> {noformat}
> For an easy reproduction, run the reproduce.sh in the attachment.
> We are happy to provide a patch if this issue is confirmed.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
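The underflow in the quoted loop can be avoided by clamping the misconfigured value before trimming. The sketch below is illustrative only, not the committed Hadoop patch: the class and method names are invented, and plain {{Long}} transaction ids stand in for the real edit-log segment objects.

```java
import java.util.ArrayList;
import java.util.List;

public class PurgeSketch {
    /**
     * Simplified model of the purge loop: drop segments from the front of the
     * list until at most {@code maxExtraSegmentsToRetain} remain, returning
     * the first transaction id that must be kept.
     */
    static long purgeLogsFrom(List<Long> lastTxIds, int maxExtraSegmentsToRetain) {
        // Clamp a misconfigured (negative) retention value to zero so the
        // loop condition can terminate once the list is empty.
        int retain = Math.max(0, maxExtraSegmentsToRetain);
        List<Long> editLogs = new ArrayList<>(lastTxIds);
        long purgeFrom = 0;
        while (editLogs.size() > retain) {
            purgeFrom = editLogs.get(0) + 1;
            editLogs.remove(0);
        }
        return purgeFrom;
    }

    public static void main(String[] args) {
        // With the hugely negative value from the report, every segment is
        // purged but the loop stops cleanly at size 0 instead of throwing.
        if (purgeLogsFrom(List.of(10L, 20L, 30L), -1974676133) != 31) {
            throw new AssertionError("negative setting should purge all segments");
        }
        // With a sane setting of 1, the last segment is retained.
        if (purgeLogsFrom(List.of(10L, 20L, 30L), 1) != 21) {
            throw new AssertionError("setting of 1 should keep the last segment");
        }
    }
}
```

An equivalent fix is to reject negative values of {{dfs.namenode.max.extra.edits.segments.retained}} at configuration-load time; either way the loop can no longer index into an empty list.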
[jira] [Resolved] (HDFS-17105) mistakenly purge editLogs even after it is empty in NNStorageRetentionManager
[ https://issues.apache.org/jira/browse/HDFS-17105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaoqiao He resolved HDFS-17105.
--------------------------------
    Fix Version/s: 3.4.0
     Hadoop Flags: Reviewed
         Assignee: ConfX
       Resolution: Fixed

> mistakenly purge editLogs even after it is empty in NNStorageRetentionManager
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-17105
>                 URL: https://issues.apache.org/jira/browse/HDFS-17105
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: ConfX
>            Assignee: ConfX
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 3.4.0
>
>         Attachments: reproduce.sh
>
>
> h2. What happened:
> Got an {{IndexOutOfBoundsException}} after setting {{dfs.namenode.max.extra.edits.segments.retained}} to a negative value and purging old records with {{NNStorageRetentionManager}}.
> h2. Where's the bug:
> In line 156 of {{NNStorageRetentionManager}}, the manager trims {{editLogs}} until its size is no larger than {{maxExtraEditsSegmentsToRetain}}:
> {noformat}
> while (editLogs.size() > maxExtraEditsSegmentsToRetain) {
>   purgeLogsFrom = editLogs.get(0).getLastTxId() + 1;
>   editLogs.remove(0);
> }{noformat}
> However, if {{dfs.namenode.max.extra.edits.segments.retained}} is set below 0, the size of {{editLogs}} can never drop below it; the loop runs until {{editLogs.size()==0}}, at which point {{editLogs.get(0)}} is out of range.
> h2. How to reproduce:
> (1) Set {{dfs.namenode.max.extra.edits.segments.retained}} to -1974676133
> (2) Run test: {{org.apache.hadoop.hdfs.server.namenode.TestNNStorageRetentionManager#testNoLogs}}
> h2. Stacktrace:
> {noformat}
> java.lang.IndexOutOfBoundsException: Index 0 out of bounds for length 0
>     at java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:64)
>     at java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:70)
>     at java.base/jdk.internal.util.Preconditions.checkIndex(Preconditions.java:248)
>     at java.base/java.util.Objects.checkIndex(Objects.java:372)
>     at java.base/java.util.ArrayList.get(ArrayList.java:459)
>     at org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager.purgeOldStorage(NNStorageRetentionManager.java:157)
>     at org.apache.hadoop.hdfs.server.namenode.TestNNStorageRetentionManager.runTest(TestNNStorageRetentionManager.java:299)
>     at org.apache.hadoop.hdfs.server.namenode.TestNNStorageRetentionManager.testNoLogs(TestNNStorageRetentionManager.java:143)
> {noformat}
> For an easy reproduction, run the reproduce.sh in the attachment.
> We are happy to provide a patch if this issue is confirmed.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-16938) Utility to trigger heartbeat and wait until BP thread queue is fully processed
[ https://issues.apache.org/jira/browse/HDFS-16938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Viraj Jasani reassigned HDFS-16938:
-----------------------------------
    Assignee:     (was: Viraj Jasani)

> Utility to trigger heartbeat and wait until BP thread queue is fully processed
> ------------------------------------------------------------------------------
>
>                 Key: HDFS-16938
>                 URL: https://issues.apache.org/jira/browse/HDFS-16938
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Viraj Jasani
>            Priority: Major
>              Labels: pull-request-available
>
> As a follow-up to HDFS-16935, we should provide a utility to trigger a heartbeat and wait until the BP thread queue is fully processed. This would guarantee that the active namenode receives the bad block reports from the given datanode, and would resolve flakiness in tests that rely on the namenode's awareness of the bad blocks reported by datanodes.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-17202) TestDFSAdmin.testAllDatanodesReconfig assertion failing (again)
Steve Loughran created HDFS-17202:
-------------------------------------
             Summary: TestDFSAdmin.testAllDatanodesReconfig assertion failing (again)
                 Key: HDFS-17202
                 URL: https://issues.apache.org/jira/browse/HDFS-17202
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: dfsadmin, test
    Affects Versions: 3.3.9
            Reporter: Steve Loughran

Surfacing in the test run for the HADOOP-18895 PR https://github.com/apache/hadoop/pull/6073

```
Expecting:
  <["started at Thu Sep 14 23:14:07 GMT 2023SUCCESS: Changed property dfs.datanode.peer.stats.enabled",
    " From: "false"",
    " To: "true"",
    " and finished at Thu Sep 14 23:14:07 GMT 2023."]>
to contain subsequence:
  <["SUCCESS: Changed property dfs.datanode.peer.stats.enabled",
    " From: "false"",
    " To: "true""]>
```

Looks like some logging race condition again, as the "started at" output is on the same line as "SUCCESS". Maybe something needs to add a \n after the started message, or before SUCCESS.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
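The failure mode described above can be demonstrated in isolation: when the "started at ..." message is emitted without a trailing newline, the following "SUCCESS: ..." message lands on the same output line, and a line-based containment check no longer finds it. This is a standalone sketch with invented names, not the TestDFSAdmin code itself.

```java
import java.util.Arrays;

public class NewlineSketch {
    // Line-based check: split captured output on '\n' and look for an
    // exact line match, the way a per-line assertion would.
    static boolean containsLine(String output, String expected) {
        return Arrays.asList(output.split("\n")).contains(expected);
    }

    public static void main(String[] args) {
        String success = "SUCCESS: Changed property dfs.datanode.peer.stats.enabled";
        // Race outcome: no newline after the "started at" message, so both
        // messages fuse into one line and the exact-line match fails.
        String merged = "started at Thu Sep 14 23:14:07 GMT 2023" + success;
        // With the newline in place, the SUCCESS line is found.
        String fixed = "started at Thu Sep 14 23:14:07 GMT 2023\n" + success;
        if (containsLine(merged, success)) {
            throw new AssertionError("fused line should not match");
        }
        if (!containsLine(fixed, success)) {
            throw new AssertionError("newline-separated line should match");
        }
    }
}
```

This supports the suggestion in the report: emitting a newline after the "started at" message (or before "SUCCESS") would make the captured lines match again.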
[jira] [Commented] (HDFS-17201) some methods in FsDatasetImpl should acquire readLock with LockLevel.VOLUME
[ https://issues.apache.org/jira/browse/HDFS-17201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17766734#comment-17766734 ]

farmmamba commented on HDFS-17201:
----------------------------------
[~hexiaoqiao] [~tomscut] Hi, sir. Sorry for disturbing you here. I have a question and need your help.
First, take the method FsDatasetImpl#contains as an example: it only acquires the BLOCK_POOl read lock, yet it contains logic that reads ReplicaInfo from volumeMap. Some other methods follow the same pattern.
My question is: why do we use only the BLOCK_POOl read lock here rather than the VOLUME read lock? Would it be better to change those BLOCK_POOl read locks to VOLUME read locks?
Looking forward to your reply. Thanks a lot.

> some methods in FsDatasetImpl should acquire readLock with LockLevel.VOLUME
> ---------------------------------------------------------------------------
>
>                 Key: HDFS-17201
>                 URL: https://issues.apache.org/jira/browse/HDFS-17201
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>            Reporter: farmmamba
>            Assignee: farmmamba
>            Priority: Major
>             Fix For: 3.4.0
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
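For readers unfamiliar with the lock hierarchy being discussed, the sketch below illustrates the general idea of pool-level versus volume-level read locking. It is a hypothetical standalone model, not FsDatasetImpl's actual lock manager API: all class, field, and method names here are invented, and plain ReentrantReadWriteLock instances stand in for the datanode's real lock implementation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class TwoLevelLockSketch {
    // One coarse lock for the whole block pool, plus one finer lock per volume.
    private final ReentrantReadWriteLock blockPoolLock = new ReentrantReadWriteLock();
    private final Map<String, ReentrantReadWriteLock> volumeLocks = new ConcurrentHashMap<>();
    // Stand-in for volumeMap: block id -> volume holding the replica.
    private final Map<String, String> volumeMap = new ConcurrentHashMap<>();

    public void put(String blockId, String volume) {
        volumeMap.put(blockId, volume);
    }

    // Coarse variant: only the pool-level read lock, like the contains()
    // pattern questioned in the comment above.
    public boolean containsPoolLevel(String blockId) {
        blockPoolLock.readLock().lock();
        try {
            return volumeMap.containsKey(blockId);
        } finally {
            blockPoolLock.readLock().unlock();
        }
    }

    // Finer variant: pool read lock plus the read lock of the volume that
    // holds the replica, roughly what a VOLUME-level read lock would mean.
    public boolean containsVolumeLevel(String blockId, String volume) {
        blockPoolLock.readLock().lock();
        try {
            ReentrantReadWriteLock vl =
                volumeLocks.computeIfAbsent(volume, v -> new ReentrantReadWriteLock());
            vl.readLock().lock();
            try {
                return volume.equals(volumeMap.get(blockId));
            } finally {
                vl.readLock().unlock();
            }
        } finally {
            blockPoolLock.readLock().unlock();
        }
    }
}
```

The trade-off the question touches on: volume-level read locks reduce contention between readers of different volumes, but they only pay off when the guarded state is genuinely partitioned per volume; whether that holds for volumeMap reads is exactly what the commenter is asking the maintainers.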
[jira] [Assigned] (HDFS-17201) some methods in FsDatasetImpl should acquire readLock with LockLevel.VOLUME
[ https://issues.apache.org/jira/browse/HDFS-17201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

farmmamba reassigned HDFS-17201:
--------------------------------
    Assignee: farmmamba

> some methods in FsDatasetImpl should acquire readLock with LockLevel.VOLUME
> ---------------------------------------------------------------------------
>
>                 Key: HDFS-17201
>                 URL: https://issues.apache.org/jira/browse/HDFS-17201
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>            Reporter: farmmamba
>            Assignee: farmmamba
>            Priority: Major
>             Fix For: 3.4.0
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-17201) some methods in FsDatasetImpl should acquire readLock with LockLevel.VOLUME
farmmamba created HDFS-17201:
--------------------------------
             Summary: some methods in FsDatasetImpl should acquire readLock with LockLevel.VOLUME
                 Key: HDFS-17201
                 URL: https://issues.apache.org/jira/browse/HDFS-17201
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: datanode
            Reporter: farmmamba
             Fix For: 3.4.0

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org