[jira] [Created] (HDFS-17201) some methods in FsDatasetImpl should acquire readLock with LockLevel.VOLUME

2023-09-19 Thread farmmamba (Jira)
farmmamba created HDFS-17201:


 Summary: some methods in FsDatasetImpl should acquire readLock 
with LockLevel.VOLUME
 Key: HDFS-17201
 URL: https://issues.apache.org/jira/browse/HDFS-17201
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Reporter: farmmamba
 Fix For: 3.4.0









[jira] [Assigned] (HDFS-17201) some methods in FsDatasetImpl should acquire readLock with LockLevel.VOLUME

2023-09-19 Thread farmmamba (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

farmmamba reassigned HDFS-17201:


Assignee: farmmamba

> some methods in FsDatasetImpl should acquire readLock with LockLevel.VOLUME
> ---
>
> Key: HDFS-17201
> URL: https://issues.apache.org/jira/browse/HDFS-17201
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Commented] (HDFS-17201) some methods in FsDatasetImpl should acquire readLock with LockLevel.VOLUME

2023-09-19 Thread farmmamba (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17766734#comment-17766734
 ] 

farmmamba commented on HDFS-17201:
--

[~hexiaoqiao] [~tomscut] Hi, sirs. Sorry for disturbing you here. I have a
question and need your help.

Firstly, take the method FsDatasetImpl#contains as an example: it only
acquires the BLOCK_POOl read lock, yet it contains logic that reads the
ReplicaInfo from volumeMap.

Some other methods have similar usages.
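
For reference, a simplified sketch of that pattern (written from memory, so the
exact method body in trunk may differ):
{code:java}
// Simplified sketch of FsDatasetImpl#contains (not verbatim trunk code):
// only the block-pool-level read lock is taken, yet the replica is looked
// up in volumeMap.
@Override // FsDatasetSpi
public boolean contains(final ExtendedBlock block) {
  try (AutoCloseableLock lock = lockManager.readLock(
      LockLevel.BLOCK_POOl, block.getBlockPoolId())) {
    final long blockId = block.getLocalBlock().getBlockId();
    final ReplicaInfo r = volumeMap.get(block.getBlockPoolId(), blockId);
    return (r != null && r.blockDataExists());
  }
}
{code}
If this were changed to a VOLUME-level read lock it would, as far as I
understand the lock manager API, also need the volume's storage ID, e.g.
something like lockManager.readLock(LockLevel.VOLUME, bpid, storageUuid).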

Secondly, my question is: why do we only take the BLOCK_POOl read lock here
rather than the VOLUME read lock? Would it be better to change those BLOCK_POOl
read locks to VOLUME read locks?

Looking forward to your reply. Thanks a lot.

> some methods in FsDatasetImpl should acquire readLock with LockLevel.VOLUME
> ---
>
> Key: HDFS-17201
> URL: https://issues.apache.org/jira/browse/HDFS-17201
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Created] (HDFS-17202) TestDFSAdmin.testAllDatanodesReconfig assertion failing (again)

2023-09-19 Thread Steve Loughran (Jira)
Steve Loughran created HDFS-17202:
-

 Summary: TestDFSAdmin.testAllDatanodesReconfig assertion failing 
(again)
 Key: HDFS-17202
 URL: https://issues.apache.org/jira/browse/HDFS-17202
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: dfsadmin, test
Affects Versions: 3.3.9
Reporter: Steve Loughran


Surfacing in the test run for the HADOOP-18895 PR:
https://github.com/apache/hadoop/pull/6073

```
Expecting:
 <["started at Thu Sep 14 23:14:07 GMT 2023SUCCESS: Changed property 
dfs.datanode.peer.stats.enabled",
" From: "false"",
" To: "true"",
" and finished at Thu Sep 14 23:14:07 GMT 2023."]>
to contain subsequence:
 <["SUCCESS: Changed property dfs.datanode.peer.stats.enabled",
" From: "false"",
" To: "true""]>
```
Looks like some logging race condition again, as the "started at" output is on
the same line as "SUCCESS". Maybe something needs to add a \n after the
"started at" message, or before "SUCCESS".
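
To make the failure mode concrete, here is a minimal standalone illustration
(this is not the actual TestDFSAdmin code and the captured messages are
simplified): when the "started at" message is written without a trailing
newline, the split output no longer contains a standalone "SUCCESS: ..." line,
so an AssertJ containsSubsequence check over the lines fails.

```
import static org.assertj.core.api.Assertions.assertThat;

import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.util.Arrays;
import java.util.List;

public class ReconfigOutputExample {
  public static void main(String[] args) {
    ByteArrayOutputStream bout = new ByteArrayOutputStream();
    PrintStream out = new PrintStream(bout);

    // Missing newline: the next message is glued onto this line.
    out.print("started at Thu Sep 14 23:14:07 GMT 2023");
    out.println("SUCCESS: Changed property dfs.datanode.peer.stats.enabled");
    out.println(" From: \"false\"");
    out.println(" To: \"true\"");
    out.flush();

    List<String> outs = Arrays.asList(bout.toString().split("\n"));
    // Fails: the first element is "started at ...SUCCESS: Changed property ...",
    // so the expected "SUCCESS: ..." entry never appears on its own line.
    assertThat(outs).containsSubsequence(
        "SUCCESS: Changed property dfs.datanode.peer.stats.enabled",
        " From: \"false\"",
        " To: \"true\"");
  }
}
```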






[jira] [Assigned] (HDFS-16938) Utility to trigger heartbeat and wait until BP thread queue is fully processed

2023-09-19 Thread Viraj Jasani (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viraj Jasani reassigned HDFS-16938:
---

Assignee: (was: Viraj Jasani)

> Utility to trigger heartbeat and wait until BP thread queue is fully processed
> --
>
> Key: HDFS-16938
> URL: https://issues.apache.org/jira/browse/HDFS-16938
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
>
> As a follow-up to HDFS-16935, we should provide a utility to trigger a 
> heartbeat and wait until the BP thread queue is fully processed. This would 
> ensure 100% consistency w.r.t. the active namenode being able to receive bad 
> block reports from the given datanode. This utility would resolve flakes for 
> tests that rely on the namenode's awareness of the bad blocks reported by 
> datanodes.
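
A minimal sketch of what such a utility could look like (the queue-size
accessor below is a hypothetical placeholder; only
DataNodeTestUtils.triggerHeartbeat and GenericTestUtils.waitFor are existing
test helpers):
{code:java}
import java.io.IOException;
import java.util.concurrent.TimeoutException;

import org.apache.hadoop.hdfs.server.datanode.DataNode;
import org.apache.hadoop.hdfs.server.datanode.DataNodeTestUtils;
import org.apache.hadoop.test.GenericTestUtils;

public final class BpQueueTestUtil {
  private BpQueueTestUtil() {
  }

  /** Trigger a heartbeat, then wait until the BP service actor queue drains. */
  public static void triggerHeartbeatAndWaitForQueueDrain(DataNode dn)
      throws IOException, TimeoutException, InterruptedException {
    DataNodeTestUtils.triggerHeartbeat(dn);
    GenericTestUtils.waitFor(() -> getPendingQueueSize(dn) == 0,
        100 /* check interval ms */, 30_000 /* timeout ms */);
  }

  // Hypothetical placeholder: the real utility would consult the
  // BPOfferService / BPServiceActor internals of the given DataNode.
  private static int getPendingQueueSize(DataNode dn) {
    throw new UnsupportedOperationException("sketch only");
  }
}
{code}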






[jira] [Resolved] (HDFS-17105) mistakenly purge editLogs even after it is empty in NNStorageRetentionManager

2023-09-19 Thread Xiaoqiao He (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoqiao He resolved HDFS-17105.

Fix Version/s: 3.4.0
 Hadoop Flags: Reviewed
 Assignee: ConfX
   Resolution: Fixed

>  mistakenly purge editLogs even after it is empty in NNStorageRetentionManager
> --
>
> Key: HDFS-17105
> URL: https://issues.apache.org/jira/browse/HDFS-17105
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: ConfX
>Assignee: ConfX
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: reproduce.sh
>
>
> h2. What happened:
> Got {{IndexOutOfBoundsException}} after setting 
> {{dfs.namenode.max.extra.edits.segments.retained}} to a negative value and 
> purging old records with {{{}NNStorageRetentionManager{}}}.
> h2. Where's the bug:
> In line 156 of {{{}NNStorageRetentionManager{}}}, the manager trims 
> {{editLogs}} until it is under the {{{}maxExtraEditsSegmentsToRetain{}}}:
> {noformat}
> while (editLogs.size() > maxExtraEditsSegmentsToRetain) {
>       purgeLogsFrom = editLogs.get(0).getLastTxId() + 1;
>       editLogs.remove(0);
> }{noformat}
> However, if {{dfs.namenode.max.extra.edits.segments.retained}} is set below 
> 0, the size of {{editLogs}} can never drop to that threshold, so the loop 
> keeps removing entries until {{editLogs.size()=0}}, at which point 
> {{editLogs.get(0)}} is out of range.
> h2. How to reproduce:
> (1) Set {{dfs.namenode.max.extra.edits.segments.retained}} to -1974676133
> (2) Run test: 
> {{org.apache.hadoop.hdfs.server.namenode.TestNNStorageRetentionManager#testNoLogs}}
> h2. Stacktrace:
> {noformat}
> java.lang.IndexOutOfBoundsException: Index 0 out of bounds for length 0
>     at 
> java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:64)
>     at 
> java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:70)
>     at 
> java.base/jdk.internal.util.Preconditions.checkIndex(Preconditions.java:248)
>     at java.base/java.util.Objects.checkIndex(Objects.java:372)
>     at java.base/java.util.ArrayList.get(ArrayList.java:459)
>     at 
> org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager.purgeOldStorage(NNStorageRetentionManager.java:157)
>     at 
> org.apache.hadoop.hdfs.server.namenode.TestNNStorageRetentionManager.runTest(TestNNStorageRetentionManager.java:299)
>     at 
> org.apache.hadoop.hdfs.server.namenode.TestNNStorageRetentionManager.testNoLogs(TestNNStorageRetentionManager.java:143){noformat}
> For an easy reproduction, run the reproduce.sh in the attachment.
> We are happy to provide a patch if this issue is confirmed.
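
A minimal sketch of one possible guard, assuming the fix is simply to clamp the
configured value before trimming (this is not necessarily the patch that was
actually committed):
{code:java}
// Clamp the retention count so that a negative configuration value cannot
// drain editLogs completely and trigger editLogs.get(0) on an empty list.
int retain = Math.max(0, maxExtraEditsSegmentsToRetain);
while (editLogs.size() > retain) {
  purgeLogsFrom = editLogs.get(0).getLastTxId() + 1;
  editLogs.remove(0);
}
{code}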






[jira] [Updated] (HDFS-17105) mistakenly purge editLogs even after it is empty in NNStorageRetentionManager

2023-09-19 Thread Xiaoqiao He (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoqiao He updated HDFS-17105:
---
Priority: Minor  (was: Critical)

>  mistakenly purge editLogs even after it is empty in NNStorageRetentionManager
> --
>
> Key: HDFS-17105
> URL: https://issues.apache.org/jira/browse/HDFS-17105
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: ConfX
>Assignee: ConfX
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: reproduce.sh
>
>
> h2. What happened:
> Got {{IndexOutOfBoundsException}} after setting 
> {{dfs.namenode.max.extra.edits.segments.retained}} to a negative value and 
> purging old records with {{{}NNStorageRetentionManager{}}}.
> h2. Where's the bug:
> In line 156 of {{{}NNStorageRetentionManager{}}}, the manager trims 
> {{editLogs}} until it is under the {{{}maxExtraEditsSegmentsToRetain{}}}:
> {noformat}
> while (editLogs.size() > maxExtraEditsSegmentsToRetain) {
>       purgeLogsFrom = editLogs.get(0).getLastTxId() + 1;
>       editLogs.remove(0);
> }{noformat}
> However, if {{dfs.namenode.max.extra.edits.segments.retained}} is set below 
> 0, the size of {{editLogs}} can never drop to that threshold, so the loop 
> keeps removing entries until {{editLogs.size()=0}}, at which point 
> {{editLogs.get(0)}} is out of range.
> h2. How to reproduce:
> (1) Set {{dfs.namenode.max.extra.edits.segments.retained}} to -1974676133
> (2) Run test: 
> {{org.apache.hadoop.hdfs.server.namenode.TestNNStorageRetentionManager#testNoLogs}}
> h2. Stacktrace:
> {noformat}
> java.lang.IndexOutOfBoundsException: Index 0 out of bounds for length 0
>     at 
> java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:64)
>     at 
> java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:70)
>     at 
> java.base/jdk.internal.util.Preconditions.checkIndex(Preconditions.java:248)
>     at java.base/java.util.Objects.checkIndex(Objects.java:372)
>     at java.base/java.util.ArrayList.get(ArrayList.java:459)
>     at 
> org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager.purgeOldStorage(NNStorageRetentionManager.java:157)
>     at 
> org.apache.hadoop.hdfs.server.namenode.TestNNStorageRetentionManager.runTest(TestNNStorageRetentionManager.java:299)
>     at 
> org.apache.hadoop.hdfs.server.namenode.TestNNStorageRetentionManager.testNoLogs(TestNNStorageRetentionManager.java:143){noformat}
> For an easy reproduction, run the reproduce.sh in the attachment.
> We are happy to provide a patch if this issue is confirmed.


