[jira] [Commented] (HDFS-17432) Fix junit dependency to enable JUnit4 tests to run in hadoop-hdfs-rbf

2024-03-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828177#comment-17828177
 ] 

ASF GitHub Bot commented on HDFS-17432:
---

tasanuma commented on PR #6639:
URL: https://github.com/apache/hadoop/pull/6639#issuecomment-2005768324

   @dineshchitlangia Thanks for your review. However, the failed tests may be 
related; I will investigate them.




> Fix junit dependency to enable JUnit4 tests to run in hadoop-hdfs-rbf
> -
>
> Key: HDFS-17432
> URL: https://issues.apache.org/jira/browse/HDFS-17432
> Project: Hadoop HDFS
>  Issue Type: Test
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
>  Labels: pull-request-available
>
> After HDFS-17370, JUnit4 tests stopped running in hadoop-hdfs-rbf. To enable 
> both JUnit4 and JUnit5 tests to run, we need to add junit-vintage-engine to 
> the hadoop-hdfs-rbf/pom.xml.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17432) Fix junit dependency to enable JUnit4 tests to run in hadoop-hdfs-rbf

2024-03-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828168#comment-17828168
 ] 

ASF GitHub Bot commented on HDFS-17432:
---

hadoop-yetus commented on PR #6639:
URL: https://github.com/apache/hadoop/pull/6639#issuecomment-2005732598

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   7m 35s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  xmllint  |   0m  0s |  |  xmllint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  32m 34s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 24s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |   0m 24s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  mvnsite  |   0m 28s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 29s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 20s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  shadedclient  |  54m 19s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 20s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 19s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |   0m 19s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 18s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  javac  |   0m 18s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  1s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  mvnsite  |   0m 21s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 18s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 17s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  shadedclient  |  20m 43s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  |  25m 56s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6639/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt)
 |  hadoop-hdfs-rbf in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 26s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 113m 42s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | 
hadoop.hdfs.server.federation.router.TestRouterRpcMultiDestination |
   |   | hadoop.hdfs.server.federation.router.TestRouterRpc |
   |   | hadoop.hdfs.server.federation.router.TestRouterClientRejectOverload |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6639/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6639 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient codespell detsecrets xmllint |
   | uname | Linux a880d5c509fb 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 
15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / bc85a82f47deaaadbd32b894430d4abad7f8d27b |
   | Default Java | Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6639/1/testReport/ |
   | Max. process+thread count | 3619 (vs. ulimit of 5500) |
   | modules | C: 

[jira] [Updated] (HDFS-17432) Fix junit dependency to enable JUnit4 tests to run in hadoop-hdfs-rbf

2024-03-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDFS-17432:
--
Labels: pull-request-available  (was: )

> Fix junit dependency to enable JUnit4 tests to run in hadoop-hdfs-rbf
> -
>
> Key: HDFS-17432
> URL: https://issues.apache.org/jira/browse/HDFS-17432
> Project: Hadoop HDFS
>  Issue Type: Test
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
>  Labels: pull-request-available
>
> After HDFS-17370, JUnit4 tests stopped running in hadoop-hdfs-rbf. To enable 
> both JUnit4 and JUnit5 tests to run, we need to add junit-vintage-engine to 
> the hadoop-hdfs-rbf/pom.xml.






[jira] [Commented] (HDFS-17432) Fix junit dependency to enable JUnit4 tests to run in hadoop-hdfs-rbf

2024-03-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828149#comment-17828149
 ] 

ASF GitHub Bot commented on HDFS-17432:
---

tasanuma opened a new pull request, #6639:
URL: https://github.com/apache/hadoop/pull/6639

   
   
   
   
   ### Description of PR
   
   After HDFS-17370, JUnit4 tests stopped running in hadoop-hdfs-rbf. To enable 
both JUnit4 and JUnit5 tests to run, we need to add junit-vintage-engine to the 
hadoop-hdfs-rbf/pom.xml.
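
   A sketch of the dependency addition in question (the scope shown is an 
assumption for illustration; in the Hadoop build, JUnit versions are typically 
managed by the parent pom, so the module pom may omit the version):

   ```xml
   <!-- hadoop-hdfs-rbf/pom.xml: let the JUnit Platform also run legacy JUnit4
        tests via the vintage engine. Scope and placement are assumed here. -->
   <dependency>
     <groupId>org.junit.vintage</groupId>
     <artifactId>junit-vintage-engine</artifactId>
     <scope>test</scope>
   </dependency>
   ```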
   
   ### How was this patch tested?
   
   I confirmed that all unit tests in hadoop-hdfs-rbf ran with this change by 
running `cd hadoop-hdfs-project/hadoop-hdfs-rbf && mvn clean test -fae` on my 
local laptop.
   
   ### For code changes:
   
   - [x] Does the title of this PR start with the corresponding JIRA issue id 
(e.g. 'HADOOP-17799. Your PR title ...')?
   




> Fix junit dependency to enable JUnit4 tests to run in hadoop-hdfs-rbf
> -
>
> Key: HDFS-17432
> URL: https://issues.apache.org/jira/browse/HDFS-17432
> Project: Hadoop HDFS
>  Issue Type: Test
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
>
> After HDFS-17370, JUnit4 tests stopped running in hadoop-hdfs-rbf. To enable 
> both JUnit4 and JUnit5 tests to run, we need to add junit-vintage-engine to 
> the hadoop-hdfs-rbf/pom.xml.






[jira] [Created] (HDFS-17432) Fix junit dependency to enable JUnit4 tests to run in hadoop-hdfs-rbf

2024-03-18 Thread Takanobu Asanuma (Jira)
Takanobu Asanuma created HDFS-17432:
---

 Summary: Fix junit dependency to enable JUnit4 tests to run in 
hadoop-hdfs-rbf
 Key: HDFS-17432
 URL: https://issues.apache.org/jira/browse/HDFS-17432
 Project: Hadoop HDFS
  Issue Type: Test
Reporter: Takanobu Asanuma
Assignee: Takanobu Asanuma


After HDFS-17370, JUnit4 tests stopped running in hadoop-hdfs-rbf. To enable 
both JUnit4 and JUnit5 tests to run, we need to add junit-vintage-engine to the 
hadoop-hdfs-rbf/pom.xml.






[jira] [Commented] (HDFS-17370) Fix junit dependency for running parameterized tests in hadoop-hdfs-rbf

2024-03-18 Thread Takanobu Asanuma (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828144#comment-17828144
 ] 

Takanobu Asanuma commented on HDFS-17370:
-

[~ayushtkn] Thank you for investigating. It does indeed appear that this change 
is the cause of the issue; my apologies for that.
It seems we need to add junit-vintage-engine to hadoop-hdfs-rbf/pom.xml. I'll 
create a PR later.

> Fix junit dependency for running parameterized tests in hadoop-hdfs-rbf
> ---
>
> Key: HDFS-17370
> URL: https://issues.apache.org/jira/browse/HDFS-17370
> Project: Hadoop HDFS
>  Issue Type: Test
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.4.1, 3.5.0
>
>
> We need to add junit-jupiter-engine dependency for running parameterized 
> tests in hadoop-hdfs-rbf.






[jira] [Commented] (HDFS-17430) RecoveringBlock will skip no live replicas when get block recovery command.

2024-03-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828054#comment-17828054
 ] 

ASF GitHub Bot commented on HDFS-17430:
---

hadoop-yetus commented on PR #6635:
URL: https://github.com/apache/hadoop/pull/6635#issuecomment-2004489045

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 23s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  36m 24s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 48s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |   0m 48s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   0m 37s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 47s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 42s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m  9s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   2m  4s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  23m  8s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 38s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 40s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |   0m 40s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 36s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |   0m 36s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 29s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   0m 41s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 34s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m  1s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m 44s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  24m 23s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 213m 20s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6635/2/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 27s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 312m 40s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.tools.TestDFSAdmin |
   |   | hadoop.metrics2.sink.TestRollingFileSystemSinkWithHdfs |
   |   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6635/2/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6635 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux f3a291500e15 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 
15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 6376ed89e382dae4276aeeff3f7dba2def8ead7d |
   | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 
/usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   |  Test Results | 

[jira] [Commented] (HDFS-17370) Fix junit dependency for running parameterized tests in hadoop-hdfs-rbf

2024-03-18 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828028#comment-17828028
 ] 

Ayush Saxena commented on HDFS-17370:
-

Hi [~tasanuma]/[~simbadzina]
I think the router tests aren't running now: only two tests are running. If you 
check this PR, the tests ran for only about 9 minutes:
{noformat}
+1 :green_heart:  unit  9m 48s  hadoop-hdfs-rbf in the patch passed.
{noformat}
whereas in another comment here 
[https://github.com/apache/hadoop/pull/6510#issuecomment-1918979261], it took 
some 22 minutes, which still sounds OK considering the parallel test profile 
being used:
{noformat}
+1 :green_heart: unit   22m 8s  hadoop-hdfs-rbf in the patch passed.
{noformat}
From the daily build here:
[https://ci-hadoop.apache.org/view/Hadoop/job/hadoop-qbt-trunk-java8-linux-x86_64/1519/testReport/org.apache.hadoop.hdfs.server.federation.router/]

It shows only two tests, and I am pretty sure TestRouterRpc is one of those 
missing from the package. I think the two tests that are running are JUnit5 
tests and the others are JUnit4; enabling the former broke the existing ones.

I haven't debugged much; I just suspect this change since it is the closest 
related one.

cc. [~elgoiri] in case you have any pointers
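
For reference, a sketch of the engine split being described (the coordinates are 
the standard JUnit5 ones; versions are assumed to be managed elsewhere): the 
Jupiter engine added by HDFS-17370 runs JUnit5 tests but does not discover 
JUnit4 ones, so both engines would be needed on the test classpath.

```xml
<!-- Runs JUnit5 (Jupiter) tests; on its own it skips JUnit4 tests. -->
<dependency>
  <groupId>org.junit.jupiter</groupId>
  <artifactId>junit-jupiter-engine</artifactId>
  <scope>test</scope>
</dependency>
<!-- Vintage engine: lets the JUnit Platform also discover and run JUnit4 tests. -->
<dependency>
  <groupId>org.junit.vintage</groupId>
  <artifactId>junit-vintage-engine</artifactId>
  <scope>test</scope>
</dependency>
```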

> Fix junit dependency for running parameterized tests in hadoop-hdfs-rbf
> ---
>
> Key: HDFS-17370
> URL: https://issues.apache.org/jira/browse/HDFS-17370
> Project: Hadoop HDFS
>  Issue Type: Test
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.4.1, 3.5.0
>
>
> We need to add junit-jupiter-engine dependency for running parameterized 
> tests in hadoop-hdfs-rbf.






[jira] [Commented] (HDFS-17430) RecoveringBlock will skip no live replicas when get block recovery command.

2024-03-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828025#comment-17828025
 ] 

ASF GitHub Bot commented on HDFS-17430:
---

hadoop-yetus commented on PR #6635:
URL: https://github.com/apache/hadoop/pull/6635#issuecomment-2004253021

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 22s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | -1 :x: |  mvninstall  |  32m  6s | 
[/branch-mvninstall-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6635/1/artifact/out/branch-mvninstall-root.txt)
 |  root in trunk failed.  |
   | +1 :green_heart: |  compile  |   0m 45s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |   0m 41s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   0m 39s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 44s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 44s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m  6s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m 41s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  20m 35s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 34s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 37s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |   0m 37s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 32s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |   0m 32s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 27s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   0m 36s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 31s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 58s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m 43s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  20m 40s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 217m 23s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6635/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 28s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 304m 59s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.TestReadStripedFileWithDecoding |
   |   | hadoop.hdfs.TestReconstructStripedFileWithValidator |
   |   | hadoop.hdfs.TestReconstructStripedFileWithRandomECPolicy |
   |   | hadoop.hdfs.TestLeaseRecovery2 |
   |   | hadoop.hdfs.TestDFSShell |
   |   | hadoop.hdfs.TestReconstructStripedFile |
   |   | hadoop.hdfs.TestReadStripedFileWithDNFailure |
   |   | hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots |
   |   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6635/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6635 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 0a365a237f30 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 
15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
  

[jira] [Commented] (HDFS-17380) FsImageValidation: remove inaccessible nodes

2024-03-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828021#comment-17828021
 ] 

ASF GitHub Bot commented on HDFS-17380:
---

szetszwo commented on PR #6549:
URL: https://github.com/apache/hadoop/pull/6549#issuecomment-2004194640

   @Hexiaoqiao , thanks a lot for reviewing and merging this!




> FsImageValidation: remove inaccessible nodes
> 
>
> Key: HDFS-17380
> URL: https://issues.apache.org/jira/browse/HDFS-17380
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Reporter: Tsz-wo Sze
>Assignee: Tsz-wo Sze
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> If a fsimage is corrupted,  it may have inaccessible nodes.  The 
> FsImageValidation tool currently is able to identify the inaccessible nodes 
> when validating the INodeMap.  This JIRA is to update the tool to remove the 
> inaccessible nodes and then save a new fsimage.






[jira] [Commented] (HDFS-17389) [FGL] Client RPCs involving read process supports fine-grained lock

2024-03-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17827997#comment-17827997
 ] 

ASF GitHub Bot commented on HDFS-17389:
---

hadoop-yetus commented on PR #6590:
URL: https://github.com/apache/hadoop/pull/6590#issuecomment-2004046408

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |  17m 40s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ HDFS-17384 Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  44m 21s |  |  HDFS-17384 passed  |
   | +1 :green_heart: |  compile  |   1m 24s |  |  HDFS-17384 passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |   1m 13s |  |  HDFS-17384 passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  checkstyle  |   1m 12s |  |  HDFS-17384 passed  |
   | +1 :green_heart: |  mvnsite  |   1m 22s |  |  HDFS-17384 passed  |
   | +1 :green_heart: |  javadoc  |   1m  8s |  |  HDFS-17384 passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 41s |  |  HDFS-17384 passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   3m 14s |  |  HDFS-17384 passed  |
   | +1 :green_heart: |  shadedclient  |  35m 25s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 12s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 13s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |   1m 13s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m  6s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  javac  |   1m  6s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 59s | 
[/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6590/2/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 281 unchanged 
- 0 fixed = 282 total (was 281)  |
   | +1 :green_heart: |  mvnsite  |   1m 15s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 52s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 28s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   3m 14s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  35m 52s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 279m 24s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6590/2/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 47s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 437m 12s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.tools.TestDFSAdmin |
   |   | hadoop.hdfs.server.blockmanagement.TestBlockManager |
   |   | hadoop.hdfs.server.datanode.TestLargeBlockReport |
   |   | hadoop.hdfs.server.namenode.TestLeaseManager |
   |   | hadoop.hdfs.protocol.TestBlockListAsLongs |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6590/2/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6590 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 27e3f3d82144 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 
15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | 

[jira] [Commented] (HDFS-17410) [FGL] Client RPCs that changes file attributes supports fine-grained lock

2024-03-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17827994#comment-17827994
 ] 

ASF GitHub Bot commented on HDFS-17410:
---

hadoop-yetus commented on PR #6634:
URL: https://github.com/apache/hadoop/pull/6634#issuecomment-2004031052

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   6m 44s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ HDFS-17384 Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  31m 48s |  |  HDFS-17384 passed  |
   | +1 :green_heart: |  compile  |   0m 46s |  |  HDFS-17384 passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |   0m 41s |  |  HDFS-17384 passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  checkstyle  |   0m 38s |  |  HDFS-17384 passed  |
   | +1 :green_heart: |  mvnsite  |   0m 46s |  |  HDFS-17384 passed  |
   | +1 :green_heart: |  javadoc  |   0m 43s |  |  HDFS-17384 passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m  4s |  |  HDFS-17384 passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   1m 48s |  |  HDFS-17384 passed  |
   | +1 :green_heart: |  shadedclient  |  20m 53s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 34s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 38s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |   0m 38s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 37s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  javac  |   0m 37s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 29s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   0m 39s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 32s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m  2s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   1m 37s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  20m 35s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 211m  6s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6634/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 29s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 305m 40s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.TestQuota |
   |   | hadoop.hdfs.server.datanode.TestLargeBlockReport |
   |   | hadoop.hdfs.protocol.TestBlockListAsLongs |
   |   | hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes |
   |   | hadoop.hdfs.tools.TestDFSAdmin |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6634/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6634 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux c87e1102468f 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 
15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | HDFS-17384 / a47bbfd3bdb8bbe84eaa87ebea8674bb7cf709e8 |
   | Default Java | Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 

[jira] [Created] (HDFS-17431) Fix log format for BlockRecoveryWorker#recoverBlocks

2024-03-18 Thread Haiyang Hu (Jira)
Haiyang Hu created HDFS-17431:
-

 Summary: Fix log format for BlockRecoveryWorker#recoverBlocks
 Key: HDFS-17431
 URL: https://issues.apache.org/jira/browse/HDFS-17431
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Haiyang Hu
Assignee: Haiyang Hu


Fix log format for BlockRecoveryWorker#recoverBlocks



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17430) RecoveringBlock will skip no live replicas when get block recovery command.

2024-03-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17827944#comment-17827944
 ] 

ASF GitHub Bot commented on HDFS-17430:
---

haiyang1987 commented on PR #6635:
URL: https://github.com/apache/hadoop/pull/6635#issuecomment-2003727035

   Updated the PR.
   Hi @ZanderXu @Hexiaoqiao @tasanuma @zhangshuyan0, please help review this 
PR when you have time. Thank you very much.




> RecoveringBlock will skip no live replicas when get block recovery command.
> ---
>
> Key: HDFS-17430
> URL: https://issues.apache.org/jira/browse/HDFS-17430
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
>
> RecoveringBlock may fail to skip non-live replicas when getting the block recovery command.
> *Issue:*
> Currently the following scenario may cause the datanode's execution of 
> BlockRecoveryWorker to fail, leaving the file unclosed for a long time.
> *t1.* The block_xxx_xxx has two replicas [dn1, dn2]; the dn1 machine has shut 
> down and is in dead state, while dn2 is live.
> *t2.* Block recovery occurs.
> related logs:
> {code:java}
> 2024-03-13 21:58:00.651 WARN hdfs.StateChange DIR* 
> NameSystem.internalReleaseLease: File /xxx/file has not been closed. Lease 
> recovery is in progress. RecoveryId = 28577373754 for block blk_xxx_xxx
> {code}
> *t3.* The dn2 is chosen for block recovery.
> At this time dn1 is marked as stale (it is dead), so the recoveryLocations 
> size is 1; according to the following logic, both dn1 and dn2 will then be 
> chosen to participate in block recovery.
> DatanodeManager#getBlockRecoveryCommand
> {code:java}
>// Skip stale nodes during recovery
>  final List<DatanodeStorageInfo> recoveryLocations =
>  new ArrayList<>(storages.length);
>  final List<Integer> storageIdx = new ArrayList<>(storages.length);
>  for (int i = 0; i < storages.length; ++i) {
>if (!storages[i].getDatanodeDescriptor().isStale(staleInterval)) {
>  recoveryLocations.add(storages[i]);
>  storageIdx.add(i);
>}
>  }
>  ...
>  // If we only get 1 replica after eliminating stale nodes, choose all
>  // replicas for recovery and let the primary data node handle failures.
>  DatanodeInfo[] recoveryInfos;
>  if (recoveryLocations.size() > 1) {
>if (recoveryLocations.size() != storages.length) {
>  LOG.info("Skipped stale nodes for recovery : "
>  + (storages.length - recoveryLocations.size()));
>}
>recoveryInfos = DatanodeStorageInfo.toDatanodeInfos(recoveryLocations);
>  } else {
>// If too many replicas are stale, then choose all replicas to
>// participate in block recovery.
>recoveryInfos = DatanodeStorageInfo.toDatanodeInfos(storages);
>  }
> {code}
> {code:java}
> 2024-03-13 21:58:01,425 INFO  datanode.DataNode 
> (BlockRecoveryWorker.java:logRecoverBlock(563))
> [org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$1@54e291ac] -
> BlockRecoveryWorker: NameNode at xxx:8040 calls 
> recoverBlock(BP-xxx:blk_xxx_xxx, 
> targets=[DatanodeInfoWithStorage[dn1:50010,null,null], 
> DatanodeInfoWithStorage[dn2:50010,null,null]], 
> newGenerationStamp=28577373754, newBlock=null, isStriped=false)
> {code}
> *t4.* When dn2 executes BlockRecoveryWorker#recover, it calls the 
> initReplicaRecovery operation on dn1. However, since the dn1 machine is down, 
> the call takes a very long time to time out; the default number of retries to 
> establish a server connection is 45.
> related logs:
> {code:java}
> 2024-03-13 21:59:31,518 INFO  ipc.Client 
> (Client.java:handleConnectionTimeout(904)) 
> [org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$1@54e291ac] - 
> Retrying connect to server: dn1:8010. Already tried 0 time(s); maxRetries=45
> ...
> 2024-03-13 23:05:35,295 INFO  ipc.Client 
> (Client.java:handleConnectionTimeout(904)) 
> [org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$1@54e291ac] - 
> Retrying connect to server: dn2:8010. Already tried 44 time(s); maxRetries=45
> 2024-03-13 23:07:05,392 WARN  protocol.InterDatanodeProtocol 
> (BlockRecoveryWorker.java:recover(170)) 
> [org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$1@54e291ac] -
> Failed to recover block (block=BP-xxx:blk_xxx_xxx, 
> datanode=DatanodeInfoWithStorage[dn1:50010,null,null]) 
> org.apache.hadoop.net.ConnectTimeoutException:
> Call From dn2 to dn1:8010 failed on socket timeout exception: 
> org.apache.hadoop.net.ConnectTimeoutException: 9 millis timeout while 
> waiting for channel to be ready for connect.ch : 
> java.nio.channels.SocketChannel[connection-pending 
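The roughly 66-minute window visible in the retry log above (21:59 to 23:05) is consistent with the default IPC connect retry budget. A back-of-the-envelope check (the 90-second per-attempt connect timeout is an assumption here, not taken from the logs):

```java
// Rough retry-budget arithmetic (assumed 90 s connect timeout per attempt):
// 45 attempts at 90 s each is about 67 minutes, matching the gap between the
// first and the 44th "Retrying connect" log lines quoted above.
public class RetryBudget {
    public static void main(String[] args) {
        int maxRetries = 45;             // from "maxRetries=45" in the logs
        long connectTimeoutMs = 90_000L; // assumed per-attempt connect timeout
        long totalMinutes = maxRetries * connectTimeoutMs / 60_000L;
        System.out.println(totalMinutes + " minutes"); // prints "67 minutes"
    }
}
```

This is why a dead (not merely stale) replica in the recovery target list can stall lease recovery for over an hour.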

[jira] [Commented] (HDFS-17430) RecoveringBlock will skip no live replicas when get block recovery command.

2024-03-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17827941#comment-17827941
 ] 

ASF GitHub Bot commented on HDFS-17430:
---

haiyang1987 commented on code in PR #6635:
URL: https://github.com/apache/hadoop/pull/6635#discussion_r1528394415


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockRecoveryWorker.java:
##
@@ -628,7 +628,7 @@ public void run() {
 new RecoveryTaskContiguous(b).recover();
   }
 } catch (IOException e) {
-  LOG.warn("recover Block: {} FAILED: {}", b, e);
+  LOG.warn("recover Block: {} FAILED: ", b, e);

Review Comment:
   yeah, I will submit a new issue to fix this small problem.
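SLF4J's placeholder handling explains the fix being discussed: with `LOG.warn("recover Block: {} FAILED: {}", b, e)` the exception is consumed by the second `{}` and only `e.toString()` is printed, whereas with no placeholder left for it, a trailing `Throwable` is logged with its full stack trace. A minimal self-contained sketch of that rule (no SLF4J dependency; `format` is a hypothetical stand-in, not the real binding):

```java
// Mimics SLF4J's parameter handling: a trailing Throwable that has no
// matching {} placeholder is treated as the exception to print with a
// stack trace, not as a substitution argument.
public class Slf4jStyleDemo {
    static String format(String msg, Object... args) {
        int placeholders = msg.split("\\{\\}", -1).length - 1;
        Object last = args.length > 0 ? args[args.length - 1] : null;
        boolean trailingThrowable =
            last instanceof Throwable && args.length > placeholders;
        int usable = trailingThrowable ? args.length - 1 : args.length;
        StringBuilder out = new StringBuilder();
        int argIdx = 0, from = 0, at;
        while ((at = msg.indexOf("{}", from)) >= 0) {
            out.append(msg, from, at);
            out.append(argIdx < usable ? String.valueOf(args[argIdx++]) : "{}");
            from = at + 2;
        }
        out.append(msg.substring(from));
        if (trailingThrowable) {
            out.append(" [stack trace of ")
               .append(last.getClass().getSimpleName()).append(']');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Throwable e = new java.io.IOException("disk error");
        // Two placeholders: the exception fills the second {} and only
        // e.toString() appears -- no stack trace.
        System.out.println(format("recover Block: {} FAILED: {}", "blk_1", e));
        // One fewer placeholder: the trailing exception keeps its stack trace.
        System.out.println(format("recover Block: {} FAILED: ", "blk_1", e));
    }
}
```

With the trailing-throwable form, real SLF4J prints the full stack trace of `e`, which is what the follow-up issue is meant to ensure.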



##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java:
##
@@ -1755,12 +1755,19 @@ private BlockRecoveryCommand 
getBlockRecoveryCommand(String blockPoolId,
   LOG.info("Skipped stale nodes for recovery : "
   + (storages.length - recoveryLocations.size()));
 }
-recoveryInfos = DatanodeStorageInfo.toDatanodeInfos(recoveryLocations);
   } else {
-// If too many replicas are stale, then choose all replicas to
+// If too many replicas are stale, then choose live replicas to
 // participate in block recovery.
-recoveryInfos = DatanodeStorageInfo.toDatanodeInfos(storages);
+recoveryLocations.clear();
+storageIdx.clear();
+for (int i = 0; i < storages.length; ++i) {
+  if (storages[i].getDatanodeDescriptor().isAlive()) {
+recoveryLocations.add(storages[i]);
+storageIdx.add(i);
+  }

Review Comment:
   Got it.






[jira] [Commented] (HDFS-17430) RecoveringBlock will skip no live replicas when get block recovery command.

2024-03-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17827939#comment-17827939
 ] 

ASF GitHub Bot commented on HDFS-17430:
---

haiyang1987 commented on code in PR #6635:
URL: https://github.com/apache/hadoop/pull/6635#discussion_r1528393272


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java:
##
@@ -1755,12 +1755,19 @@ private BlockRecoveryCommand 
getBlockRecoveryCommand(String blockPoolId,
   LOG.info("Skipped stale nodes for recovery : "
   + (storages.length - recoveryLocations.size()));
 }
-recoveryInfos = DatanodeStorageInfo.toDatanodeInfos(recoveryLocations);
   } else {
-// If too many replicas are stale, then choose all replicas to
+// If too many replicas are stale, then choose live replicas to
 // participate in block recovery.
-recoveryInfos = DatanodeStorageInfo.toDatanodeInfos(storages);
+recoveryLocations.clear();
+storageIdx.clear();
+for (int i = 0; i < storages.length; ++i) {
+  if (storages[i].getDatanodeDescriptor().isAlive()) {
+recoveryLocations.add(storages[i]);
+storageIdx.add(i);
+  }

Review Comment:
   Yeah, I will add it later.






[jira] [Commented] (HDFS-17388) [FGL] Client RPCs involving write process supports fine-grained lock

2024-03-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17827922#comment-17827922
 ] 

ASF GitHub Bot commented on HDFS-17388:
---

ferhui commented on code in PR #6589:
URL: https://github.com/apache/hadoop/pull/6589#discussion_r1528016536


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EncryptionZoneManager.java:
##


Review Comment:
   It seems only the test methods were changed. Is that expected?



##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java:
##
@@ -3699,7 +3702,7 @@ void fsync(String src, long fileId, String clientName, 
long lastBlockLength)
   }
   FSDirWriteFileOp.persistBlocks(dir, src, pendingFile, false);
 } finally {
-  writeUnlock("fsync");
+  writeUnlock(FSNamesystemLockMode.GLOBAL, "fsync");

Review Comment:
   Why use the global lock here for fsync? I found that it only checks the FS 
lock in the following code.
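For context on the question above: under the fine-grained-lock work, an operation that touches only the namespace can take just the FS lock instead of the global (FS + BM) lock. A self-contained sketch of the idea (all names are stand-ins, not the actual FSNamesystemLockMode API):

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Stand-in for the fine-grained-lock idea: the former single namesystem
// lock is split into an FS (namespace) lock and a BM (block manager)
// lock. GLOBAL takes both; FS-only operations leave the BM lock free.
public class FineGrainedLockSketch {
    enum LockMode { FS, BM, GLOBAL }

    private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock();
    private final ReentrantReadWriteLock bmLock = new ReentrantReadWriteLock();

    void writeLock(LockMode mode) {
        // Fixed acquisition order (FS before BM) to avoid deadlock.
        if (mode == LockMode.FS || mode == LockMode.GLOBAL) fsLock.writeLock().lock();
        if (mode == LockMode.BM || mode == LockMode.GLOBAL) bmLock.writeLock().lock();
    }

    void writeUnlock(LockMode mode) {
        if (mode == LockMode.BM || mode == LockMode.GLOBAL) bmLock.writeLock().unlock();
        if (mode == LockMode.FS || mode == LockMode.GLOBAL) fsLock.writeLock().unlock();
    }

    boolean bmHeld() { return bmLock.isWriteLocked(); }

    public static void main(String[] args) {
        FineGrainedLockSketch ns = new FineGrainedLockSketch();
        ns.writeLock(LockMode.FS);       // an fsync-like namespace-only op
        System.out.println(ns.bmHeld()); // prints "false": BM stays free
        ns.writeUnlock(LockMode.FS);
    }
}
```

Downgrading an RPC from GLOBAL to FS is safe only if it never reads or mutates block-manager state, which is exactly what the reviewer is asking the author to confirm for fsync.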





> [FGL] Client RPCs involving write process supports fine-grained lock
> 
>
> Key: HDFS-17388
> URL: https://issues.apache.org/jira/browse/HDFS-17388
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>
> The client write process involves many client RPCs. 
>  
> This ticket is used to make these RPCs support fine-grained lock.
>  * mkdir 
>  * create
>  * addBlock
>  * abandonBlock
>  * getAdditionalDatanode
>  * updateBlockForPipeline
>  * updatePipeline
>  * fsync
>  * complete
>  * rename
>  * rename2
>  * append
>  * renewLease
>  * recoverLease
>  * delete
>  * createSymlink
>  * renewDelegationToken
>  * cancelDelegationToken



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17430) RecoveringBlock will skip no live replicas when get block recovery command.

2024-03-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17827917#comment-17827917
 ] 

ASF GitHub Bot commented on HDFS-17430:
---

ZanderXu commented on code in PR #6635:
URL: https://github.com/apache/hadoop/pull/6635#discussion_r1528310021


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockRecoveryWorker.java:
##
@@ -628,7 +628,7 @@ public void run() {
 new RecoveryTaskContiguous(b).recover();
   }
 } catch (IOException e) {
-  LOG.warn("recover Block: {} FAILED: {}", b, e);
+  LOG.warn("recover Block: {} FAILED: ", b, e);

Review Comment:
   this modification has nothing to do with this issue, right? 



##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java:
##
@@ -1755,12 +1755,19 @@ private BlockRecoveryCommand 
getBlockRecoveryCommand(String blockPoolId,
   LOG.info("Skipped stale nodes for recovery : "
   + (storages.length - recoveryLocations.size()));
 }
-recoveryInfos = DatanodeStorageInfo.toDatanodeInfos(recoveryLocations);
   } else {
-// If too many replicas are stale, then choose all replicas to
+// If too many replicas are stale, then choose live replicas to
 // participate in block recovery.
-recoveryInfos = DatanodeStorageInfo.toDatanodeInfos(storages);
+recoveryLocations.clear();
+storageIdx.clear();
+for (int i = 0; i < storages.length; ++i) {
+  if (storages[i].getDatanodeDescriptor().isAlive()) {
+recoveryLocations.add(storages[i]);
+storageIdx.add(i);
+  }

Review Comment:
   please add some logs for this case.



##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java:
##
@@ -1755,12 +1755,19 @@ private BlockRecoveryCommand 
getBlockRecoveryCommand(String blockPoolId,
   LOG.info("Skipped stale nodes for recovery : "
   + (storages.length - recoveryLocations.size()));
 }
-recoveryInfos = DatanodeStorageInfo.toDatanodeInfos(recoveryLocations);
   } else {
-// If too many replicas are stale, then choose all replicas to
+// If too many replicas are stale, then choose live replicas to
 // participate in block recovery.
-recoveryInfos = DatanodeStorageInfo.toDatanodeInfos(storages);
+recoveryLocations.clear();
+storageIdx.clear();
+for (int i = 0; i < storages.length; ++i) {
+  if (storages[i].getDatanodeDescriptor().isAlive()) {
+recoveryLocations.add(storages[i]);
+storageIdx.add(i);
+  }

Review Comment:
   please check if all replicas are dead.
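The two review points above (log the fallback, and handle the case where every replica is dead) can be sketched as follows. This is a self-contained illustration with stand-in types, not the actual DatanodeManager code (the real code works with DatanodeStorageInfo and DatanodeDescriptor):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the proposed target selection: prefer non-stale replicas; if
// that leaves at most one, fall back to live replicas; if every replica
// is dead, use them all as a last resort so recovery can still be tried.
public class RecoveryTargetSelection {
    static final class Replica {
        final String name; final boolean stale; final boolean alive;
        Replica(String name, boolean stale, boolean alive) {
            this.name = name; this.stale = stale; this.alive = alive;
        }
    }

    static List<Replica> chooseTargets(List<Replica> storages) {
        List<Replica> picked = new ArrayList<>();
        for (Replica r : storages) {           // skip stale nodes first
            if (!r.stale) picked.add(r);
        }
        if (picked.size() > 1) return picked;
        picked.clear();                        // too many stale: take live ones
        for (Replica r : storages) {
            if (r.alive) picked.add(r);
        }
        if (picked.isEmpty()) return storages; // all replicas dead: use all
        return picked;
    }

    public static void main(String[] args) {
        List<Replica> storages = new ArrayList<>();
        storages.add(new Replica("dn1", true, false)); // shut down: stale + dead
        storages.add(new Replica("dn2", false, true)); // healthy
        // Only dn2 is chosen, instead of the old fallback to all replicas
        // (which included the dead dn1).
        System.out.println(chooseTargets(storages).get(0).name); // prints "dn2"
    }
}
```

In the scenario from the issue description (dn1 dead, dn2 live), this selection avoids handing the dead dn1 to the primary recovery node, which is what caused the hour-long connect retries.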






[jira] [Commented] (HDFS-17430) RecoveringBlock will skip no live replicas when get block recovery command.

2024-03-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17827911#comment-17827911
 ] 

ASF GitHub Bot commented on HDFS-17430:
---

haiyang1987 opened a new pull request, #6635:
URL: https://github.com/apache/hadoop/pull/6635

   ### Description of PR
   https://issues.apache.org/jira/browse/HDFS-17430
   
   RecoveringBlock may fail to skip non-live replicas when getting the block recovery command.
   
   
   **Issue:**
   Currently the following scenario may cause the datanode's execution of 
BlockRecoveryWorker to fail, leaving the file unclosed for a long time.
   
   **t1.** The block_xxx_xxx has two replicas [dn1, dn2]; the dn1 machine has 
shut down and is in dead state, while dn2 is live.
   
   **t2.** Block recovery occurs.
   related logs:
   ```
   2024-03-13 21:58:00.651 WARN hdfs.StateChange DIR* 
NameSystem.internalReleaseLease: File /xxx/file has not been closed. Lease 
recovery is in progress. RecoveryId = 28577373754 for block blk_xxx_xxx
   ```
   
   **t3.** The dn2 is chosen for block recovery.
   At this time dn1 is marked as stale (it is dead), so the recoveryLocations 
size is 1; according to the following logic, both dn1 and dn2 will then be 
chosen to participate in block recovery.
   
   DatanodeManager#getBlockRecoveryCommand
   ```
  // Skip stale nodes during recovery
final List<DatanodeStorageInfo> recoveryLocations =
new ArrayList<>(storages.length);
final List<Integer> storageIdx = new ArrayList<>(storages.length);
for (int i = 0; i < storages.length; ++i) {
  if (!storages[i].getDatanodeDescriptor().isStale(staleInterval)) {
recoveryLocations.add(storages[i]);
storageIdx.add(i);
  }
}
...
// If we only get 1 replica after eliminating stale nodes, choose all
// replicas for recovery and let the primary data node handle failures.
DatanodeInfo[] recoveryInfos;
if (recoveryLocations.size() > 1) {
  if (recoveryLocations.size() != storages.length) {
LOG.info("Skipped stale nodes for recovery : "
+ (storages.length - recoveryLocations.size()));
  }
  recoveryInfos = 
DatanodeStorageInfo.toDatanodeInfos(recoveryLocations);
} else {
  // If too many replicas are stale, then choose all replicas to
  // participate in block recovery.
  recoveryInfos = DatanodeStorageInfo.toDatanodeInfos(storages);
}
   ```
   ```
   2024-03-13 21:58:01,425 INFO  datanode.DataNode 
(BlockRecoveryWorker.java:logRecoverBlock(563))
   [org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$1@54e291ac] -
   BlockRecoveryWorker: NameNode at xxx:8040 calls 
recoverBlock(BP-xxx:blk_xxx_xxx, 
targets=[DatanodeInfoWithStorage[dn1:50010,null,null], 
DatanodeInfoWithStorage[dn2:50010,null,null]], newGenerationStamp=28577373754, 
newBlock=null, isStriped=false)
   ```
   
   **t4.** When dn2 executes BlockRecoveryWorker#recover, it calls the 
initReplicaRecovery operation on dn1. However, since the dn1 machine is down, 
the call takes a very long time to time out; the default number of retries to 
establish a server connection is 45.
   related logs:
   
   ```
   2024-03-13 21:59:31,518 INFO  ipc.Client 
(Client.java:handleConnectionTimeout(904)) 
[org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$1@54e291ac] - 
Retrying connect to server: dn1:8010. Already tried 0 time(s); maxRetries=45
   ...
   2024-03-13 23:05:35,295 INFO  ipc.Client 
(Client.java:handleConnectionTimeout(904)) 
[org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$1@54e291ac] - 
Retrying connect to server: dn2:8010. Already tried 44 time(s); maxRetries=45
   
   2024-03-13 23:07:05,392 WARN  protocol.InterDatanodeProtocol 
(BlockRecoveryWorker.java:recover(170)) 
[org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$1@54e291ac] -
   Failed to recover block (block=BP-xxx:blk_xxx_xxx, 
datanode=DatanodeInfoWithStorage[dn1:50010,null,null]) 
org.apache.hadoop.net.ConnectTimeoutException:
   Call From dn2 to dn1:8010 failed on socket timeout exception: 
org.apache.hadoop.net.ConnectTimeoutException: 9 millis timeout while 
waiting for channel to be ready for connect.ch : 
java.nio.channels.SocketChannel[connection-pending remote=dn:8010]; For more 
details see:  http://wiki.apache.org/hadoop/SocketTimeout
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
Method)
   at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
   at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:931)
  

[jira] [Updated] (HDFS-17430) RecoveringBlock will skip no live replicas when get block recovery command.

2024-03-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDFS-17430:
--
Labels: pull-request-available  (was: )


[jira] [Updated] (HDFS-17430) RecoveringBlock will skip no live replicas when get block recovery command.

2024-03-18 Thread Haiyang Hu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haiyang Hu updated HDFS-17430:
--
Description: 
RecoveringBlock should skip non-live replicas when the block recovery command is generated.

*Issue:*
Currently the following scenario may lead to failure in the execution of 
BlockRecoveryWorker by the datanode, resulting in the file not being closed for 
a long time.

*t1.* The block blk_xxx_xxx has two replicas [dn1, dn2]; the dn1 machine shut 
down and is now in dead status, while dn2 is live.
*t2.* Block recovery occurs.
related logs:

{code:java}
2024-03-13 21:58:00.651 WARN hdfs.StateChange   DIR* 
NameSystem.internalReleaseLease: File /xxx/file has not been closed. Lease 
recovery is in progress. RecoveryId = 28577373754 for block blk_xxx_xxx
{code}

*t3.* dn2 is chosen for block recovery.
dn1 is marked as stale (it is in dead state) at this time, so recoveryLocations 
has size 1. According to the following logic, both dn1 and dn2 will then be 
chosen to participate in block recovery.

DatanodeManager#getBlockRecoveryCommand
{code:java}
   // Skip stale nodes during recovery
 final List<DatanodeStorageInfo> recoveryLocations =
 new ArrayList<>(storages.length);
 final List<Integer> storageIdx = new ArrayList<>(storages.length);
 for (int i = 0; i < storages.length; ++i) {
   if (!storages[i].getDatanodeDescriptor().isStale(staleInterval)) {
 recoveryLocations.add(storages[i]);
 storageIdx.add(i);
   }
 }
 ...
 // If we only get 1 replica after eliminating stale nodes, choose all
 // replicas for recovery and let the primary data node handle failures.
 DatanodeInfo[] recoveryInfos;
 if (recoveryLocations.size() > 1) {
   if (recoveryLocations.size() != storages.length) {
 LOG.info("Skipped stale nodes for recovery : "
 + (storages.length - recoveryLocations.size()));
   }
   recoveryInfos = DatanodeStorageInfo.toDatanodeInfos(recoveryLocations);
 } else {
   // If too many replicas are stale, then choose all replicas to
   // participate in block recovery.
   recoveryInfos = DatanodeStorageInfo.toDatanodeInfos(storages);
 }
{code}
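The effect of that fallback can be seen in a simplified model. The sketch below is hypothetical (plain strings and boolean staleness flags stand in for the real DatanodeStorageInfo objects); it only mirrors the selection branch quoted above: when eliminating stale nodes leaves one or zero candidates, all replicas are chosen, including a node that is stale because it is dead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model of the selection logic in
// DatanodeManager#getBlockRecoveryCommand (hypothetical names).
public class RecoverySelectionSketch {
  // Returns the nodes that would be handed to block recovery.
  static List<String> chooseRecoveryTargets(String[] nodes, boolean[] stale) {
    List<String> recoveryLocations = new ArrayList<>();
    for (int i = 0; i < nodes.length; i++) {
      if (!stale[i]) {
        recoveryLocations.add(nodes[i]);
      }
    }
    // With only one (or zero) non-stale replica left, the logic falls back
    // to ALL replicas -- including the stale (dead) one.
    if (recoveryLocations.size() > 1) {
      return recoveryLocations;
    }
    return Arrays.asList(nodes);
  }

  public static void main(String[] args) {
    // dn1 is stale (dead), dn2 is live: both are still selected.
    System.out.println(chooseRecoveryTargets(
        new String[] {"dn1", "dn2"}, new boolean[] {true, false}));
  }
}
```

With two stale-free replicas out of three, only the live ones would be chosen; with one out of two, the dead dn1 is pulled back in, which is exactly the situation in this report.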


{code:java}
2024-03-13 21:58:01,425 INFO  datanode.DataNode 
(BlockRecoveryWorker.java:logRecoverBlock(563))
[org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$1@54e291ac] -
BlockRecoveryWorker: NameNode at xxx:8040 calls 
recoverBlock(BP-xxx:blk_xxx_xxx, 
targets=[DatanodeInfoWithStorage[dn1:50010,null,null], 
DatanodeInfoWithStorage[dn2:50010,null,null]], newGenerationStamp=28577373754, 
newBlock=null, isStriped=false)
{code}

*t4.* When dn2 executes BlockRecoveryWorker#recover, it calls the 
initReplicaRecovery operation on dn1. However, since the dn1 machine is 
currently down, the call takes a very long time to time out; the default 
number of retries to establish a server connection is 45.
related logs:

{code:java}
2024-03-13 21:59:31,518 INFO  ipc.Client 
(Client.java:handleConnectionTimeout(904)) 
[org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$1@54e291ac] - 
Retrying connect to server: dn1:8010. Already tried 0 time(s); maxRetries=45
...
2024-03-13 23:05:35,295 INFO  ipc.Client 
(Client.java:handleConnectionTimeout(904)) 
[org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$1@54e291ac] - 
Retrying connect to server: dn2:8010. Already tried 44 time(s); maxRetries=45

2024-03-13 23:07:05,392 WARN  protocol.InterDatanodeProtocol 
(BlockRecoveryWorker.java:recover(170)) 
[org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$1@54e291ac] -
Failed to recover block (block=BP-xxx:blk_xxx_xxx, 
datanode=DatanodeInfoWithStorage[dn1:50010,null,null]) 
org.apache.hadoop.net.ConnectTimeoutException:
Call From dn2 to dn1:8010 failed on socket timeout exception: 
org.apache.hadoop.net.ConnectTimeoutException: 9 millis timeout while 
waiting for channel to be ready for connect.ch : 
java.nio.channels.SocketChannel[connection-pending remote=dn:8010]; For more 
details see:  http://wiki.apache.org/hadoop/SocketTimeout
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:931)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:866)
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1583)
at org.apache.hadoop.ipc.Client.call(Client.java:1511)
at org.apache.hadoop.ipc.Client.call(Client.java:1402)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:268)
at 
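The scale of this delay follows directly from the retry settings: with maxRetries=45 and one connect timeout per attempt, the recovery worker can block for roughly retries times timeout per unreachable datanode. The sketch below assumes a 90-second per-attempt timeout; that figure is an inference from the log timestamps (21:59:31 to 23:05:35 across 45 attempts), not a quoted configuration value.

```java
// Back-of-the-envelope estimate of how long BlockRecoveryWorker can block on
// a dead datanode. maxRetries=45 comes from the log; the 90s per-attempt
// timeout is an assumption inferred from the log timestamps.
public class RetryDelaySketch {
  static long worstCaseSeconds(int maxRetries, long perAttemptTimeoutSeconds) {
    return maxRetries * perAttemptTimeoutSeconds;
  }

  public static void main(String[] args) {
    // 45 attempts x 90s = 4050s, i.e. about 67 minutes per unreachable node.
    System.out.println(worstCaseSeconds(45, 90));
  }
}
```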

[jira] [Created] (HDFS-17430) RecoveringBlock will skip no live replicas when get block recovery command.

2024-03-18 Thread Haiyang Hu (Jira)
Haiyang Hu created HDFS-17430:
-

 Summary: RecoveringBlock will skip no live replicas when get block 
recovery command.
 Key: HDFS-17430
 URL: https://issues.apache.org/jira/browse/HDFS-17430
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Haiyang Hu
Assignee: Haiyang Hu






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-17429) Datatransfer sender.java LOG variable uses interface's, causing log fileName mistake

2024-03-18 Thread Zhongkun Wu (Jira)
Zhongkun Wu created HDFS-17429:
--

 Summary: Datatransfer sender.java LOG variable uses interface's, 
causing log fileName mistake
 Key: HDFS-17429
 URL: https://issues.apache.org/jira/browse/HDFS-17429
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Zhongkun Wu


2024-03-18 16:34:40,274 TRACE datatransfer.DataTransferProtocol: :80 Sending 
DataTransferOp OpReadBlockProto: header {

 

The log message above suggests that the line was emitted from 
DataTransferProtocol, but it actually was not.

It was emitted from Sender.java, which is in the same package and implements 
DataTransferProtocol.
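The mechanism behind this is that a field declared on an interface is a shared constant, so a logger created against the interface carries the interface's name into every log line, whichever implementor logs. The sketch below is a hypothetical minimal reproduction (simplified names; a String stands in for the real slf4j Logger used in org.apache.hadoop.hdfs.protocol.datatransfer):

```java
// An interface field is implicitly public static final and is inherited by
// every implementor, so a "logger" built from the interface reports the
// interface's name, not the implementing class's.
interface DataTransferProtocolSketch {
  String LOG_NAME = DataTransferProtocolSketch.class.getSimpleName();
}

public class SenderSketch implements DataTransferProtocolSketch {
  static String logSource() {
    // Resolves to the interface's name, not "SenderSketch".
    return LOG_NAME;
  }

  static String fixedLogSource() {
    // A per-class logger names the actual call site.
    return SenderSketch.class.getSimpleName();
  }

  public static void main(String[] args) {
    System.out.println(logSource());
    System.out.println(fixedLogSource());
  }
}
```

Declaring the logger in Sender.java itself, as in fixedLogSource(), would make the file name in the log output accurate.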



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17410) [FGL] Client RPCs that changes file attributes supports fine-grained lock

2024-03-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17827886#comment-17827886
 ] 

ASF GitHub Bot commented on HDFS-17410:
---

ZanderXu opened a new pull request, #6634:
URL: https://github.com/apache/hadoop/pull/6634

   There are some client RPCs that are used to change file attributes.
   
   This ticket is used to make these RPCs support fine-grained locking.
   
   - setReplication
   - getStoragePolicies
   - setStoragePolicy
   - unsetStoragePolicy
   - satisfyStoragePolicy
   - getStoragePolicy
   - setPermission
   - setOwner
   - setTimes
   - concat
   - truncate
   - setQuota
   - getQuotaUsage
   - modifyAclEntries
   - removeAclEntries
   - removeDefaultAcl
   - removeAcl
   - setAcl
   - getAclStatus
   - getEZForPath
   - listEncryptionZones
   - reencryptEncryptionZone
   - listReencryptionStatus
   - setXAttr
   - getXAttrs
   - listXAttrs
   - removeXAttr
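
Fine-grained locking here means that RPCs touching different parts of the namespace no longer serialize on one global namesystem lock. The sketch below is only an illustration of that idea under simplified assumptions (per-path ReentrantReadWriteLock instances, hypothetical names); the real FGL work must also handle lock ordering, renames, and snapshots.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: per-path read/write locks instead of one global lock,
// so attribute RPCs on unrelated paths can proceed concurrently.
public class FineGrainedLockSketch {
  private final ConcurrentHashMap<String, ReentrantReadWriteLock> locks =
      new ConcurrentHashMap<>();

  private ReentrantReadWriteLock lockFor(String path) {
    return locks.computeIfAbsent(path, p -> new ReentrantReadWriteLock());
  }

  // A mutating RPC such as setOwner write-locks only its own path.
  public void setAttribute(String path, Runnable mutation) {
    ReentrantReadWriteLock l = lockFor(path);
    l.writeLock().lock();
    try {
      mutation.run();
    } finally {
      l.writeLock().unlock();
    }
  }

  // A read RPC such as getAclStatus takes only a read lock on its path.
  public <T> T readAttribute(String path, java.util.function.Supplier<T> read) {
    ReentrantReadWriteLock l = lockFor(path);
    l.readLock().lock();
    try {
      return read.get();
    } finally {
      l.readLock().unlock();
    }
  }
}
```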




> [FGL] Client RPCs that changes file attributes supports fine-grained lock
> -
>
> Key: HDFS-17410
> URL: https://issues.apache.org/jira/browse/HDFS-17410
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>
> There are some client RPCs that are used to change file attributes.
> This ticket is used to make these RPCs support fine-grained locking.
>  * setReplication
>  * getStoragePolicies
>  * setStoragePolicy
>  * unsetStoragePolicy
>  * satisfyStoragePolicy
>  * getStoragePolicy
>  * setPermission
>  * setOwner
>  * setTimes
>  * concat
>  * truncate
>  * setQuota
>  * getQuotaUsage
>  * modifyAclEntries
>  * removeAclEntries
>  * removeDefaultAcl
>  * removeAcl
>  * setAcl
>  * getAclStatus
>  * getEZForPath
>  * listEncryptionZones
>  * reencryptEncryptionZone
>  * listReencryptionStatus
>  * setXAttr
>  * getXAttrs
>  * listXAttrs
>  * removeXAttr



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17410) [FGL] Client RPCs that changes file attributes supports fine-grained lock

2024-03-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDFS-17410:
--
Labels: pull-request-available  (was: )

> [FGL] Client RPCs that changes file attributes supports fine-grained lock
> -
>
> Key: HDFS-17410
> URL: https://issues.apache.org/jira/browse/HDFS-17410
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>
> There are some client RPCs that are used to change file attributes.
> This ticket is used to make these RPCs support fine-grained locking.
>  * setReplication
>  * getStoragePolicies
>  * setStoragePolicy
>  * unsetStoragePolicy
>  * satisfyStoragePolicy
>  * getStoragePolicy
>  * setPermission
>  * setOwner
>  * setTimes
>  * concat
>  * truncate
>  * setQuota
>  * getQuotaUsage
>  * modifyAclEntries
>  * removeAclEntries
>  * removeDefaultAcl
>  * removeAcl
>  * setAcl
>  * getAclStatus
>  * getEZForPath
>  * listEncryptionZones
>  * reencryptEncryptionZone
>  * listReencryptionStatus
>  * setXAttr
>  * getXAttrs
>  * listXAttrs
>  * removeXAttr



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-17428) [FGL] StoragePolicySatisfyManager supports fine-grained lock

2024-03-18 Thread ZanderXu (Jira)
ZanderXu created HDFS-17428:
---

 Summary: [FGL] StoragePolicySatisfyManager supports fine-grained 
lock
 Key: HDFS-17428
 URL: https://issues.apache.org/jira/browse/HDFS-17428
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: ZanderXu
Assignee: ZanderXu


StoragePolicySatisfyManager supports fine-grained lock



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17389) [FGL] Client RPCs involving read process supports fine-grained lock

2024-03-18 Thread ZanderXu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZanderXu updated HDFS-17389:

Description: 
The client read process involves many client RPCs. 

 

This ticket is used to make these RPCs support fine-grained lock.
 * getListing
 * getBatchedListing
 * listOpenFiles
 * getFileInfo
 * isFileClosed
 * getBlockLocations
 * reportBadBlocks
 * getServerDefaults
 * getStats
 * getReplicatedBlockStats
 * getECBlockGroupStats
 * getPreferredBlockSize
 * listCorruptFileBlocks
 * getContentSummary
 * getLocatedFileInfo
 * createEncryptionZone
 * msync
 * checkAccess
 * getFileLinkInfo
 * getLinkTarget

  was:
The client read process involves many client RPCs. 

 

This ticket is used to make these RPCs support fine-grained lock.
 * getListing
 * getBatchedListing
 * listOpenFiles
 * getFileInfo
 * isFileClosed
 * getBlockLocations
 * reportBadBlocks
 * getServerDefaults
 * getStats
 * getReplicatedBlockStats
 * getECBlockGroupStats
 * getPreferredBlockSize
 * listCorruptFileBlocks
 * getContentSummary
 * getLocatedFileInfo
 * createEncryptionZone
 * msync
 * checkAccess
 * getFileLinkInfo
 * getLinkTarget
 * getDelegationToken
 * getDataEncryptionKey


> [FGL] Client RPCs involving read process supports fine-grained lock
> ---
>
> Key: HDFS-17389
> URL: https://issues.apache.org/jira/browse/HDFS-17389
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>
> The client read process involves many client RPCs. 
>  
> This ticket is used to make these RPCs support fine-grained lock.
>  * getListing
>  * getBatchedListing
>  * listOpenFiles
>  * getFileInfo
>  * isFileClosed
>  * getBlockLocations
>  * reportBadBlocks
>  * getServerDefaults
>  * getStats
>  * getReplicatedBlockStats
>  * getECBlockGroupStats
>  * getPreferredBlockSize
>  * listCorruptFileBlocks
>  * getContentSummary
>  * getLocatedFileInfo
>  * createEncryptionZone
>  * msync
>  * checkAccess
>  * getFileLinkInfo
>  * getLinkTarget



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17299) HDFS is not rack failure tolerant while creating a new file.

2024-03-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17827848#comment-17827848
 ] 

ASF GitHub Bot commented on HDFS-17299:
---

hadoop-yetus commented on PR #6614:
URL: https://github.com/apache/hadoop/pull/6614#issuecomment-2003013056

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 47s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 4 new or modified test files.  |
    _ branch-2.10 Compile Tests _ |
   | +0 :ok: |  mvndep  |   2m 27s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  13m  5s |  |  branch-2.10 passed  |
   | +1 :green_heart: |  compile  |   2m 11s |  |  branch-2.10 passed with JDK 
Azul Systems, Inc.-1.7.0_262-b10  |
   | +1 :green_heart: |  compile  |   1m 48s |  |  branch-2.10 passed with JDK 
Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~18.04-b09  |
   | +1 :green_heart: |  checkstyle  |   0m 46s |  |  branch-2.10 passed  |
   | +1 :green_heart: |  mvnsite  |   1m 47s |  |  branch-2.10 passed  |
   | +1 :green_heart: |  javadoc  |   1m 56s |  |  branch-2.10 passed with JDK 
Azul Systems, Inc.-1.7.0_262-b10  |
   | +1 :green_heart: |  javadoc  |   1m 20s |  |  branch-2.10 passed with JDK 
Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~18.04-b09  |
   | -1 :x: |  spotbugs  |   2m 44s | 
[/branch-spotbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6614/9/artifact/out/branch-spotbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html)
 |  hadoop-hdfs-project/hadoop-hdfs in branch-2.10 has 1 extant spotbugs 
warnings.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 25s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m 33s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   2m  7s |  |  the patch passed with JDK 
Azul Systems, Inc.-1.7.0_262-b10  |
   | +1 :green_heart: |  javac  |   2m  7s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 44s |  |  the patch passed with JDK 
Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~18.04-b09  |
   | +1 :green_heart: |  javac  |   1m 44s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 40s |  |  hadoop-hdfs-project: The 
patch generated 0 new + 283 unchanged - 2 fixed = 283 total (was 285)  |
   | +1 :green_heart: |  mvnsite  |   1m 37s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m 46s |  |  the patch passed with JDK 
Azul Systems, Inc.-1.7.0_262-b10  |
   | +1 :green_heart: |  javadoc  |   1m 12s |  |  the patch passed with JDK 
Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~18.04-b09  |
   | +1 :green_heart: |  spotbugs  |   4m 44s |  |  the patch passed  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   1m 28s |  |  hadoop-hdfs-client in the patch 
passed.  |
   | -1 :x: |  unit  |  97m  4s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6614/9/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 36s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 153m 25s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.TestLeaseRecovery2 |
   |   | hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys |
   |   | hadoop.hdfs.TestFileLengthOnClusterRestart |
   |   | hadoop.hdfs.server.namenode.snapshot.TestSnapshotDeletion |
   |   | hadoop.hdfs.server.namenode.snapshot.TestSnapshotBlocksMap |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6614/9/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6614 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 4fef73a925b4 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 
15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision |