[jira] [Commented] (HDFS-16896) HDFS Client hedged read has increased failure rate than without hedged read
[ https://issues.apache.org/jira/browse/HDFS-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17690152#comment-17690152 ] ASF GitHub Bot commented on HDFS-16896: --- hadoop-yetus commented on PR #5322: URL: https://github.com/apache/hadoop/pull/5322#issuecomment-1434122848 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 50s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 2 new or modified test files. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 15m 18s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 33m 57s | | trunk passed | | +1 :green_heart: | compile | 7m 0s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | compile | 6m 31s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | checkstyle | 1m 26s | | trunk passed | | +1 :green_heart: | mvnsite | 2m 35s | | trunk passed | | +1 :green_heart: | javadoc | 1m 58s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javadoc | 2m 17s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 6m 34s | | trunk passed | | +1 :green_heart: | shadedclient | 28m 32s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +0 :ok: | mvndep | 0m 43s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 2m 25s | | the patch passed | | +1 :green_heart: | compile | 5m 53s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javac | 5m 53s | | the patch passed | | +1 :green_heart: | compile | 5m 41s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | javac | 5m 41s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 1m 6s | [/results-checkstyle-hadoop-hdfs-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5322/8/artifact/out/results-checkstyle-hadoop-hdfs-project.txt) | hadoop-hdfs-project: The patch generated 1 new + 42 unchanged - 0 fixed = 43 total (was 42) | | +1 :green_heart: | mvnsite | 2m 15s | | the patch passed | | +1 :green_heart: | javadoc | 1m 28s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javadoc | 2m 0s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 5m 50s | | the patch passed | | +1 :green_heart: | shadedclient | 26m 20s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 2m 26s | | hadoop-hdfs-client in the patch passed. | | -1 :x: | unit | 204m 16s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5322/8/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. 
| | +1 :green_heart: | asflicense | 0m 51s | | The patch does not generate ASF License warnings. | | | | 366m 39s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.hdfs.server.namenode.TestAuditLogs | | | hadoop.hdfs.server.namenode.TestFsck | | | hadoop.hdfs.server.namenode.TestAuditLogger | | | hadoop.hdfs.server.namenode.TestFSNamesystemLockReport | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5322/8/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/5322 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux b5a1be0c0e37 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bi
[jira] [Commented] (HDFS-16896) HDFS Client hedged read has increased failure rate than without hedged read
[ https://issues.apache.org/jira/browse/HDFS-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17690148#comment-17690148 ] ASF GitHub Bot commented on HDFS-16896: --- hadoop-yetus commented on PR #5322: URL: https://github.com/apache/hadoop/pull/5322#issuecomment-1434115847 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 43s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 1s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 2 new or modified test files. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 15m 31s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 31m 12s | | trunk passed | | +1 :green_heart: | compile | 6m 14s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | compile | 5m 41s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | checkstyle | 1m 21s | | trunk passed | | +1 :green_heart: | mvnsite | 2m 30s | | trunk passed | | +1 :green_heart: | javadoc | 1m 49s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javadoc | 2m 18s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 5m 57s | | trunk passed | | +1 :green_heart: | shadedclient | 25m 50s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +0 :ok: | mvndep | 0m 47s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 2m 20s | | the patch passed | | +1 :green_heart: | compile | 5m 54s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javac | 5m 54s | | the patch passed | | +1 :green_heart: | compile | 5m 37s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | javac | 5m 37s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 1m 4s | [/results-checkstyle-hadoop-hdfs-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5322/7/artifact/out/results-checkstyle-hadoop-hdfs-project.txt) | hadoop-hdfs-project: The patch generated 2 new + 42 unchanged - 0 fixed = 44 total (was 42) | | +1 :green_heart: | mvnsite | 2m 14s | | the patch passed | | +1 :green_heart: | javadoc | 1m 26s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javadoc | 2m 1s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 5m 49s | | the patch passed | | +1 :green_heart: | shadedclient | 25m 46s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 2m 29s | | hadoop-hdfs-client in the patch passed. | | -1 :x: | unit | 207m 40s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5322/7/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. 
| | +1 :green_heart: | asflicense | 0m 51s | | The patch does not generate ASF License warnings. | | | | 361m 50s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.hdfs.server.namenode.TestAuditLogs | | | hadoop.hdfs.server.namenode.TestAuditLogger | | | hadoop.hdfs.server.namenode.TestFSNamesystemLockReport | | | hadoop.hdfs.server.namenode.TestFsck | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5322/7/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/5322 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux 41ce24a7c7bf 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bi
[jira] [Updated] (HDFS-16925) Namenode audit log to only include IP address of client
[ https://issues.apache.org/jira/browse/HDFS-16925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Jasani updated HDFS-16925: Summary: Namenode audit log to only include IP address of client (was: Fix regex pattern for namenode audit log tests) > Namenode audit log to only include IP address of client > --- > > Key: HDFS-16925 > URL: https://issues.apache.org/jira/browse/HDFS-16925 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > > With HADOOP-18628 in place, we perform InetAddress#getHostName in addition to > InetAddress#getHostAddress, to save host name with IPC Connection object. > When we perform InetAddress#getHostName, toString() of InetAddress would > automatically print \{hostName}/\{hostIPAddress} if hostname is already > resolved: > {code:java} > /** > * Converts this IP address to a {@code String}. The > * string returned is of the form: hostname / literal IP > * address. > * > * If the host name is unresolved, no reverse name service lookup > * is performed. The hostname part will be represented by an empty string. > * > * @return a string representation of this IP address. > */ > public String toString() { > String hostName = holder().getHostName(); > return ((hostName != null) ? hostName : "") > + "/" + getHostAddress(); > }{code} > > For namenode audit logs, this means that when dfs client makes filesystem > updates, the audit logs would also print host name in the audit logs in > addition to ip address. We have some tests that performs regex pattern > matching to identify the log pattern of audit logs, we will have to change > them to reflect the change in host address. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
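A quick way to see the behavior the HDFS-16925 description above refers to: once a hostname has been resolved for an InetAddress, its toString() yields "hostname/ip", while getHostAddress() yields only the literal IP. The snippet below is an illustrative, standalone example (the class name is made up and is not part of the Hadoop code base).

```java
import java.net.InetAddress;

public class AuditLogAddressDemo {
  public static void main(String[] args) throws Exception {
    // Resolving by name stores the hostname with the InetAddress instance.
    InetAddress addr = InetAddress.getByName("localhost");
    // Implicit string concatenation calls toString(), printing e.g. "ip=localhost/127.0.0.1".
    System.out.println("ip=" + addr);
    // getHostAddress() returns only the literal IP, e.g. "ip=127.0.0.1",
    // which is the form the namenode audit log is expected to keep.
    System.out.println("ip=" + addr.getHostAddress());
  }
}
```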
[jira] [Commented] (HDFS-16925) Fix regex pattern for namenode audit log tests
[ https://issues.apache.org/jira/browse/HDFS-16925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17690114#comment-17690114 ] ASF GitHub Bot commented on HDFS-16925: --- hadoop-yetus commented on PR #5407: URL: https://github.com/apache/hadoop/pull/5407#issuecomment-1434043812 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 44s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 4 new or modified test files. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 50m 6s | | trunk passed | | +1 :green_heart: | compile | 1m 27s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | compile | 1m 21s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | checkstyle | 1m 9s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 30s | | trunk passed | | +1 :green_heart: | javadoc | 1m 9s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javadoc | 1m 33s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 3m 30s | | trunk passed | | +1 :green_heart: | shadedclient | 26m 12s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 1m 27s | | the patch passed | | +1 :green_heart: | compile | 1m 18s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javac | 1m 18s | | the patch passed | | +1 :green_heart: | compile | 1m 13s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | javac | 1m 13s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | +1 :green_heart: | checkstyle | 0m 53s | | the patch passed | | +1 :green_heart: | mvnsite | 1m 21s | | the patch passed | | +1 :green_heart: | javadoc | 0m 51s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javadoc | 1m 25s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 3m 16s | | the patch passed | | +1 :green_heart: | shadedclient | 25m 30s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | -1 :x: | unit | 206m 29s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5407/2/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. | | +1 :green_heart: | asflicense | 0m 50s | | The patch does not generate ASF License warnings. 
| | | | 331m 13s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.hdfs.server.namenode.TestNamenodeRetryCache | | | hadoop.hdfs.server.namenode.TestFSNamesystemLockReport | | | hadoop.hdfs.server.namenode.TestAuditLoggerWithCommands | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5407/2/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/5407 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux d143c4098f32 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / d5510935e82f3353c374a3fd4024c3746574105c | | Default Java | Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5407/2/testReport/ | | Max. process+thread count | 2982 (vs. ulimit of 5500) | | modules | C: ha
[jira] [Commented] (HDFS-16925) Fix regex pattern for namenode audit log tests
[ https://issues.apache.org/jira/browse/HDFS-16925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17690106#comment-17690106 ] ASF GitHub Bot commented on HDFS-16925: --- virajjasani commented on PR #5407: URL: https://github.com/apache/hadoop/pull/5407#issuecomment-1434027211 Apologies. The above IS.Stable would only mean that method arguments do not change, it doesn't matter what we do as part of the logic. Please ignore above comment about making the method IS.Stable. > Fix regex pattern for namenode audit log tests > -- > > Key: HDFS-16925 > URL: https://issues.apache.org/jira/browse/HDFS-16925 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > > With HADOOP-18628 in place, we perform InetAddress#getHostName in addition to > InetAddress#getHostAddress, to save host name with IPC Connection object. > When we perform InetAddress#getHostName, toString() of InetAddress would > automatically print \{hostName}/\{hostIPAddress} if hostname is already > resolved: > {code:java} > /** > * Converts this IP address to a {@code String}. The > * string returned is of the form: hostname / literal IP > * address. > * > * If the host name is unresolved, no reverse name service lookup > * is performed. The hostname part will be represented by an empty string. > * > * @return a string representation of this IP address. > */ > public String toString() { > String hostName = holder().getHostName(); > return ((hostName != null) ? hostName : "") > + "/" + getHostAddress(); > }{code} > > For namenode audit logs, this means that when dfs client makes filesystem > updates, the audit logs would also print host name in the audit logs in > addition to ip address. We have some tests that performs regex pattern > matching to identify the log pattern of audit logs, we will have to change > them to reflect the change in host address. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16917) Add transfer rate quantile metrics for DataNode reads
[ https://issues.apache.org/jira/browse/HDFS-16917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17690105#comment-17690105 ] ASF GitHub Bot commented on HDFS-16917: --- hadoop-yetus commented on PR #5397: URL: https://github.com/apache/hadoop/pull/5397#issuecomment-1434026427 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 40s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +0 :ok: | markdownlint | 0m 0s | | markdownlint was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 15m 30s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 30m 54s | | trunk passed | | +1 :green_heart: | compile | 23m 17s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | compile | 20m 32s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | checkstyle | 3m 48s | | trunk passed | | +1 :green_heart: | mvnsite | 3m 28s | | trunk passed | | +1 :green_heart: | javadoc | 2m 28s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javadoc | 2m 40s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 6m 8s | | trunk passed | | +1 :green_heart: | shadedclient | 26m 26s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +0 :ok: | mvndep | 0m 29s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 2m 32s | | the patch passed | | +1 :green_heart: | compile | 22m 36s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javac | 22m 36s | | the patch passed | | +1 :green_heart: | compile | 20m 34s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | javac | 20m 34s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | +1 :green_heart: | checkstyle | 3m 41s | | the patch passed | | +1 :green_heart: | mvnsite | 3m 29s | | the patch passed | | +1 :green_heart: | javadoc | 2m 20s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javadoc | 2m 42s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 6m 20s | | the patch passed | | +1 :green_heart: | shadedclient | 26m 42s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 18m 16s | | hadoop-common in the patch passed. | | -1 :x: | unit | 204m 35s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5397/10/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. | | +1 :green_heart: | asflicense | 1m 10s | | The patch does not generate ASF License warnings. 
| | | | 451m 15s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.hdfs.server.namenode.TestAuditLogger | | | hadoop.hdfs.server.namenode.TestFsck | | | hadoop.hdfs.server.namenode.TestAuditLogs | | | hadoop.hdfs.server.namenode.TestFSNamesystemLockReport | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5397/10/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/5397 | | Optional Tests | dupname asflicense mvnsite codespell detsecrets markdownlint compile javac javadoc mvninstall unit shadedclient spotbugs checkstyle | | uname | Linux 873e7c4ff4e0 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / b3267f35f78f0625039a10d1c970e1f05cf677cd | | Default Java | Private Build-1.8.0_352-8u352-ga-1~20.04-b0
[jira] [Commented] (HDFS-16925) Fix regex pattern for namenode audit log tests
[ https://issues.apache.org/jira/browse/HDFS-16925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17690101#comment-17690101 ] ASF GitHub Bot commented on HDFS-16925: --- virajjasani commented on code in PR #5407: URL: https://github.com/apache/hadoop/pull/5407#discussion_r1109234913 ## hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java: ## @@ -8787,7 +8787,10 @@ public void logAuditEvent(boolean succeeded, String userName, sb.setLength(0); sb.append("allowed=").append(succeeded).append("\t") .append("ugi=").append(userName).append("\t") -.append("ip=").append(addr).append("\t") +// InetAddress#tostring can also return hostname in addition to IP address. +// To not include hostname regardless of the hostname resolution, we should +// only include IP address here. See HADOOP-18628. Review Comment: Done > Fix regex pattern for namenode audit log tests > -- > > Key: HDFS-16925 > URL: https://issues.apache.org/jira/browse/HDFS-16925 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > > With HADOOP-18628 in place, we perform InetAddress#getHostName in addition to > InetAddress#getHostAddress, to save host name with IPC Connection object. > When we perform InetAddress#getHostName, toString() of InetAddress would > automatically print \{hostName}/\{hostIPAddress} if hostname is already > resolved: > {code:java} > /** > * Converts this IP address to a {@code String}. The > * string returned is of the form: hostname / literal IP > * address. > * > * If the host name is unresolved, no reverse name service lookup > * is performed. The hostname part will be represented by an empty string. > * > * @return a string representation of this IP address. > */ > public String toString() { > String hostName = holder().getHostName(); > return ((hostName != null) ? hostName : "") > + "/" + getHostAddress(); > }{code} > > For namenode audit logs, this means that when dfs client makes filesystem > updates, the audit logs would also print host name in the audit logs in > addition to ip address. We have some tests that performs regex pattern > matching to identify the log pattern of audit logs, we will have to change > them to reflect the change in host address. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
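For context on the diff above, here is a minimal sketch of how the audit-log line can be built so that only the literal IP lands in the "ip=" field. This is not the exact change from PR #5407; the class, the sample values, and the null guard are assumptions for illustration only.

```java
import java.net.InetAddress;

public class AuditLogLineSketch {
  public static void main(String[] args) throws Exception {
    InetAddress addr = InetAddress.getByName("localhost"); // stand-in for the caller's address
    StringBuilder sb = new StringBuilder();
    sb.append("allowed=").append(true).append("\t")
        .append("ugi=").append("hdfs").append("\t")
        // addr.toString() may render as "hostname/ip" once the name is resolved;
        // appending getHostAddress() keeps the historical "ip=<address>" format.
        .append("ip=").append(addr == null ? null : addr.getHostAddress()).append("\t")
        .append("cmd=").append("create").append("\t");
    System.out.println(sb);
  }
}
```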
[jira] [Commented] (HDFS-16925) Fix regex pattern for namenode audit log tests
[ https://issues.apache.org/jira/browse/HDFS-16925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17690100#comment-17690100 ] ASF GitHub Bot commented on HDFS-16925: --- virajjasani commented on PR #5407: URL: https://github.com/apache/hadoop/pull/5407#issuecomment-1434019817 We have this currently and I looked at this first for the first commit on this PR: ``` /** * Interface defining an audit logger. */ @InterfaceAudience.Public @InterfaceStability.Evolving public interface AuditLogger { ... ... ``` and ``` /** * Extension of {@link AuditLogger}. */ @InterfaceAudience.Public @InterfaceStability.Evolving public abstract class HdfsAuditLogger implements AuditLogger { ... ``` Looks like they should rather be IS.Stable or at least this method should be IS.Stable, WDYT? ``` public abstract void logAuditEvent(boolean succeeded, String userName, InetAddress addr, String cmd, String src, String dst, FileStatus stat, CallerContext callerContext, UserGroupInformation ugi, DelegationTokenSecretManager dtSecretManager); ... ... ``` > Fix regex pattern for namenode audit log tests > -- > > Key: HDFS-16925 > URL: https://issues.apache.org/jira/browse/HDFS-16925 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > > With HADOOP-18628 in place, we perform InetAddress#getHostName in addition to > InetAddress#getHostAddress, to save host name with IPC Connection object. > When we perform InetAddress#getHostName, toString() of InetAddress would > automatically print \{hostName}/\{hostIPAddress} if hostname is already > resolved: > {code:java} > /** > * Converts this IP address to a {@code String}. The > * string returned is of the form: hostname / literal IP > * address. > * > * If the host name is unresolved, no reverse name service lookup > * is performed. The hostname part will be represented by an empty string. > * > * @return a string representation of this IP address. > */ > public String toString() { > String hostName = holder().getHostName(); > return ((hostName != null) ? hostName : "") > + "/" + getHostAddress(); > }{code} > > For namenode audit logs, this means that when dfs client makes filesystem > updates, the audit logs would also print host name in the audit logs in > addition to ip address. We have some tests that performs regex pattern > matching to identify the log pattern of audit logs, we will have to change > them to reflect the change in host address. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16925) Fix regex pattern for namenode audit log tests
[ https://issues.apache.org/jira/browse/HDFS-16925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17690098#comment-17690098 ] ASF GitHub Bot commented on HDFS-16925: --- virajjasani commented on PR #5407: URL: https://github.com/apache/hadoop/pull/5407#issuecomment-1434017468 > Sorry, I didn't think #5385 changes the audit log format. We have to keep it as it was before. So it shouldn't print hostname in audit logs. The last change looks good to me if CI is ok. I was not aware of the compatibility guidelines for audit logs until Ayush mentioned the link. @ayushtkn @tasanuma do you think it makes sense to keep the method that produces the final string that we print in the audit log to be marked as `IA.Public` and `IS.Stable`? > Fix regex pattern for namenode audit log tests > -- > > Key: HDFS-16925 > URL: https://issues.apache.org/jira/browse/HDFS-16925 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > > With HADOOP-18628 in place, we perform InetAddress#getHostName in addition to > InetAddress#getHostAddress, to save host name with IPC Connection object. > When we perform InetAddress#getHostName, toString() of InetAddress would > automatically print \{hostName}/\{hostIPAddress} if hostname is already > resolved: > {code:java} > /** > * Converts this IP address to a {@code String}. The > * string returned is of the form: hostname / literal IP > * address. > * > * If the host name is unresolved, no reverse name service lookup > * is performed. The hostname part will be represented by an empty string. > * > * @return a string representation of this IP address. > */ > public String toString() { > String hostName = holder().getHostName(); > return ((hostName != null) ? hostName : "") > + "/" + getHostAddress(); > }{code} > > For namenode audit logs, this means that when dfs client makes filesystem > updates, the audit logs would also print host name in the audit logs in > addition to ip address. We have some tests that performs regex pattern > matching to identify the log pattern of audit logs, we will have to change > them to reflect the change in host address. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16925) Fix regex pattern for namenode audit log tests
[ https://issues.apache.org/jira/browse/HDFS-16925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17690096#comment-17690096 ] ASF GitHub Bot commented on HDFS-16925: --- tasanuma commented on code in PR #5407: URL: https://github.com/apache/hadoop/pull/5407#discussion_r1109228327 ## hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java: ## @@ -8787,7 +8787,10 @@ public void logAuditEvent(boolean succeeded, String userName, sb.setLength(0); sb.append("allowed=").append(succeeded).append("\t") .append("ugi=").append(userName).append("\t") -.append("ip=").append(addr).append("\t") +// InetAddress#tostring can also return hostname in addition to IP address. +// To not include hostname regardless of the hostname resolution, we should +// only include IP address here. See HADOOP-18628. Review Comment: I think this change is independent of HADOOP-18628. So we don't need to refer to it. ```suggestion // only include IP address here. ``` > Fix regex pattern for namenode audit log tests > -- > > Key: HDFS-16925 > URL: https://issues.apache.org/jira/browse/HDFS-16925 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > > With HADOOP-18628 in place, we perform InetAddress#getHostName in addition to > InetAddress#getHostAddress, to save host name with IPC Connection object. > When we perform InetAddress#getHostName, toString() of InetAddress would > automatically print \{hostName}/\{hostIPAddress} if hostname is already > resolved: > {code:java} > /** > * Converts this IP address to a {@code String}. The > * string returned is of the form: hostname / literal IP > * address. > * > * If the host name is unresolved, no reverse name service lookup > * is performed. The hostname part will be represented by an empty string. > * > * @return a string representation of this IP address. > */ > public String toString() { > String hostName = holder().getHostName(); > return ((hostName != null) ? hostName : "") > + "/" + getHostAddress(); > }{code} > > For namenode audit logs, this means that when dfs client makes filesystem > updates, the audit logs would also print host name in the audit logs in > addition to ip address. We have some tests that performs regex pattern > matching to identify the log pattern of audit logs, we will have to change > them to reflect the change in host address. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16925) Fix regex pattern for namenode audit log tests
[ https://issues.apache.org/jira/browse/HDFS-16925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17690095#comment-17690095 ] ASF GitHub Bot commented on HDFS-16925: --- tasanuma commented on PR #5407: URL: https://github.com/apache/hadoop/pull/5407#issuecomment-1434014182 Sorry, I didn't think #5385 changes the audit log format. We have to keep it as it was before. So it shouldn't print hostname in audit logs. The last change looks good to me if CI is ok. > Fix regex pattern for namenode audit log tests > -- > > Key: HDFS-16925 > URL: https://issues.apache.org/jira/browse/HDFS-16925 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > > With HADOOP-18628 in place, we perform InetAddress#getHostName in addition to > InetAddress#getHostAddress, to save host name with IPC Connection object. > When we perform InetAddress#getHostName, toString() of InetAddress would > automatically print \{hostName}/\{hostIPAddress} if hostname is already > resolved: > {code:java} > /** > * Converts this IP address to a {@code String}. The > * string returned is of the form: hostname / literal IP > * address. > * > * If the host name is unresolved, no reverse name service lookup > * is performed. The hostname part will be represented by an empty string. > * > * @return a string representation of this IP address. > */ > public String toString() { > String hostName = holder().getHostName(); > return ((hostName != null) ? hostName : "") > + "/" + getHostAddress(); > }{code} > > For namenode audit logs, this means that when dfs client makes filesystem > updates, the audit logs would also print host name in the audit logs in > addition to ip address. We have some tests that performs regex pattern > matching to identify the log pattern of audit logs, we will have to change > them to reflect the change in host address. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16917) Add transfer rate quantile metrics for DataNode reads
[ https://issues.apache.org/jira/browse/HDFS-16917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17690081#comment-17690081 ] ASF GitHub Bot commented on HDFS-16917: --- hadoop-yetus commented on PR #5397: URL: https://github.com/apache/hadoop/pull/5397#issuecomment-1433979733 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 37s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +0 :ok: | markdownlint | 0m 0s | | markdownlint was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 15m 38s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 31m 23s | | trunk passed | | +1 :green_heart: | compile | 23m 9s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | compile | 20m 39s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | checkstyle | 3m 47s | | trunk passed | | +1 :green_heart: | mvnsite | 3m 27s | | trunk passed | | +1 :green_heart: | javadoc | 2m 27s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javadoc | 2m 41s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 6m 4s | | trunk passed | | +1 :green_heart: | shadedclient | 26m 10s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +0 :ok: | mvndep | 0m 28s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 2m 31s | | the patch passed | | +1 :green_heart: | compile | 22m 26s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javac | 22m 26s | | the patch passed | | +1 :green_heart: | compile | 20m 38s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | javac | 20m 38s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | +1 :green_heart: | checkstyle | 3m 39s | | the patch passed | | +1 :green_heart: | mvnsite | 3m 26s | | the patch passed | | +1 :green_heart: | javadoc | 2m 22s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javadoc | 2m 40s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 6m 18s | | the patch passed | | +1 :green_heart: | shadedclient | 26m 23s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 18m 18s | | hadoop-common in the patch passed. | | -1 :x: | unit | 205m 38s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5397/9/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. | | +1 :green_heart: | asflicense | 1m 14s | | The patch does not generate ASF License warnings. 
| | | | 451m 41s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.hdfs.server.namenode.TestAuditLogs | | | hadoop.hdfs.server.namenode.TestFSNamesystemLockReport | | | hadoop.hdfs.server.namenode.TestFsck | | | hadoop.hdfs.server.namenode.TestAuditLogger | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5397/9/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/5397 | | Optional Tests | dupname asflicense mvnsite codespell detsecrets markdownlint compile javac javadoc mvninstall unit shadedclient spotbugs checkstyle | | uname | Linux e20e3c851708 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 42714178a1e6034a88a612b20972b2e40bc125fd | | Default Java | Private Build-1.8.0_352-8u352-ga-1~20.04-b08
[jira] [Commented] (HDFS-16896) HDFS Client hedged read has increased failure rate than without hedged read
[ https://issues.apache.org/jira/browse/HDFS-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17690075#comment-17690075 ] ASF GitHub Bot commented on HDFS-16896: --- hadoop-yetus commented on PR #5322: URL: https://github.com/apache/hadoop/pull/5322#issuecomment-1433945193 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 46s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 2 new or modified test files. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 23m 40s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 33m 28s | | trunk passed | | +1 :green_heart: | compile | 6m 6s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | compile | 5m 55s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | checkstyle | 1m 19s | | trunk passed | | +1 :green_heart: | mvnsite | 2m 30s | | trunk passed | | +1 :green_heart: | javadoc | 1m 50s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javadoc | 2m 16s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 5m 53s | | trunk passed | | +1 :green_heart: | shadedclient | 25m 51s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +0 :ok: | mvndep | 0m 28s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 2m 20s | | the patch passed | | +1 :green_heart: | compile | 6m 4s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javac | 6m 4s | | the patch passed | | +1 :green_heart: | compile | 5m 38s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | javac | 5m 38s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 1m 6s | [/results-checkstyle-hadoop-hdfs-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5322/6/artifact/out/results-checkstyle-hadoop-hdfs-project.txt) | hadoop-hdfs-project: The patch generated 2 new + 42 unchanged - 0 fixed = 44 total (was 42) | | +1 :green_heart: | mvnsite | 2m 14s | | the patch passed | | +1 :green_heart: | javadoc | 1m 27s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javadoc | 1m 57s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 5m 49s | | the patch passed | | +1 :green_heart: | shadedclient | 25m 37s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 2m 28s | | hadoop-hdfs-client in the patch passed. | | -1 :x: | unit | 205m 29s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5322/6/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. 
| | +1 :green_heart: | asflicense | 0m 51s | | The patch does not generate ASF License warnings. | | | | 369m 46s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.hdfs.TestPread | | | hadoop.hdfs.server.namenode.TestAuditLogs | | | hadoop.hdfs.server.namenode.TestFsck | | | hadoop.hdfs.server.namenode.TestAuditLogger | | | hadoop.hdfs.server.namenode.TestFSNamesystemLockReport | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5322/6/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/5322 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux d9582a898494 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven |
[jira] [Commented] (HDFS-16925) Fix regex pattern for namenode audit log tests
[ https://issues.apache.org/jira/browse/HDFS-16925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17690028#comment-17690028 ] ASF GitHub Bot commented on HDFS-16925: --- virajjasani commented on PR #5407: URL: https://github.com/apache/hadoop/pull/5407#issuecomment-1433760364 @ayushtkn can you please take a look? > Fix regex pattern for namenode audit log tests > -- > > Key: HDFS-16925 > URL: https://issues.apache.org/jira/browse/HDFS-16925 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > > With HADOOP-18628 in place, we perform InetAddress#getHostName in addition to > InetAddress#getHostAddress, to save host name with IPC Connection object. > When we perform InetAddress#getHostName, toString() of InetAddress would > automatically print \{hostName}/\{hostIPAddress} if hostname is already > resolved: > {code:java} > /** > * Converts this IP address to a {@code String}. The > * string returned is of the form: hostname / literal IP > * address. > * > * If the host name is unresolved, no reverse name service lookup > * is performed. The hostname part will be represented by an empty string. > * > * @return a string representation of this IP address. > */ > public String toString() { > String hostName = holder().getHostName(); > return ((hostName != null) ? hostName : "") > + "/" + getHostAddress(); > }{code} > > For namenode audit logs, this means that when dfs client makes filesystem > updates, the audit logs would also print host name in the audit logs in > addition to ip address. We have some tests that performs regex pattern > matching to identify the log pattern of audit logs, we will have to change > them to reflect the change in host address. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-16918) Optionally shut down datanode if it does not stay connected to active namenode
[ https://issues.apache.org/jira/browse/HDFS-16918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Jasani resolved HDFS-16918. - Resolution: Won't Fix > Optionally shut down datanode if it does not stay connected to active namenode > -- > > Key: HDFS-16918 > URL: https://issues.apache.org/jira/browse/HDFS-16918 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > > While deploying Hdfs on Envoy proxy setup, depending on the socket timeout > configured at envoy, the network connection issues or packet loss could be > observed. All of envoys basically form a transparent communication mesh in > which each app can send and receive packets to and from localhost and is > unaware of the network topology. > The primary purpose of Envoy is to make the network transparent to > applications, in order to identify network issues reliably. However, > sometimes such proxy based setup could result into socket connection issues > b/ datanode and namenode. > Many deployment frameworks provide auto-start functionality when any of the > hadoop daemons are stopped. If a given datanode does not stay connected to > active namenode in the cluster i.e. does not receive heartbeat response in > time from active namenode (even though active namenode is not terminated), it > would not be much useful. We should be able to provide configurable behavior > such that if a given datanode cannot receive heartbeat response from active > namenode in configurable time duration, it should terminate itself to avoid > impacting the availability SLA. This is specifically helpful when the > underlying deployment or observability framework (e.g. K8S) can start up the > datanode automatically upon it's shutdown (unless it is being restarted as > part of rolling upgrade) and help the newly brought up datanode (in case of > k8s, a new pod with dynamically changing nodes) establish new socket > connection to active and standby namenodes. This should be an opt-in behavior > and not default one. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16918) Optionally shut down datanode if it does not stay connected to active namenode
[ https://issues.apache.org/jira/browse/HDFS-16918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Jasani updated HDFS-16918: Description: While deploying HDFS on an Envoy proxy setup, depending on the socket timeout configured at Envoy, network connection issues or packet loss can be observed. All the Envoy proxies basically form a transparent communication mesh in which each app can send and receive packets to and from localhost and is unaware of the network topology. The primary purpose of Envoy is to make the network transparent to applications, in order to identify network issues reliably. However, sometimes such a proxy-based setup can result in socket connection issues between the datanode and the namenode. Many deployment frameworks provide auto-start functionality when any of the Hadoop daemons are stopped. If a given datanode does not stay connected to the active namenode in the cluster, i.e. does not receive a heartbeat response in time from the active namenode (even though the active namenode is not terminated), it is not of much use. We should be able to provide configurable behavior such that if a given datanode cannot receive a heartbeat response from the active namenode within a configurable time duration, it terminates itself to avoid impacting the availability SLA. This is specifically helpful when the underlying deployment or observability framework (e.g. K8S) can start up the datanode automatically upon its shutdown (unless it is being restarted as part of a rolling upgrade) and help the newly brought up datanode (in the case of k8s, a new pod with dynamically changing nodes) establish a new socket connection to the active and standby namenodes. This should be an opt-in behavior and not the default one. was: While deploying HDFS on an Envoy proxy setup, depending on the socket timeout configured at Envoy, network connection issues or packet loss can be observed. All the Envoy proxies basically form a transparent communication mesh in which each app can send and receive packets to and from localhost and is unaware of the network topology. The primary purpose of Envoy is to make the network transparent to applications, in order to identify network issues reliably. However, sometimes such a proxy-based setup can result in socket connection issues between the datanode and the namenode. Many deployment frameworks provide auto-start functionality when any of the Hadoop daemons are stopped. If a given datanode does not stay connected to the active namenode in the cluster, i.e. does not receive a heartbeat response in time from the active namenode (even though the active namenode is not terminated), it is not of much use. We should be able to provide configurable behavior such that if a given datanode cannot receive a heartbeat response from the active namenode within a configurable time duration, it terminates itself to avoid impacting the availability SLA. This is specifically helpful when the underlying deployment or observability framework (e.g. K8S) can start up the datanode automatically upon its shutdown (unless it is being restarted as part of a rolling upgrade) and help the newly brought up datanode (in the case of k8s, a new pod with dynamically changing nodes) establish a new socket connection to the active and standby namenodes. This should be an opt-in behavior and not the default one. In a distributed system, it is essential to have robust fail-fast mechanisms in place to prevent issues related to network partitioning. 
The system must be designed to prevent further degradation of availability and consistency in the event of a network partition. Several distributed systems offer fail-safe approaches, and for some, partition tolerance is critical to the extent that even a few seconds of heartbeat loss can trigger the removal of an application server instance from the cluster. For instance, a majority of ZooKeeper clients utilize ephemeral nodes for this purpose to make the system reliable, fault-tolerant, and strongly consistent in the event of a network partition. From the HDFS architecture viewpoint, it is crucial to understand the critical role that the active and observer namenodes play in file system operations. In a large-scale cluster, if the datanodes holding the same block (primary and replicas) lose connection to both active and observer namenodes for a significant amount of time, delaying the process of shutting down such datanodes and restarting them to re-establish the connection with the namenodes (assuming the active namenode is alive; this assumption is important in the event of a network partition to re-establish the connection) will further deteriorate the availability of the service. This scenario underscores the importance of resolving network partitioning. This is a real use case for HDFS, and it is not prudent to assume that every deployment or cluster management application must be able to restart datanodes based on JM
[jira] [Commented] (HDFS-16918) Optionally shut down datanode if it does not stay connected to active namenode
[ https://issues.apache.org/jira/browse/HDFS-16918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17690020#comment-17690020 ] ASF GitHub Bot commented on HDFS-16918: --- virajjasani closed pull request #5396: HDFS-16918. Optionally shut down datanode if it does not stay connected to active namenode URL: https://github.com/apache/hadoop/pull/5396 > Optionally shut down datanode if it does not stay connected to active namenode > -- > > Key: HDFS-16918 > URL: https://issues.apache.org/jira/browse/HDFS-16918 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > > While deploying Hdfs on Envoy proxy setup, depending on the socket timeout > configured at envoy, the network connection issues or packet loss could be > observed. All of envoys basically form a transparent communication mesh in > which each app can send and receive packets to and from localhost and is > unaware of the network topology. > The primary purpose of Envoy is to make the network transparent to > applications, in order to identify network issues reliably. However, > sometimes such proxy based setup could result into socket connection issues > b/ datanode and namenode. > Many deployment frameworks provide auto-start functionality when any of the > hadoop daemons are stopped. If a given datanode does not stay connected to > active namenode in the cluster i.e. does not receive heartbeat response in > time from active namenode (even though active namenode is not terminated), it > would not be much useful. We should be able to provide configurable behavior > such that if a given datanode cannot receive heartbeat response from active > namenode in configurable time duration, it should terminate itself to avoid > impacting the availability SLA. This is specifically helpful when the > underlying deployment or observability framework (e.g. K8S) can start up the > datanode automatically upon it's shutdown (unless it is being restarted as > part of rolling upgrade) and help the newly brought up datanode (in case of > k8s, a new pod with dynamically changing nodes) establish new socket > connection to active and standby namenodes. This should be an opt-in behavior > and not default one. > > In a distributed system, it is essential to have robust fail-fast mechanisms > in place to prevent issues related to network partitioning. The system must > be designed to prevent further degradation of availability and consistency in > the event of a network partition. Several distributed systems offer fail-safe > approaches, and for some, partition tolerance is critical to the extent that > even a few seconds of heartbeat loss can trigger the removal of an > application server instance from the cluster. For instance, a majority of > zooKeeper clients utilize the ephemeral nodes for this purpose to make system > reliable, fault-tolerant and strongly consistent in the event of network > partition. > From the hdfs architecture viewpoint, it is crucial to understand the > critical role that active and observer namenode play in file system > operations. 
In a large-scale cluster, if the datanodes holding the same block > (primary and replicas) lose connection to both active and observer namenodes > for a significant amount of time, delaying the process of shutting down such > datanodes and restarting it to re-establish the connection with the namenodes > (assuming the active namenode is alive, assumption is important in the even > of network partition to reestablish the connection) will further deteriorate > the availability of the service. This scenario underscores the importance of > resolving network partitioning. > This is a real use case for hdfs and it is not prudent to assume that every > deployment or cluster management application must be able to restart > datanodes based on JMX metrics, as this would introduce another application > to resolve the network partition impact of hdfs. Besides, popular cluster > management applications are not typically used in all cloud-native env. Even > if these cluster management applications are deployed, certain security > constraints may restrict their access to JMX metrics and prevent them from > interfering with hdfs operations. The applications that can only trigger > alerts for users based on set parameters (for instance, missing blocks > 0) > are allowed to access JMX metrics. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issu
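To make the proposal concrete, here is a minimal, purely illustrative sketch of the opt-in rule described in the issue ("terminate if no heartbeat response from the active namenode within a configurable duration"). The configuration keys and class name below are hypothetical and do not correspond to any committed patch; the PR was ultimately closed.
```
// Illustrative sketch only: config keys and class are made up for illustration.
import java.util.concurrent.TimeUnit;

public class DatanodeSelfShutdownSketch {
  // Hypothetical configuration keys (not real Hadoop properties).
  static final String ENABLED_KEY =
      "dfs.datanode.shutdown.on.lost.active.namenode.enabled";
  static final String TIMEOUT_KEY =
      "dfs.datanode.shutdown.on.lost.active.namenode.timeout.ms";

  private final boolean enabled;
  private final long timeoutMs;

  DatanodeSelfShutdownSketch(boolean enabled, long timeoutMs) {
    this.enabled = enabled;
    this.timeoutMs = timeoutMs;
  }

  /** True if the DN has gone too long without a heartbeat response from the active NN. */
  boolean shouldTerminate(long msSinceLastActiveHeartbeatResponse) {
    return enabled && msSinceLastActiveHeartbeatResponse > timeoutMs;
  }

  public static void main(String[] args) {
    DatanodeSelfShutdownSketch policy =
        new DatanodeSelfShutdownSketch(true, TimeUnit.SECONDS.toMillis(60));
    System.out.println(policy.shouldTerminate(75_000)); // true: 75s of silence
  }
}
```
Because the behavior would be opt-in and disabled by default, deployments without an external restarter (e.g. Kubernetes or systemd) would be unaffected.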
[jira] [Updated] (HDFS-16917) Add transfer rate quantile metrics for DataNode reads
[ https://issues.apache.org/jira/browse/HDFS-16917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravindra Dingankar updated HDFS-16917: -- Description: Currently we have the following metrics for datanode reads. |BytesRead BlocksRead TotalReadTime|Total number of bytes read from DataNode Total number of blocks read from DataNode Total number of milliseconds spent on read operation| We would like to add a new quantile metric calculating the transfer rate for datanode reads. This will give us a distribution across a window of the read transfer rate for each datanode. Quantiles for transfer rate per host will help in identifying issues like hotspotting of datasets as well as finding repetitive slow datanodes. was: Currently we have the following metrics for datanode reads. |BytesRead BlocksRead TotalReadTime|Total number of bytes read from DataNode Total number of blocks read from DataNode Total number of milliseconds spent on read operation| We would like to add a new quantile metric calculating the transfer rate for datanode reads. This will give us a distribution across a window of the read transfer rate for each datanode. Percentiles for transfer rate will help in identifying issues like hotspotting of datasets as well as finding repetitive slow datanodes. > Add transfer rate quantile metrics for DataNode reads > - > > Key: HDFS-16917 > URL: https://issues.apache.org/jira/browse/HDFS-16917 > Project: Hadoop HDFS > Issue Type: Task > Components: datanode >Reporter: Ravindra Dingankar >Priority: Minor > Labels: pull-request-available > > Currently we have the following metrics for datanode reads. > |BytesRead > BlocksRead > TotalReadTime|Total number of bytes read from DataNode > Total number of blocks read from DataNode > Total number of milliseconds spent on read operation| > We would like to add a new quantile metric calculating the transfer rate for > datanode reads. > This will give us a distribution across a window of the read transfer rate > for each datanode. > Quantiles for transfer rate per host will help in identifying issues like > hotspotting of datasets as well as finding repetitive slow datanodes. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
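As a rough illustration of the proposal (not the actual patch), the per-read rate could be derived from bytes and milliseconds and fed into a Hadoop MutableQuantiles. The metric name, registry wiring, and 60-second rollover window below are assumptions; only the MB/s helper and its zero-duration guard appear in the PR discussion further down.
```
// Sketch only: shows one way a read transfer-rate quantile could be recorded.
import org.apache.hadoop.metrics2.lib.MetricsRegistry;
import org.apache.hadoop.metrics2.lib.MutableQuantiles;

public class ReadTransferRateMetricsSketch {
  private final MetricsRegistry registry = new MetricsRegistry("datanode");
  // 60s rollover window is an assumption; the real patch may derive it from
  // dfs.metrics.percentiles.intervals.
  private final MutableQuantiles readTransferRateQuantiles =
      registry.newQuantiles("readTransferRateMBs",
          "Read transfer rate in MB/s", "ops", "rate", 60);

  /** Record one read's transfer rate, skipping zero-length intervals. */
  public void recordRead(long bytes, long durationMs) {
    if (durationMs == 0) {
      return; // cannot derive a rate from a zero-length interval
    }
    long mbPerSec = bytes * 1000 / durationMs / (1024 * 1024);
    readTransferRateQuantiles.add(mbPerSec);
  }
}
```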
[jira] [Updated] (HDFS-16917) Add transfer rate quantile metrics for DataNode reads
[ https://issues.apache.org/jira/browse/HDFS-16917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravindra Dingankar updated HDFS-16917: -- Description: Currently we have the following metrics for datanode reads. |BytesRead BlocksRead TotalReadTime|Total number of bytes read from DataNode Total number of blocks read from DataNode Total number of milliseconds spent on read operation| We would like to add a new quantile metric calculating the transfer rate for datanode reads. This will give us a distribution across a window of the read transfer rate for each datanode. Percentiles for transfer rate will help in identifying issues like hotspotting of datasets as well as finding repetitive slow datanodes. was: Currently we have the following metrics for datanode reads. |BytesRead BlocksRead TotalReadTime|Total number of bytes read from DataNode Total number of blocks read from DataNode Total number of milliseconds spent on read operation| We would like to add a new quantile metric calculating the data transfer rate for datanode reads. This will give us a distribution across a window of the transfer rate for each datanode. Percentiles for transfer rate will help in identifying repetitive slow datanodes or nodes with hotspots. > Add transfer rate quantile metrics for DataNode reads > - > > Key: HDFS-16917 > URL: https://issues.apache.org/jira/browse/HDFS-16917 > Project: Hadoop HDFS > Issue Type: Task > Components: datanode >Reporter: Ravindra Dingankar >Priority: Minor > Labels: pull-request-available > > Currently we have the following metrics for datanode reads. > |BytesRead > BlocksRead > TotalReadTime|Total number of bytes read from DataNode > Total number of blocks read from DataNode > Total number of milliseconds spent on read operation| > We would like to add a new quantile metric calculating the transfer rate for > datanode reads. > This will give us a distribution across a window of the read transfer rate > for each datanode. > Percentiles for transfer rate will help in identifying issues like > hotspotting of datasets as well as finding repetitive slow datanodes. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16917) Add transfer rate quantile metrics for DataNode reads
[ https://issues.apache.org/jira/browse/HDFS-16917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravindra Dingankar updated HDFS-16917: -- Description: Currently we have the following metrics for datanode reads. |BytesRead BlocksRead TotalReadTime|Total number of bytes read from DataNode Total number of blocks read from DataNode Total number of milliseconds spent on read operation| We would like to add a new quantile metric calculating the data transfer rate for datanode reads. This will give us a distribution across a window of the transfer rate for each datanode. Percentiles for transfer rate will help in identifying repetitive slow datanodes or nodes with hotspots. was: Currently we have the following metrics for datanode reads. |BytesRead BlocksRead TotalReadTime|Total number of bytes read from DataNode Total number of blocks read from DataNode Total number of milliseconds spent on read operation| We would like to add a new quantile metric calculating the distribution of data transfer rate for datanode reads. > Add transfer rate quantile metrics for DataNode reads > - > > Key: HDFS-16917 > URL: https://issues.apache.org/jira/browse/HDFS-16917 > Project: Hadoop HDFS > Issue Type: Task > Components: datanode >Reporter: Ravindra Dingankar >Priority: Minor > Labels: pull-request-available > > Currently we have the following metrics for datanode reads. > |BytesRead > BlocksRead > TotalReadTime|Total number of bytes read from DataNode > Total number of blocks read from DataNode > Total number of milliseconds spent on read operation| > We would like to add a new quantile metric calculating the data transfer rate > for datanode reads. > This will give us a distribution across a window of the transfer rate for > each datanode. > Percentiles for transfer rate will help in identifying repetitive slow > datanodes or nodes with hotspots. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16918) Optionally shut down datanode if it does not stay connected to active namenode
[ https://issues.apache.org/jira/browse/HDFS-16918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689990#comment-17689990 ] ASF GitHub Bot commented on HDFS-16918: --- virajjasani commented on PR #5396: URL: https://github.com/apache/hadoop/pull/5396#issuecomment-1433713252 > The Dn case seems a corner case, it won't be very common and need to be careful around not getting pass a split-brain scenario. There are bunch of checks around though, but they are just to verify we don't get a false active claim acknowledged.. Oh yes, this was my first focus, I tried adding bunch of logs internally just to ensure if we are seeing some bugs here, but so far things look good. It's the TCP connection that Envoy infra is messing up (only sometimes, not often). But nvm, this is still worth spending time on. Thanks for all good points, I am going to close this PR mostly soon (just figuring out a few more details) so that it doesn't pile up in the open PRs. Thanks a lot for spending a lot of your time here, it's just so priceless! > Optionally shut down datanode if it does not stay connected to active namenode > -- > > Key: HDFS-16918 > URL: https://issues.apache.org/jira/browse/HDFS-16918 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > > While deploying Hdfs on Envoy proxy setup, depending on the socket timeout > configured at envoy, the network connection issues or packet loss could be > observed. All of envoys basically form a transparent communication mesh in > which each app can send and receive packets to and from localhost and is > unaware of the network topology. > The primary purpose of Envoy is to make the network transparent to > applications, in order to identify network issues reliably. However, > sometimes such proxy based setup could result into socket connection issues > b/ datanode and namenode. > Many deployment frameworks provide auto-start functionality when any of the > hadoop daemons are stopped. If a given datanode does not stay connected to > active namenode in the cluster i.e. does not receive heartbeat response in > time from active namenode (even though active namenode is not terminated), it > would not be much useful. We should be able to provide configurable behavior > such that if a given datanode cannot receive heartbeat response from active > namenode in configurable time duration, it should terminate itself to avoid > impacting the availability SLA. This is specifically helpful when the > underlying deployment or observability framework (e.g. K8S) can start up the > datanode automatically upon it's shutdown (unless it is being restarted as > part of rolling upgrade) and help the newly brought up datanode (in case of > k8s, a new pod with dynamically changing nodes) establish new socket > connection to active and standby namenodes. This should be an opt-in behavior > and not default one. > > In a distributed system, it is essential to have robust fail-fast mechanisms > in place to prevent issues related to network partitioning. The system must > be designed to prevent further degradation of availability and consistency in > the event of a network partition. Several distributed systems offer fail-safe > approaches, and for some, partition tolerance is critical to the extent that > even a few seconds of heartbeat loss can trigger the removal of an > application server instance from the cluster. 
For instance, a majority of > zooKeeper clients utilize the ephemeral nodes for this purpose to make system > reliable, fault-tolerant and strongly consistent in the event of network > partition. > From the hdfs architecture viewpoint, it is crucial to understand the > critical role that active and observer namenode play in file system > operations. In a large-scale cluster, if the datanodes holding the same block > (primary and replicas) lose connection to both active and observer namenodes > for a significant amount of time, delaying the process of shutting down such > datanodes and restarting it to re-establish the connection with the namenodes > (assuming the active namenode is alive, assumption is important in the even > of network partition to reestablish the connection) will further deteriorate > the availability of the service. This scenario underscores the importance of > resolving network partitioning. > This is a real use case for hdfs and it is not prudent to assume that every > deployment or cluster management application must be able to restart > datanodes based on JMX metrics, as this would introduce another application > to resolve the network part
[jira] [Commented] (HDFS-16918) Optionally shut down datanode if it does not stay connected to active namenode
[ https://issues.apache.org/jira/browse/HDFS-16918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689989#comment-17689989 ] ASF GitHub Bot commented on HDFS-16918: --- ayushtkn commented on PR #5396: URL: https://github.com/apache/hadoop/pull/5396#issuecomment-1433706087 I think I have lost the flow now 😅 But I think using the getDataNodeStats is a cool thing to explore, it is under a read lock so not costly either, and would be easier to process also may be... "Usually" around metrics, if things can be derived using the exposed ones, we don't coin new ones, generally, there are tools which can do that logics and show you fancy graphs and all also combining metrics together and doing maths on them as well... The Dn case seems a corner case, it won't be very common and need to be careful around not getting pass a split-brain scenario. There are bunch of checks around though, but they are just to verify we don't get a false active claim acknowledged.. But just thinking about this case, it can be figured out by simple logic or scripts, if there are two claiming active, the one from which the last response time is less can be used for those decisions. Something like ``` Variables to Store: activeNnId and LastActiveResponseTime=MAX Fetch Metrics From DN Iterate over all Namenodes. Check if Active NnLastResponseTime < LastActiveResponseTime Store the nnId and last Response Time else Move Forward if LastActiveResponseTime < configurable value conclude dead and do ``` May be some if else or equality might have got inverted, just for idea sake... > Optionally shut down datanode if it does not stay connected to active namenode > -- > > Key: HDFS-16918 > URL: https://issues.apache.org/jira/browse/HDFS-16918 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > > While deploying Hdfs on Envoy proxy setup, depending on the socket timeout > configured at envoy, the network connection issues or packet loss could be > observed. All of envoys basically form a transparent communication mesh in > which each app can send and receive packets to and from localhost and is > unaware of the network topology. > The primary purpose of Envoy is to make the network transparent to > applications, in order to identify network issues reliably. However, > sometimes such proxy based setup could result into socket connection issues > b/ datanode and namenode. > Many deployment frameworks provide auto-start functionality when any of the > hadoop daemons are stopped. If a given datanode does not stay connected to > active namenode in the cluster i.e. does not receive heartbeat response in > time from active namenode (even though active namenode is not terminated), it > would not be much useful. We should be able to provide configurable behavior > such that if a given datanode cannot receive heartbeat response from active > namenode in configurable time duration, it should terminate itself to avoid > impacting the availability SLA. This is specifically helpful when the > underlying deployment or observability framework (e.g. K8S) can start up the > datanode automatically upon it's shutdown (unless it is being restarted as > part of rolling upgrade) and help the newly brought up datanode (in case of > k8s, a new pod with dynamically changing nodes) establish new socket > connection to active and standby namenodes. This should be an opt-in behavior > and not default one. 
> > In a distributed system, it is essential to have robust fail-fast mechanisms > in place to prevent issues related to network partitioning. The system must > be designed to prevent further degradation of availability and consistency in > the event of a network partition. Several distributed systems offer fail-safe > approaches, and for some, partition tolerance is critical to the extent that > even a few seconds of heartbeat loss can trigger the removal of an > application server instance from the cluster. For instance, a majority of > zooKeeper clients utilize the ephemeral nodes for this purpose to make system > reliable, fault-tolerant and strongly consistent in the event of network > partition. > From the hdfs architecture viewpoint, it is crucial to understand the > critical role that active and observer namenode play in file system > operations. In a large-scale cluster, if the datanodes holding the same block > (primary and replicas) lose connection to both active and observer namenodes > for a significant amount of time, delaying the process of shutting down such >
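A minimal Java rendering of the pseudocode sketched in the comment above. The actor interface and accessor names are hypothetical stand-ins for whatever the datanode actually exposes over JMX; only the idea of "trust the active namenode with the most recent response, then compare against a configurable threshold" comes from the thread.
```
// Sketch: decide from BP service actor metrics whether the active NN looks dead.
import java.util.List;
import java.util.concurrent.TimeUnit;

public class StaleActiveNamenodeCheck {

  static final long MAX_SILENCE_MS = TimeUnit.SECONDS.toMillis(60); // configurable

  /** Hypothetical view of one BP service actor as reported over JMX. */
  interface BpServiceActorInfo {
    String getNamenodeId();
    boolean isActive();                     // "Namenode HA state" == Active
    long getLastHeartbeatResponseTimeMs();  // ms since last heartbeat response
  }

  /** True if no actor reporting an Active NN has responded recently enough. */
  static boolean activeNamenodeLooksDead(List<BpServiceActorInfo> actors) {
    long freshestActiveSilence = Long.MAX_VALUE;
    for (BpServiceActorInfo actor : actors) {
      if (actor.isActive()) {
        freshestActiveSilence =
            Math.min(freshestActiveSilence, actor.getLastHeartbeatResponseTimeMs());
      }
    }
    // If two NNs both claim active (split brain), trust the most recent responder.
    return freshestActiveSilence > MAX_SILENCE_MS;
  }
}
```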
[jira] [Commented] (HDFS-16918) Optionally shut down datanode if it does not stay connected to active namenode
[ https://issues.apache.org/jira/browse/HDFS-16918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689973#comment-17689973 ] ASF GitHub Bot commented on HDFS-16918: --- virajjasani commented on PR #5396: URL: https://github.com/apache/hadoop/pull/5396#issuecomment-1433668204 Any BP service actor with "Namenode HA state" as "Active" and "Last Heartbeat Response" > 60s (configurable), should be treated as "State Active Namenode". Maybe we can do that. Alright, sorry for adding up more and more comments, let me find the best way to expose things. For cloud native infra, it's still not easy to let metrics be exposed to the pod where we want to but will have to go for some security approvals, will work on this in parallel. Let me try fixing or at least normalizing the Namenode states in such a manner that we can expose "Stale Active Namenode" kind of Namenode HA state in metrics. That would be fairly easy for client to consume. It should also not be backward incompatible given that HDFS-16902 has been a very recent change. So making changes now in it before it can make it to a release should be fine I guess. > Optionally shut down datanode if it does not stay connected to active namenode > -- > > Key: HDFS-16918 > URL: https://issues.apache.org/jira/browse/HDFS-16918 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > > While deploying Hdfs on Envoy proxy setup, depending on the socket timeout > configured at envoy, the network connection issues or packet loss could be > observed. All of envoys basically form a transparent communication mesh in > which each app can send and receive packets to and from localhost and is > unaware of the network topology. > The primary purpose of Envoy is to make the network transparent to > applications, in order to identify network issues reliably. However, > sometimes such proxy based setup could result into socket connection issues > b/ datanode and namenode. > Many deployment frameworks provide auto-start functionality when any of the > hadoop daemons are stopped. If a given datanode does not stay connected to > active namenode in the cluster i.e. does not receive heartbeat response in > time from active namenode (even though active namenode is not terminated), it > would not be much useful. We should be able to provide configurable behavior > such that if a given datanode cannot receive heartbeat response from active > namenode in configurable time duration, it should terminate itself to avoid > impacting the availability SLA. This is specifically helpful when the > underlying deployment or observability framework (e.g. K8S) can start up the > datanode automatically upon it's shutdown (unless it is being restarted as > part of rolling upgrade) and help the newly brought up datanode (in case of > k8s, a new pod with dynamically changing nodes) establish new socket > connection to active and standby namenodes. This should be an opt-in behavior > and not default one. > > In a distributed system, it is essential to have robust fail-fast mechanisms > in place to prevent issues related to network partitioning. The system must > be designed to prevent further degradation of availability and consistency in > the event of a network partition. 
Several distributed systems offer fail-safe > approaches, and for some, partition tolerance is critical to the extent that > even a few seconds of heartbeat loss can trigger the removal of an > application server instance from the cluster. For instance, a majority of > zooKeeper clients utilize the ephemeral nodes for this purpose to make system > reliable, fault-tolerant and strongly consistent in the event of network > partition. > From the hdfs architecture viewpoint, it is crucial to understand the > critical role that active and observer namenode play in file system > operations. In a large-scale cluster, if the datanodes holding the same block > (primary and replicas) lose connection to both active and observer namenodes > for a significant amount of time, delaying the process of shutting down such > datanodes and restarting it to re-establish the connection with the namenodes > (assuming the active namenode is alive, assumption is important in the even > of network partition to reestablish the connection) will further deteriorate > the availability of the service. This scenario underscores the importance of > resolving network partitioning. > This is a real use case for hdfs and it is not prudent to assume that every > deployment or cluster management application must be able to restart > datanodes based on J
[jira] [Commented] (HDFS-16918) Optionally shut down datanode if it does not stay connected to active namenode
[ https://issues.apache.org/jira/browse/HDFS-16918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689971#comment-17689971 ] ASF GitHub Bot commented on HDFS-16918: --- virajjasani commented on PR #5396: URL: https://github.com/apache/hadoop/pull/5396#issuecomment-1433662089 In the second case where dn is not connected to active nn, the BP offer service would still list active nn as nn-1. The only way for us to actually let a client (administrative applications in this case) know that the given dn is actually out of luck connecting to active nn is by exposing new metric which does internal check of looping through BP service actor metrics and making sure that all BPs have exactly one nn listed as "Active" and has lastHeartbeatReponseTime within few seconds. This is the logic we somehow needs to expose for the clients (admins to take actions, for k8s, it will be some scripting that checks health of dn pods periodically). > Optionally shut down datanode if it does not stay connected to active namenode > -- > > Key: HDFS-16918 > URL: https://issues.apache.org/jira/browse/HDFS-16918 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > > While deploying Hdfs on Envoy proxy setup, depending on the socket timeout > configured at envoy, the network connection issues or packet loss could be > observed. All of envoys basically form a transparent communication mesh in > which each app can send and receive packets to and from localhost and is > unaware of the network topology. > The primary purpose of Envoy is to make the network transparent to > applications, in order to identify network issues reliably. However, > sometimes such proxy based setup could result into socket connection issues > b/ datanode and namenode. > Many deployment frameworks provide auto-start functionality when any of the > hadoop daemons are stopped. If a given datanode does not stay connected to > active namenode in the cluster i.e. does not receive heartbeat response in > time from active namenode (even though active namenode is not terminated), it > would not be much useful. We should be able to provide configurable behavior > such that if a given datanode cannot receive heartbeat response from active > namenode in configurable time duration, it should terminate itself to avoid > impacting the availability SLA. This is specifically helpful when the > underlying deployment or observability framework (e.g. K8S) can start up the > datanode automatically upon it's shutdown (unless it is being restarted as > part of rolling upgrade) and help the newly brought up datanode (in case of > k8s, a new pod with dynamically changing nodes) establish new socket > connection to active and standby namenodes. This should be an opt-in behavior > and not default one. > > In a distributed system, it is essential to have robust fail-fast mechanisms > in place to prevent issues related to network partitioning. The system must > be designed to prevent further degradation of availability and consistency in > the event of a network partition. Several distributed systems offer fail-safe > approaches, and for some, partition tolerance is critical to the extent that > even a few seconds of heartbeat loss can trigger the removal of an > application server instance from the cluster. 
For instance, a majority of > zooKeeper clients utilize the ephemeral nodes for this purpose to make system > reliable, fault-tolerant and strongly consistent in the event of network > partition. > From the hdfs architecture viewpoint, it is crucial to understand the > critical role that active and observer namenode play in file system > operations. In a large-scale cluster, if the datanodes holding the same block > (primary and replicas) lose connection to both active and observer namenodes > for a significant amount of time, delaying the process of shutting down such > datanodes and restarting it to re-establish the connection with the namenodes > (assuming the active namenode is alive, assumption is important in the even > of network partition to reestablish the connection) will further deteriorate > the availability of the service. This scenario underscores the importance of > resolving network partitioning. > This is a real use case for hdfs and it is not prudent to assume that every > deployment or cluster management application must be able to restart > datanodes based on JMX metrics, as this would introduce another application > to resolve the network partition impact of hdfs. Besides, popular cluster > management applications are not typically used in all cloud-n
[jira] [Commented] (HDFS-16918) Optionally shut down datanode if it does not stay connected to active namenode
[ https://issues.apache.org/jira/browse/HDFS-16918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689968#comment-17689968 ] ASF GitHub Bot commented on HDFS-16918: --- virajjasani commented on PR #5396: URL: https://github.com/apache/hadoop/pull/5396#issuecomment-1433655322 Btw just to give you more insights, what I am worried about is cases like this: In this case, dn is connected to active nn https://user-images.githubusercontent.com/34790606/219476091-989e6a73-d54d-4c34-b60f-65152cf6980c.png";> However in this case, dn is live, it's TCP connection is lost to active nn and it is healthy and yet not connected to active nn https://user-images.githubusercontent.com/34790606/219476263-9a039e14-9a0a-4c18-a103-fb448a61cf58.png";> > Optionally shut down datanode if it does not stay connected to active namenode > -- > > Key: HDFS-16918 > URL: https://issues.apache.org/jira/browse/HDFS-16918 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > > While deploying Hdfs on Envoy proxy setup, depending on the socket timeout > configured at envoy, the network connection issues or packet loss could be > observed. All of envoys basically form a transparent communication mesh in > which each app can send and receive packets to and from localhost and is > unaware of the network topology. > The primary purpose of Envoy is to make the network transparent to > applications, in order to identify network issues reliably. However, > sometimes such proxy based setup could result into socket connection issues > b/ datanode and namenode. > Many deployment frameworks provide auto-start functionality when any of the > hadoop daemons are stopped. If a given datanode does not stay connected to > active namenode in the cluster i.e. does not receive heartbeat response in > time from active namenode (even though active namenode is not terminated), it > would not be much useful. We should be able to provide configurable behavior > such that if a given datanode cannot receive heartbeat response from active > namenode in configurable time duration, it should terminate itself to avoid > impacting the availability SLA. This is specifically helpful when the > underlying deployment or observability framework (e.g. K8S) can start up the > datanode automatically upon it's shutdown (unless it is being restarted as > part of rolling upgrade) and help the newly brought up datanode (in case of > k8s, a new pod with dynamically changing nodes) establish new socket > connection to active and standby namenodes. This should be an opt-in behavior > and not default one. > > In a distributed system, it is essential to have robust fail-fast mechanisms > in place to prevent issues related to network partitioning. The system must > be designed to prevent further degradation of availability and consistency in > the event of a network partition. Several distributed systems offer fail-safe > approaches, and for some, partition tolerance is critical to the extent that > even a few seconds of heartbeat loss can trigger the removal of an > application server instance from the cluster. For instance, a majority of > zooKeeper clients utilize the ephemeral nodes for this purpose to make system > reliable, fault-tolerant and strongly consistent in the event of network > partition. 
> From the hdfs architecture viewpoint, it is crucial to understand the > critical role that active and observer namenode play in file system > operations. In a large-scale cluster, if the datanodes holding the same block > (primary and replicas) lose connection to both active and observer namenodes > for a significant amount of time, delaying the process of shutting down such > datanodes and restarting it to re-establish the connection with the namenodes > (assuming the active namenode is alive, assumption is important in the even > of network partition to reestablish the connection) will further deteriorate > the availability of the service. This scenario underscores the importance of > resolving network partitioning. > This is a real use case for hdfs and it is not prudent to assume that every > deployment or cluster management application must be able to restart > datanodes based on JMX metrics, as this would introduce another application > to resolve the network partition impact of hdfs. Besides, popular cluster > management applications are not typically used in all cloud-native env. Even > if these cluster management applications are deployed, certain security > constraints may restrict their access to JMX metrics and prevent them from > in
[jira] [Commented] (HDFS-16918) Optionally shut down datanode if it does not stay connected to active namenode
[ https://issues.apache.org/jira/browse/HDFS-16918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689966#comment-17689966 ] ASF GitHub Bot commented on HDFS-16918: --- virajjasani commented on PR #5396: URL: https://github.com/apache/hadoop/pull/5396#issuecomment-1433649342 Yes, we do have check at namenode side: ``` @Override // NameNodeMXBean public String getDeadNodes() { final Map> info = new HashMap>(); final List dead = new ArrayList(); blockManager.getDatanodeManager().fetchDatanodes(null, dead, false); for (DatanodeDescriptor node : dead) { ... ... ... ``` I am thinking more about `getDataNodeStats()` API > Optionally shut down datanode if it does not stay connected to active namenode > -- > > Key: HDFS-16918 > URL: https://issues.apache.org/jira/browse/HDFS-16918 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > > While deploying Hdfs on Envoy proxy setup, depending on the socket timeout > configured at envoy, the network connection issues or packet loss could be > observed. All of envoys basically form a transparent communication mesh in > which each app can send and receive packets to and from localhost and is > unaware of the network topology. > The primary purpose of Envoy is to make the network transparent to > applications, in order to identify network issues reliably. However, > sometimes such proxy based setup could result into socket connection issues > b/ datanode and namenode. > Many deployment frameworks provide auto-start functionality when any of the > hadoop daemons are stopped. If a given datanode does not stay connected to > active namenode in the cluster i.e. does not receive heartbeat response in > time from active namenode (even though active namenode is not terminated), it > would not be much useful. We should be able to provide configurable behavior > such that if a given datanode cannot receive heartbeat response from active > namenode in configurable time duration, it should terminate itself to avoid > impacting the availability SLA. This is specifically helpful when the > underlying deployment or observability framework (e.g. K8S) can start up the > datanode automatically upon it's shutdown (unless it is being restarted as > part of rolling upgrade) and help the newly brought up datanode (in case of > k8s, a new pod with dynamically changing nodes) establish new socket > connection to active and standby namenodes. This should be an opt-in behavior > and not default one. > > In a distributed system, it is essential to have robust fail-fast mechanisms > in place to prevent issues related to network partitioning. The system must > be designed to prevent further degradation of availability and consistency in > the event of a network partition. Several distributed systems offer fail-safe > approaches, and for some, partition tolerance is critical to the extent that > even a few seconds of heartbeat loss can trigger the removal of an > application server instance from the cluster. For instance, a majority of > zooKeeper clients utilize the ephemeral nodes for this purpose to make system > reliable, fault-tolerant and strongly consistent in the event of network > partition. > From the hdfs architecture viewpoint, it is crucial to understand the > critical role that active and observer namenode play in file system > operations. 
In a large-scale cluster, if the datanodes holding the same block > (primary and replicas) lose connection to both active and observer namenodes > for a significant amount of time, delaying the process of shutting down such > datanodes and restarting it to re-establish the connection with the namenodes > (assuming the active namenode is alive, assumption is important in the even > of network partition to reestablish the connection) will further deteriorate > the availability of the service. This scenario underscores the importance of > resolving network partitioning. > This is a real use case for hdfs and it is not prudent to assume that every > deployment or cluster management application must be able to restart > datanodes based on JMX metrics, as this would introduce another application > to resolve the network partition impact of hdfs. Besides, popular cluster > management applications are not typically used in all cloud-native env. Even > if these cluster management applications are deployed, certain security > constraints may restrict their access to JMX metrics and prevent them from > interfering with hdfs operations. The applications that can only trigger > aler
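For the getDataNodeStats() direction mentioned above, an external checker could ask the namenode directly which datanodes it already considers dead, rather than having the datanode terminate itself. A small sketch follows; the cluster URI is a placeholder, and DistributedFileSystem#getDataNodeStats(DatanodeReportType) is the existing client API the comment refers to.
```
// Sketch: periodic admin-side check using the NN's own view of dead datanodes.
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
import org.apache.hadoop.hdfs.protocol.HdfsConstants.DatanodeReportType;

public class DeadDatanodeReport {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // "hdfs://mycluster" is a placeholder for the nameservice URI.
    try (FileSystem fs = FileSystem.get(URI.create("hdfs://mycluster"), conf)) {
      DistributedFileSystem dfs = (DistributedFileSystem) fs;
      for (DatanodeInfo dn : dfs.getDataNodeStats(DatanodeReportType.DEAD)) {
        // A periodic external checker (cron, k8s job, ...) could restart these.
        System.out.println("Dead according to the NN: " + dn.getXferAddr());
      }
    }
  }
}
```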
[jira] [Commented] (HDFS-16917) Add transfer rate quantile metrics for DataNode reads
[ https://issues.apache.org/jira/browse/HDFS-16917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689959#comment-17689959 ] ASF GitHub Bot commented on HDFS-16917: --- rdingankar commented on PR #5397: URL: https://github.com/apache/hadoop/pull/5397#issuecomment-1433632301 The UTs failing in the build are unrelated to my change. Verified locally that the same tests fail after rebasing to current trunk. > Add transfer rate quantile metrics for DataNode reads > - > > Key: HDFS-16917 > URL: https://issues.apache.org/jira/browse/HDFS-16917 > Project: Hadoop HDFS > Issue Type: Task > Components: datanode >Reporter: Ravindra Dingankar >Priority: Minor > Labels: pull-request-available > > Currently we have the following metrics for datanode reads. > |BytesRead > BlocksRead > TotalReadTime|Total number of bytes read from DataNode > Total number of blocks read from DataNode > Total number of milliseconds spent on read operation| > We would like to add a new quantile metric calculating the distribution of > data transfer rate for datanode reads. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16918) Optionally shut down datanode if it does not stay connected to active namenode
[ https://issues.apache.org/jira/browse/HDFS-16918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689958#comment-17689958 ] ASF GitHub Bot commented on HDFS-16918: --- ayushtkn commented on PR #5396: URL: https://github.com/apache/hadoop/pull/5396#issuecomment-1433632089 The active NN knows that which datanode is dead. That is how it shows in the UI as well. There would be some param in the JMX which must be telling the state of the datanode to the active namenode. I can pull that out for you, if you want, but it is in the UI, so there would be a metric for sure, just being lazy to check the code again: ![image](https://user-images.githubusercontent.com/25608848/219469872-4d561fdb-98b0-46d8-abb7-e5c2eb0dcd46.png) Datanode has metrics and you know post what time it is declared dead. Any service can have periodic health checks and have a check. Anyway you have a service which checks if datanode is dead and restarts, some logics here and there in that to have a periodic check to shoot a shutdown as well, should do. https://user-images.githubusercontent.com/25608848/219470918-db38d602-984f-4baa-9860-aee19b2af646.png";> Code point of view implementing such a logic sounds very naive to me. or may be minimal effort thing Not dragging the use case list either, because there ain't no end to that, client was X and he was in Y state and blah blah, datanode block reconstruction works, around block movements and it won't end > Optionally shut down datanode if it does not stay connected to active namenode > -- > > Key: HDFS-16918 > URL: https://issues.apache.org/jira/browse/HDFS-16918 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > > While deploying Hdfs on Envoy proxy setup, depending on the socket timeout > configured at envoy, the network connection issues or packet loss could be > observed. All of envoys basically form a transparent communication mesh in > which each app can send and receive packets to and from localhost and is > unaware of the network topology. > The primary purpose of Envoy is to make the network transparent to > applications, in order to identify network issues reliably. However, > sometimes such proxy based setup could result into socket connection issues > b/ datanode and namenode. > Many deployment frameworks provide auto-start functionality when any of the > hadoop daemons are stopped. If a given datanode does not stay connected to > active namenode in the cluster i.e. does not receive heartbeat response in > time from active namenode (even though active namenode is not terminated), it > would not be much useful. We should be able to provide configurable behavior > such that if a given datanode cannot receive heartbeat response from active > namenode in configurable time duration, it should terminate itself to avoid > impacting the availability SLA. This is specifically helpful when the > underlying deployment or observability framework (e.g. K8S) can start up the > datanode automatically upon it's shutdown (unless it is being restarted as > part of rolling upgrade) and help the newly brought up datanode (in case of > k8s, a new pod with dynamically changing nodes) establish new socket > connection to active and standby namenodes. This should be an opt-in behavior > and not default one. > > In a distributed system, it is essential to have robust fail-fast mechanisms > in place to prevent issues related to network partitioning. 
The system must > be designed to prevent further degradation of availability and consistency in > the event of a network partition. Several distributed systems offer fail-safe > approaches, and for some, partition tolerance is critical to the extent that > even a few seconds of heartbeat loss can trigger the removal of an > application server instance from the cluster. For instance, a majority of > zooKeeper clients utilize the ephemeral nodes for this purpose to make system > reliable, fault-tolerant and strongly consistent in the event of network > partition. > From the hdfs architecture viewpoint, it is crucial to understand the > critical role that active and observer namenode play in file system > operations. In a large-scale cluster, if the datanodes holding the same block > (primary and replicas) lose connection to both active and observer namenodes > for a significant amount of time, delaying the process of shutting down such > datanodes and restarting it to re-establish the connection with the namenodes > (assuming the active namenode is alive, assumption is important in the even > of network partition to rees
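The JMXJsonServlet route mentioned above can be exercised with nothing more than an HTTP GET. A sketch is below; the host and port are placeholders, and the NameNodeInfo bean's DeadNodes attribute is the one backed by the getDeadNodes() MXBean method quoted elsewhere in this thread.
```
// Sketch: pull the NameNode's dead-node view over the JMX JSON servlet.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class JmxDeadNodesProbe {
  public static void main(String[] args) throws Exception {
    // namenode-host:9870 is a placeholder for the active NN's HTTP address.
    String url = "http://namenode-host:9870/jmx"
        + "?qry=Hadoop:service=NameNode,name=NameNodeInfo";
    HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
    HttpResponse<String> response = HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString());
    // The response is JSON; the "DeadNodes" attribute is itself a JSON map
    // keyed by datanode name. A real checker would parse it instead of printing.
    System.out.println(response.body());
  }
}
```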
[jira] [Commented] (HDFS-16918) Optionally shut down datanode if it does not stay connected to active namenode
[ https://issues.apache.org/jira/browse/HDFS-16918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689957#comment-17689957 ] ASF GitHub Bot commented on HDFS-16918: --- virajjasani commented on PR #5396: URL: https://github.com/apache/hadoop/pull/5396#issuecomment-1433630564 > or there is an API which getDatanodeStats and which can take dead as param or so, and such a logic can be developed by a periodic check or so. I think this is interesting point, indeed I should have considered this API that is already in use. Let me get back with some improvements to it and then I can suggest building K8S script around this, that should have similar impact as what we are trying to achieve within datanode itself. > Optionally shut down datanode if it does not stay connected to active namenode > -- > > Key: HDFS-16918 > URL: https://issues.apache.org/jira/browse/HDFS-16918 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > > While deploying Hdfs on Envoy proxy setup, depending on the socket timeout > configured at envoy, the network connection issues or packet loss could be > observed. All of envoys basically form a transparent communication mesh in > which each app can send and receive packets to and from localhost and is > unaware of the network topology. > The primary purpose of Envoy is to make the network transparent to > applications, in order to identify network issues reliably. However, > sometimes such proxy based setup could result into socket connection issues > b/ datanode and namenode. > Many deployment frameworks provide auto-start functionality when any of the > hadoop daemons are stopped. If a given datanode does not stay connected to > active namenode in the cluster i.e. does not receive heartbeat response in > time from active namenode (even though active namenode is not terminated), it > would not be much useful. We should be able to provide configurable behavior > such that if a given datanode cannot receive heartbeat response from active > namenode in configurable time duration, it should terminate itself to avoid > impacting the availability SLA. This is specifically helpful when the > underlying deployment or observability framework (e.g. K8S) can start up the > datanode automatically upon it's shutdown (unless it is being restarted as > part of rolling upgrade) and help the newly brought up datanode (in case of > k8s, a new pod with dynamically changing nodes) establish new socket > connection to active and standby namenodes. This should be an opt-in behavior > and not default one. > > In a distributed system, it is essential to have robust fail-fast mechanisms > in place to prevent issues related to network partitioning. The system must > be designed to prevent further degradation of availability and consistency in > the event of a network partition. Several distributed systems offer fail-safe > approaches, and for some, partition tolerance is critical to the extent that > even a few seconds of heartbeat loss can trigger the removal of an > application server instance from the cluster. For instance, a majority of > zooKeeper clients utilize the ephemeral nodes for this purpose to make system > reliable, fault-tolerant and strongly consistent in the event of network > partition. > From the hdfs architecture viewpoint, it is crucial to understand the > critical role that active and observer namenode play in file system > operations. 
In a large-scale cluster, if the datanodes holding the same block > (primary and replicas) lose connection to both active and observer namenodes > for a significant amount of time, delaying the process of shutting down such > datanodes and restarting it to re-establish the connection with the namenodes > (assuming the active namenode is alive, assumption is important in the even > of network partition to reestablish the connection) will further deteriorate > the availability of the service. This scenario underscores the importance of > resolving network partitioning. > This is a real use case for hdfs and it is not prudent to assume that every > deployment or cluster management application must be able to restart > datanodes based on JMX metrics, as this would introduce another application > to resolve the network partition impact of hdfs. Besides, popular cluster > management applications are not typically used in all cloud-native env. Even > if these cluster management applications are deployed, certain security > constraints may restrict their access to JMX metrics and prevent them from > interfering with hdfs operations. The applications that
[jira] [Commented] (HDFS-16918) Optionally shut down datanode if it does not stay connected to active namenode
[ https://issues.apache.org/jira/browse/HDFS-16918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689949#comment-17689949 ] ASF GitHub Bot commented on HDFS-16918: --- virajjasani commented on PR #5396: URL: https://github.com/apache/hadoop/pull/5396#issuecomment-1433614873 > I have mentioned a bunch of reasons above, I even think if some client is connected to datanode and happily reading a file, he might get impacted, AFAIK block location can be cached as well and there are many other reasons Yes this is valid point but only until block locations stay cached at client :) But I understand there is no point discussing on the same reasons as we would keep dragging the same point. Thanks!! > Optionally shut down datanode if it does not stay connected to active namenode > -- > > Key: HDFS-16918 > URL: https://issues.apache.org/jira/browse/HDFS-16918 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > > While deploying Hdfs on Envoy proxy setup, depending on the socket timeout > configured at envoy, the network connection issues or packet loss could be > observed. All of envoys basically form a transparent communication mesh in > which each app can send and receive packets to and from localhost and is > unaware of the network topology. > The primary purpose of Envoy is to make the network transparent to > applications, in order to identify network issues reliably. However, > sometimes such proxy based setup could result into socket connection issues > b/ datanode and namenode. > Many deployment frameworks provide auto-start functionality when any of the > hadoop daemons are stopped. If a given datanode does not stay connected to > active namenode in the cluster i.e. does not receive heartbeat response in > time from active namenode (even though active namenode is not terminated), it > would not be much useful. We should be able to provide configurable behavior > such that if a given datanode cannot receive heartbeat response from active > namenode in configurable time duration, it should terminate itself to avoid > impacting the availability SLA. This is specifically helpful when the > underlying deployment or observability framework (e.g. K8S) can start up the > datanode automatically upon it's shutdown (unless it is being restarted as > part of rolling upgrade) and help the newly brought up datanode (in case of > k8s, a new pod with dynamically changing nodes) establish new socket > connection to active and standby namenodes. This should be an opt-in behavior > and not default one. > > In a distributed system, it is essential to have robust fail-fast mechanisms > in place to prevent issues related to network partitioning. The system must > be designed to prevent further degradation of availability and consistency in > the event of a network partition. Several distributed systems offer fail-safe > approaches, and for some, partition tolerance is critical to the extent that > even a few seconds of heartbeat loss can trigger the removal of an > application server instance from the cluster. For instance, a majority of > zooKeeper clients utilize the ephemeral nodes for this purpose to make system > reliable, fault-tolerant and strongly consistent in the event of network > partition. > From the hdfs architecture viewpoint, it is crucial to understand the > critical role that active and observer namenode play in file system > operations. 
In a large-scale cluster, if the datanodes holding the same block > (primary and replicas) lose connection to both active and observer namenodes > for a significant amount of time, delaying the process of shutting down such > datanodes and restarting it to re-establish the connection with the namenodes > (assuming the active namenode is alive, assumption is important in the even > of network partition to reestablish the connection) will further deteriorate > the availability of the service. This scenario underscores the importance of > resolving network partitioning. > This is a real use case for hdfs and it is not prudent to assume that every > deployment or cluster management application must be able to restart > datanodes based on JMX metrics, as this would introduce another application > to resolve the network partition impact of hdfs. Besides, popular cluster > management applications are not typically used in all cloud-native env. Even > if these cluster management applications are deployed, certain security > constraints may restrict their access to JMX metrics and prevent them from > interfering with hdfs operations. The applications that can only
[jira] [Commented] (HDFS-16918) Optionally shut down datanode if it does not stay connected to active namenode
[ https://issues.apache.org/jira/browse/HDFS-16918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689948#comment-17689948 ] ASF GitHub Bot commented on HDFS-16918: --- virajjasani commented on PR #5396: URL: https://github.com/apache/hadoop/pull/5396#issuecomment-1433612218 Thanks for the reply Ayush, appreciate it as always. Are you saying that default implementation of "logging and/or exposing JMX metrics" for a given datanode if it doesn't stay connected is also not feasible according to you? I know we have metric that says "lastHeartbeat" and "lastHeartbeatResponseTime" but it's still difficult for user or script to apply a loop into BP service actor metrics rather than getting as simple log or metric as "this datanode has not heard from active namenode in the last 60s or so". Are you at least fine with keeping this as default implementation logic? > Optionally shut down datanode if it does not stay connected to active namenode > -- > > Key: HDFS-16918 > URL: https://issues.apache.org/jira/browse/HDFS-16918 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > > While deploying Hdfs on Envoy proxy setup, depending on the socket timeout > configured at envoy, the network connection issues or packet loss could be > observed. All of envoys basically form a transparent communication mesh in > which each app can send and receive packets to and from localhost and is > unaware of the network topology. > The primary purpose of Envoy is to make the network transparent to > applications, in order to identify network issues reliably. However, > sometimes such proxy based setup could result into socket connection issues > b/ datanode and namenode. > Many deployment frameworks provide auto-start functionality when any of the > hadoop daemons are stopped. If a given datanode does not stay connected to > active namenode in the cluster i.e. does not receive heartbeat response in > time from active namenode (even though active namenode is not terminated), it > would not be much useful. We should be able to provide configurable behavior > such that if a given datanode cannot receive heartbeat response from active > namenode in configurable time duration, it should terminate itself to avoid > impacting the availability SLA. This is specifically helpful when the > underlying deployment or observability framework (e.g. K8S) can start up the > datanode automatically upon it's shutdown (unless it is being restarted as > part of rolling upgrade) and help the newly brought up datanode (in case of > k8s, a new pod with dynamically changing nodes) establish new socket > connection to active and standby namenodes. This should be an opt-in behavior > and not default one. > > In a distributed system, it is essential to have robust fail-fast mechanisms > in place to prevent issues related to network partitioning. The system must > be designed to prevent further degradation of availability and consistency in > the event of a network partition. Several distributed systems offer fail-safe > approaches, and for some, partition tolerance is critical to the extent that > even a few seconds of heartbeat loss can trigger the removal of an > application server instance from the cluster. For instance, a majority of > zooKeeper clients utilize the ephemeral nodes for this purpose to make system > reliable, fault-tolerant and strongly consistent in the event of network > partition. 
> From the hdfs architecture viewpoint, it is crucial to understand the > critical role that active and observer namenode play in file system > operations. In a large-scale cluster, if the datanodes holding the same block > (primary and replicas) lose connection to both active and observer namenodes > for a significant amount of time, delaying the process of shutting down such > datanodes and restarting it to re-establish the connection with the namenodes > (assuming the active namenode is alive, assumption is important in the even > of network partition to reestablish the connection) will further deteriorate > the availability of the service. This scenario underscores the importance of > resolving network partitioning. > This is a real use case for hdfs and it is not prudent to assume that every > deployment or cluster management application must be able to restart > datanodes based on JMX metrics, as this would introduce another application > to resolve the network partition impact of hdfs. Besides, popular cluster > management applications are not typically used in all cloud-native env. Even > if these cluster management applications are deploy
[jira] [Commented] (HDFS-16917) Add transfer rate quantile metrics for DataNode reads
[ https://issues.apache.org/jira/browse/HDFS-16917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689947#comment-17689947 ] ASF GitHub Bot commented on HDFS-16917: --- rdingankar commented on code in PR #5397: URL: https://github.com/apache/hadoop/pull/5397#discussion_r1108939915 ## hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSUtil.java: ## @@ -1936,4 +1936,17 @@ public static boolean isParentEntry(final String path, final String parent) { return path.charAt(parent.length()) == Path.SEPARATOR_CHAR || parent.equals(Path.SEPARATOR); } + + /** + * Calculate the transfer rate in megabytes/second. + * @param bytes bytes + * @param durationMS duration in milliseconds + * @return the number of megabytes/second of the transfer rate + */ + public static long transferRateMBs(long bytes, long durationMS) { +if (durationMS == 0) { Review Comment: updated > Add transfer rate quantile metrics for DataNode reads > - > > Key: HDFS-16917 > URL: https://issues.apache.org/jira/browse/HDFS-16917 > Project: Hadoop HDFS > Issue Type: Task > Components: datanode >Reporter: Ravindra Dingankar >Priority: Minor > Labels: pull-request-available > > Currently we have the following metrics for datanode reads. > |BytesRead|Total number of bytes read from DataNode| > |BlocksRead|Total number of blocks read from DataNode| > |TotalReadTime|Total number of milliseconds spent on read operation| > We would like to add a new quantile metric calculating the distribution of > data transfer rate for datanode reads. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
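For readers following the review thread above, a minimal sketch of what the discussed utility could look like may help. This is only an illustration under assumptions: the zero-duration handling and the rounding behaviour are guesses, not necessarily the code that was merged in PR #5397.

{code:java}
// Hypothetical sketch of the DFSUtil helper discussed above; the merged
// implementation in PR #5397 may differ, in particular in how a zero
// duration is handled.
public static long transferRateMBs(long bytes, long durationMS) {
  // Assumption: clamp a sub-millisecond transfer to 1 ms so we never divide
  // by zero and the reported rate stays finite.
  if (durationMS == 0) {
    durationMS = 1;
  }
  // MB/s = (bytes / 2^20) / (durationMS / 1000)
  return (bytes * 1000) / (durationMS * 1024 * 1024);
}
{code}

A per-read rate computed along these lines could then presumably be fed into the same quantile machinery the DataNode metrics already use for latency distributions, which is what the "quantile metrics" in the issue title refers to.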
[jira] [Commented] (HDFS-16918) Optionally shut down datanode if it does not stay connected to active namenode
[ https://issues.apache.org/jira/browse/HDFS-16918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689944#comment-17689944 ] ASF GitHub Bot commented on HDFS-16918: --- ayushtkn commented on PR #5396: URL: https://github.com/apache/hadoop/pull/5396#issuecomment-1433606299 Viraj, I am sorry, I am totally against this. I am writing this because I don't want to ghost you and then, if someone comes and agrees, shoot a vote against it. I have mentioned a bunch of reasons above. I even think that if some client is connected to the datanode and happily reading a file, it might get impacted; AFAIK block locations can be cached as well, and there are many other reasons. I don't want to get you a list, I am pretty sure you would be aware of almost all of them... A service like the datanode killing itself doesn't sound feasible to me at all. Having these hooks and all in a service which holds data sounds like doing the same thing but opening ways to get exploited. That sounds even more risky to me. This is something cluster admin services should handle. A datanode going down or having trouble is a basic use case for HDFS; that is where replication pitches in. Ideally it should just alert the admins and they should figure out what went wrong; maybe a restart won't fix things and you would be in a loop: do a shutdown, shoot a BR to the ones you are still connected to, and then restart. Metrics are there which can tell you which datanode is dead, so advanced cluster administration services can leverage that. There is [JMXJsonServlet](https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/jmx/JMXJsonServlet.java) which can be leveraged. If those services can trigger a restart and do operations like shutdown, they should be allowed to fetch metrics as well. Or there is an API like getDatanodeStats which can take dead as a param or so, and such logic can be developed with a periodic check. Regarding the cloud thing and metrics stuff: I got a chance to talk to some cloud infra folks at my org and we do have ways to get metrics. I am not sharing how, because I don't know how professionally safe it is for me, but there are ways to do so. So, this can be handled at the deployment level and should be done there only; as for this auto-shutdown logic based on some factors, I am just repeating myself: I am totally against it > Optionally shut down datanode if it does not stay connected to active namenode > -- > > Key: HDFS-16918 > URL: https://issues.apache.org/jira/browse/HDFS-16918 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > > While deploying HDFS on an Envoy proxy setup, depending on the socket timeout > configured at Envoy, network connection issues or packet loss could be > observed. All of the Envoys basically form a transparent communication mesh in > which each app can send and receive packets to and from localhost and is > unaware of the network topology. > The primary purpose of Envoy is to make the network transparent to > applications, in order to identify network issues reliably. However, > sometimes such a proxy based setup could result in socket connection issues > b/w datanode and namenode. > Many deployment frameworks provide auto-start functionality when any of the > hadoop daemons are stopped. If a given datanode does not stay connected to > the active namenode in the cluster, i.e. 
does not receive a heartbeat response in > time from the active namenode (even though the active namenode is not terminated), it > would not be of much use. We should be able to provide configurable behavior > such that if a given datanode cannot receive a heartbeat response from the active > namenode within a configurable time duration, it should terminate itself to avoid > impacting the availability SLA. This is specifically helpful when the > underlying deployment or observability framework (e.g. K8S) can start up the > datanode automatically upon its shutdown (unless it is being restarted as > part of a rolling upgrade) and help the newly brought up datanode (in case of > k8s, a new pod with dynamically changing nodes) establish new socket > connections to the active and standby namenodes. This should be an opt-in behavior > and not the default one. > > In a distributed system, it is essential to have robust fail-fast mechanisms > in place to prevent issues related to network partitioning. The system must > be designed to prevent further degradation of availability and consistency in > the event of a network partition. Several distributed systems o
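To make the "handle it from outside HDFS" suggestion in the comment above concrete, here is a rough sketch of an external probe against the JMXJsonServlet. The host/port, query string and the DeadNodes attribute are assumptions based on the default NameNode web UI endpoint, not anything mandated by this JIRA; a real deployment-level tool would parse the JSON and decide whether to page an operator or restart a pod.

{code:java}
// Hedged sketch of an out-of-band probe; the endpoint and attribute names
// are assumptions (default NameNode HTTP port, NameNodeInfo MBean).
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class DeadNodeProbe {
  public static void main(String[] args) throws Exception {
    String nnHttpAddress = args.length > 0 ? args[0] : "http://nn.example.com:9870";
    // JMXJsonServlet exposes MBeans as JSON under /jmx; the NameNodeInfo bean
    // carries a DeadNodes attribute that the NameNode UI itself renders.
    HttpRequest request = HttpRequest.newBuilder(
        URI.create(nnHttpAddress + "/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo"))
        .GET()
        .build();
    HttpResponse<String> response = HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString());
    // A real monitor would parse this JSON and act on a non-empty DeadNodes list.
    System.out.println(response.body());
  }
}
{code}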
[jira] [Commented] (HDFS-16918) Optionally shut down datanode if it does not stay connected to active namenode
[ https://issues.apache.org/jira/browse/HDFS-16918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689900#comment-17689900 ] ASF GitHub Bot commented on HDFS-16918: --- virajjasani commented on PR #5396: URL: https://github.com/apache/hadoop/pull/5396#issuecomment-1433513544 How about this? This change is working fine on the cluster as is, and it is a real requirement as I explained in the [above comment](https://github.com/apache/hadoop/pull/5396#issuecomment-1432160563). If we do not want to keep this change as is, i.e. shut down the datanode if it is not connected to the active namenode, how about we provide a pluggable implementation? Let's say, by default, if the datanode does not stay connected to the active namenode for 60s, in the default implementation (that we can provide with this patch) we take the action of just logging (or maybe exposing a metric, whatever reviewers feel is feasible) the fact that this datanode is not being useful for clients as it has lost connection to the active namenode for more than the past 60s. This is the default implementation that we can keep. On the other hand, users are allowed to have their own pluggable implementation, so let's say if someone wants to shut down the datanode after 60s (default) of losing connection, they will have to use a new implementation with the action "shutdown datanode". Hence, we have two configs for this change: 1. the time duration for losing connection (`dfs.datanode.health.activennconnect.timeout`), which we already have, but with a default value of 60s; 2. the action to be performed by the datanode when the above threshold is reached (maybe something like `dfs.datanode.activennconnect.timeout.action.impl`), with a default implementation that would just log or expose a metric as per consensus. Any user can have their own separately maintained implementation, and that implementation can take the action of shutting down the datanode, or running another script that could invoke a dfsadmin action. Anything should be fine, but now the code stays with users. Thoughts? > Optionally shut down datanode if it does not stay connected to active namenode > -- > > Key: HDFS-16918 > URL: https://issues.apache.org/jira/browse/HDFS-16918 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > > While deploying HDFS on an Envoy proxy setup, depending on the socket timeout > configured at Envoy, network connection issues or packet loss could be > observed. All of the Envoys basically form a transparent communication mesh in > which each app can send and receive packets to and from localhost and is > unaware of the network topology. > The primary purpose of Envoy is to make the network transparent to > applications, in order to identify network issues reliably. However, > sometimes such a proxy based setup could result in socket connection issues > b/w datanode and namenode. > Many deployment frameworks provide auto-start functionality when any of the > hadoop daemons are stopped. If a given datanode does not stay connected to > the active namenode in the cluster, i.e. does not receive a heartbeat response in > time from the active namenode (even though the active namenode is not terminated), it > would not be of much use. We should be able to provide configurable behavior > such that if a given datanode cannot receive a heartbeat response from the active > namenode within a configurable time duration, it should terminate itself to avoid > impacting the availability SLA. 
This is specifically helpful when the > underlying deployment or observability framework (e.g. K8S) can start up the > datanode automatically upon its shutdown (unless it is being restarted as > part of a rolling upgrade) and help the newly brought up datanode (in case of > k8s, a new pod with dynamically changing nodes) establish new socket > connections to the active and standby namenodes. This should be an opt-in behavior > and not the default one. > > In a distributed system, it is essential to have robust fail-fast mechanisms > in place to prevent issues related to network partitioning. The system must > be designed to prevent further degradation of availability and consistency in > the event of a network partition. Several distributed systems offer fail-safe > approaches, and for some, partition tolerance is critical to the extent that > even a few seconds of heartbeat loss can trigger the removal of an > application server instance from the cluster. For instance, a majority of > ZooKeeper clients utilize ephemeral nodes for this purpose to make the system > reliable, fault-tolerant and strongly consistent in the event of a network > partition. >
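Purely as an illustration of the pluggable-action proposal in the comment above, the shape could look roughly like the following. The interface and class names are hypothetical and do not correspond to anything in PR #5396; only the two configuration keys quoted in the comment are taken from the discussion.

{code:java}
// Hypothetical sketch only; names below are illustrative and not part of the
// actual patch under review.
public interface ActiveNNConnectTimeoutAction {
  /**
   * Called by the datanode once no heartbeat response has been received from
   * the active namenode for longer than the configured timeout
   * (dfs.datanode.health.activennconnect.timeout, proposed default 60s).
   */
  void onTimeout(long millisSinceLastHeartbeatResponse);
}

/** Proposed default: just log (and/or expose a metric), take no action. */
class LogOnlyTimeoutAction implements ActiveNNConnectTimeoutAction {
  @Override
  public void onTimeout(long millisSinceLastHeartbeatResponse) {
    System.err.println("Datanode has not heard from the active namenode for "
        + millisSinceLastHeartbeatResponse + " ms");
  }
}

/**
 * Opt-in behaviour a user could select via
 * dfs.datanode.activennconnect.timeout.action.impl: exit the process so an
 * external framework (e.g. Kubernetes) restarts the datanode.
 */
class ShutdownTimeoutAction implements ActiveNNConnectTimeoutAction {
  @Override
  public void onTimeout(long millisSinceLastHeartbeatResponse) {
    Runtime.getRuntime().exit(1);
  }
}
{code}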
[jira] [Commented] (HDFS-16761) Namenode UI for Datanodes page not loading if any data node is down
[ https://issues.apache.org/jira/browse/HDFS-16761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689891#comment-17689891 ] Stephen O'Donnell commented on HDFS-16761: -- Branch 3.2 seems to be OK too, so resolving this one. > Namenode UI for Datanodes page not loading if any data node is down > --- > > Key: HDFS-16761 > URL: https://issues.apache.org/jira/browse/HDFS-16761 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.2.2 >Reporter: Krishna Reddy >Assignee: Zita Dombi >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > > Steps to reproduce: > - Install the hadoop components and add 3 datanodes > - Enable namenode HA > - Open the Namenode UI and check the datanode page > - Check that all datanodes are displayed > - Now bring one datanode down > - Wait for 10 minutes for the heartbeat to expire > - Refresh the namenode page and check > > Actual Result: It shows the error message "NameNode is still loading. > Redirecting to the Startup Progress page." -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16761) Namenode UI for Datanodes page not loading if any data node is down
[ https://issues.apache.org/jira/browse/HDFS-16761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen O'Donnell updated HDFS-16761: - Resolution: Fixed Status: Resolved (was: Patch Available) > Namenode UI for Datanodes page not loading if any data node is down > --- > > Key: HDFS-16761 > URL: https://issues.apache.org/jira/browse/HDFS-16761 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.2.2 >Reporter: Krishna Reddy >Assignee: Zita Dombi >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > > Steps to reproduce: > - Install the hadoop components and add 3 datanodes > - Enable namenode HA > - Open the Namenode UI and check the datanode page > - Check that all datanodes are displayed > - Now bring one datanode down > - Wait for 10 minutes for the heartbeat to expire > - Refresh the namenode page and check > > Actual Result: It shows the error message "NameNode is still loading. > Redirecting to the Startup Progress page." -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16917) Add transfer rate quantile metrics for DataNode reads
[ https://issues.apache.org/jira/browse/HDFS-16917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689786#comment-17689786 ] ASF GitHub Bot commented on HDFS-16917: --- hadoop-yetus commented on PR #5397: URL: https://github.com/apache/hadoop/pull/5397#issuecomment-1433166779 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 47s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +0 :ok: | markdownlint | 0m 0s | | markdownlint was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 15m 35s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 30m 57s | | trunk passed | | +1 :green_heart: | compile | 23m 9s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | compile | 20m 35s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | checkstyle | 3m 49s | | trunk passed | | +1 :green_heart: | mvnsite | 3m 28s | | trunk passed | | +1 :green_heart: | javadoc | 2m 30s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javadoc | 2m 38s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 6m 8s | | trunk passed | | +1 :green_heart: | shadedclient | 26m 12s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +0 :ok: | mvndep | 0m 29s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 2m 34s | | the patch passed | | +1 :green_heart: | compile | 22m 34s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javac | 22m 34s | | the patch passed | | +1 :green_heart: | compile | 20m 32s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | javac | 20m 32s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 3m 41s | [/results-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5397/8/artifact/out/results-checkstyle-root.txt) | root: The patch generated 2 new + 139 unchanged - 0 fixed = 141 total (was 139) | | +1 :green_heart: | mvnsite | 3m 35s | | the patch passed | | +1 :green_heart: | javadoc | 2m 20s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javadoc | 2m 42s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 6m 25s | | the patch passed | | +1 :green_heart: | shadedclient | 26m 54s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 18m 21s | | hadoop-common in the patch passed. | | -1 :x: | unit | 210m 41s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5397/8/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. 
| | +1 :green_heart: | asflicense | 1m 15s | | The patch does not generate ASF License warnings. | | | | 457m 38s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.hdfs.server.namenode.TestAuditLogger | | | hadoop.hdfs.server.namenode.TestFsck | | | hadoop.hdfs.server.namenode.TestAuditLogs | | | hadoop.hdfs.server.namenode.ha.TestObserverNode | | | hadoop.hdfs.server.namenode.TestFSNamesystemLockReport | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5397/8/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/5397 | | Optional Tests | dupname asflicense mvnsite codespell detsecrets markdownlint compile javac javadoc mvninstall unit shadedclient spotbugs checkstyle | | uname | Linux 4a7bede6aee2 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 1
[jira] [Commented] (HDFS-16922) The logic of IncrementalBlockReportManager#addRDBI method may cause missing blocks when cluster is busy.
[ https://issues.apache.org/jira/browse/HDFS-16922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689725#comment-17689725 ] ASF GitHub Bot commented on HDFS-16922: --- hfutatzhanghb commented on PR #5398: URL: https://github.com/apache/hadoop/pull/5398#issuecomment-1433001412 > Hi @zhangshuyan0, thanks for your reply~. Yes, it is still possible to miss blocks in trunk when the replace policy is set to NEVER. I'm developing a UT to reproduce this case. > The logic of IncrementalBlockReportManager#addRDBI method may cause missing > blocks when cluster is busy. > > > Key: HDFS-16922 > URL: https://issues.apache.org/jira/browse/HDFS-16922 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: ZhangHB >Priority: Major > Labels: pull-request-available > > The current logic of the IncrementalBlockReportManager#addRDBI method could lead > to missing blocks when datanodes in the pipeline are I/O busy. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16922) The logic of IncrementalBlockReportManager#addRDBI method may cause missing blocks when cluster is busy.
[ https://issues.apache.org/jira/browse/HDFS-16922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689720#comment-17689720 ] ASF GitHub Bot commented on HDFS-16922: --- zhangshuyan0 commented on PR #5398: URL: https://github.com/apache/hadoop/pull/5398#issuecomment-1432985896 It's great to make sure the DN only reports the replica with the maximum timestamp. Even though [HDFS-16146](https://issues.apache.org/jira/browse/HDFS-16146) has already been merged, is it still possible to miss blocks in trunk when the replace policy is set to NEVER? Would you mind adding a UT for reproducing this case? > The logic of IncrementalBlockReportManager#addRDBI method may cause missing > blocks when cluster is busy. > > > Key: HDFS-16922 > URL: https://issues.apache.org/jira/browse/HDFS-16922 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: ZhangHB >Priority: Major > Labels: pull-request-available > > The current logic of the IncrementalBlockReportManager#addRDBI method could lead > to missing blocks when datanodes in the pipeline are I/O busy. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16922) The logic of IncrementalBlockReportManager#addRDBI method may cause missing blocks when cluster is busy.
[ https://issues.apache.org/jira/browse/HDFS-16922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689715#comment-17689715 ] ASF GitHub Bot commented on HDFS-16922: --- zhangshuyan0 commented on PR #5398: URL: https://github.com/apache/hadoop/pull/5398#issuecomment-1432976667 It's great to make sure the DN only reports the replica with the maximum timestamp. Even though [HDFS-16146](https://issues.apache.org/jira/browse/HDFS-16146) has already been merged, is it still possible to miss blocks in trunk when the replication factor is 2? Would you mind adding a UT for reproducing this case? > The logic of IncrementalBlockReportManager#addRDBI method may cause missing > blocks when cluster is busy. > > > Key: HDFS-16922 > URL: https://issues.apache.org/jira/browse/HDFS-16922 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: ZhangHB >Priority: Major > Labels: pull-request-available > > The current logic of the IncrementalBlockReportManager#addRDBI method could lead > to missing blocks when datanodes in the pipeline are I/O busy. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16898) Remove write lock for processCommandFromActor of DataNode to reduce impact on heartbeat
[ https://issues.apache.org/jira/browse/HDFS-16898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689656#comment-17689656 ] ASF GitHub Bot commented on HDFS-16898: --- Hexiaoqiao commented on PR #5408: URL: https://github.com/apache/hadoop/pull/5408#issuecomment-1432809951 @hfutatzhanghb Please check whether the failed unit tests relate to this change. Thanks. > Remove write lock for processCommandFromActor of DataNode to reduce impact on > heartbeat > --- > > Key: HDFS-16898 > URL: https://issues.apache.org/jira/browse/HDFS-16898 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.3.4 >Reporter: ZhangHB >Assignee: ZhangHB >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > > Now, in the method processCommandFromActor, we have code like below: > > {code:java} > writeLock(); > try { > if (actor == bpServiceToActive) { > return processCommandFromActive(cmd, actor); > } else { > return processCommandFromStandby(cmd, actor); > } > } finally { > writeUnlock(); > } {code} > If the method processCommandFromActive costs much time, the write lock would not > be released. > > It may block the updateActorStatesFromHeartbeat method in > offerService; furthermore, it can cause the lastContact of the datanode to grow very high, > and the datanode may even be marked dead when lastContact goes beyond 600s. > {code:java} > bpos.updateActorStatesFromHeartbeat( > this, resp.getNameNodeHaState());{code} > Here we can make the write lock fine-grained in the processCommandFromActor method to > address this problem. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
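For readers who want to picture the change described in the issue above, here is a minimal sketch of the fine-grained locking idea. It is illustrative only: the surrounding BPOfferService context and the method signature are assumed, and the actual patch in PR #5408 may carve up the lock differently.

{code:java}
// Illustrative sketch, not the PR #5408 change: hold the BPOfferService
// write lock only while reading the shared actor state, and run the
// potentially slow command processing outside the lock so heartbeat handling
// (updateActorStatesFromHeartbeat) is not blocked behind it.
boolean processCommandFromActorSketch(DatanodeCommand cmd,
    BPServiceActor actor) throws IOException {
  final boolean fromActive;
  writeLock();
  try {
    fromActive = (actor == bpServiceToActive);
  } finally {
    writeUnlock();
  }
  // Slow work runs without holding the outer write lock.
  return fromActive
      ? processCommandFromActive(cmd, actor)
      : processCommandFromStandby(cmd, actor);
}
{code}

The trade-off is making sure nothing inside the command-processing paths still relies on the outer lock, which is presumably why the failed unit tests are being checked against the change.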
[jira] [Commented] (HDFS-16922) The logic of IncrementalBlockReportManager#addRDBI method may cause missing blocks when cluster is busy.
[ https://issues.apache.org/jira/browse/HDFS-16922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689654#comment-17689654 ] ASF GitHub Bot commented on HDFS-16922: --- Hexiaoqiao commented on code in PR #5398: URL: https://github.com/apache/hadoop/pull/5398#discussion_r1108243429 ## hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/IncrementalBlockReportManager.java: ## @@ -252,7 +256,9 @@ synchronized void addRDBI(ReceivedDeletedBlockInfo rdbi, // Make sure another entry for the same block is first removed. // There may only be one such entry. for (PerStorageIBR perStorage : pendingIBRs.values()) { - if (perStorage.remove(rdbi.getBlock()) != null) { + ReceivedDeletedBlockInfo oldRdbi = perStorage.get(rdbi.getBlock()); + if (oldRdbi != null && oldRdbi.getBlock().getGenerationStamp() < rdbi.getBlock().getGenerationStamp() Review Comment: This fix still leaves one unexpected case: consider when the new entry's generation stamp is less than the old one; it will still be put at line 265 and overwrite it, right? How about the following patch? Also, it would be better to add a new unit test to verify this bugfix. Thanks. ``` - > The logic of IncrementalBlockReportManager#addRDBI method may cause missing > blocks when cluster is busy. > > > Key: HDFS-16922 > URL: https://issues.apache.org/jira/browse/HDFS-16922 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: ZhangHB >Priority: Major > Labels: pull-request-available > > The current logic of the IncrementalBlockReportManager#addRDBI method could lead > to missing blocks when datanodes in the pipeline are I/O busy. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
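To make the corner case in the review comment above concrete, here is a hedged sketch of the intended handling. It only illustrates the idea that an incremental block report entry with an older generation stamp should never replace a pending entry with a newer one; the actual patch suggested in the review (elided above) and the final code in PR #5398 may differ.

{code:java}
// Illustrative sketch of the logic inside addRDBI(), not the PR #5398 patch.
// Goal: never let an incoming report with an older generation stamp
// overwrite a pending entry that carries a newer generation stamp.
for (PerStorageIBR perStorage : pendingIBRs.values()) {
  ReceivedDeletedBlockInfo oldRdbi = perStorage.get(rdbi.getBlock());
  if (oldRdbi == null) {
    continue;
  }
  if (oldRdbi.getBlock().getGenerationStamp() > rdbi.getBlock().getGenerationStamp()) {
    // The pending entry is newer: keep it and drop the incoming report,
    // instead of falling through to the put() that would overwrite it
    // (the case flagged in the review above).
    return;
  }
  // The incoming entry is at least as new: remove the stale pending entry so
  // the subsequent put() replaces it cleanly.
  perStorage.remove(rdbi.getBlock());
}
{code}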
[jira] [Commented] (HDFS-16898) Remove write lock for processCommandFromActor of DataNode to reduce impact on heartbeat
[ https://issues.apache.org/jira/browse/HDFS-16898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689648#comment-17689648 ] ASF GitHub Bot commented on HDFS-16898: --- hadoop-yetus commented on PR #5408: URL: https://github.com/apache/hadoop/pull/5408#issuecomment-1432800673 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 1m 2s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | _ branch-3.3 Compile Tests _ | | +1 :green_heart: | mvninstall | 45m 2s | | branch-3.3 passed | | +1 :green_heart: | compile | 1m 25s | | branch-3.3 passed | | +1 :green_heart: | checkstyle | 1m 1s | | branch-3.3 passed | | +1 :green_heart: | mvnsite | 1m 36s | | branch-3.3 passed | | +1 :green_heart: | javadoc | 1m 42s | | branch-3.3 passed | | +1 :green_heart: | spotbugs | 3m 51s | | branch-3.3 passed | | +1 :green_heart: | shadedclient | 32m 48s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 1m 37s | | the patch passed | | +1 :green_heart: | compile | 1m 20s | | the patch passed | | +1 :green_heart: | javac | 1m 20s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | +1 :green_heart: | checkstyle | 0m 45s | | the patch passed | | +1 :green_heart: | mvnsite | 1m 26s | | the patch passed | | +1 :green_heart: | javadoc | 1m 24s | | the patch passed | | +1 :green_heart: | spotbugs | 3m 40s | | the patch passed | | +1 :green_heart: | shadedclient | 32m 59s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | -1 :x: | unit | 233m 6s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5408/2/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. | | +1 :green_heart: | asflicense | 0m 43s | | The patch does not generate ASF License warnings. 
| | | | 362m 56s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.hdfs.server.namenode.TestAuditLogger | | | hadoop.hdfs.server.namenode.TestFsck | | | hadoop.hdfs.tools.TestDFSAdmin | | | hadoop.hdfs.server.mover.TestMover | | | hadoop.hdfs.server.namenode.TestAuditLogs | | | hadoop.hdfs.TestRollingUpgrade | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5408/2/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/5408 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux 71030960ce81 4.15.0-197-generic #208-Ubuntu SMP Tue Nov 1 17:23:37 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | branch-3.3 / db600f1a3ef142390860ef96e2ae1a8f017031b2 | | Default Java | Private Build-1.8.0_352-8u352-ga-1~18.04-b08 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5408/2/testReport/ | | Max. process+thread count | 2025 (vs. ulimit of 5500) | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5408/2/console | | versions | git=2.17.1 maven=3.6.0 spotbugs=4.2.2 | | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org | This message was automatically generated. > Remove write lock for processCommandFromActor of DataNode to reduce impact on > heartbeat > --- > > Key: HDFS-16898 > URL: https://issues.apache.org/jira/browse/HDFS-16898 > Project: Hadoop HDFS > Issue Type: Impro
[jira] [Commented] (HDFS-16918) Optionally shut down datanode if it does not stay connected to active namenode
[ https://issues.apache.org/jira/browse/HDFS-16918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689640#comment-17689640 ] ASF GitHub Bot commented on HDFS-16918: --- hadoop-yetus commented on PR #5396: URL: https://github.com/apache/hadoop/pull/5396#issuecomment-1432794232 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 1m 14s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 1s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +0 :ok: | xmllint | 0m 0s | | xmllint was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 2 new or modified test files. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 47m 21s | | trunk passed | | +1 :green_heart: | compile | 1m 35s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | compile | 1m 21s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | checkstyle | 1m 6s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 33s | | trunk passed | | +1 :green_heart: | javadoc | 1m 7s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javadoc | 1m 31s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 3m 36s | | trunk passed | | +1 :green_heart: | shadedclient | 29m 15s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 1m 30s | | the patch passed | | +1 :green_heart: | compile | 1m 22s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javac | 1m 22s | | the patch passed | | +1 :green_heart: | compile | 1m 16s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | javac | 1m 16s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | +1 :green_heart: | checkstyle | 0m 55s | | the patch passed | | +1 :green_heart: | mvnsite | 1m 21s | | the patch passed | | -1 :x: | javadoc | 0m 54s | [/patch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5396/3/artifact/out/patch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.txt) | hadoop-hdfs in the patch failed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04. | | +1 :green_heart: | javadoc | 1m 28s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 3m 30s | | the patch passed | | +1 :green_heart: | shadedclient | 24m 14s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | -1 :x: | unit | 241m 33s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5396/3/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. | | +1 :green_heart: | asflicense | 0m 45s | | The patch does not generate ASF License warnings. 
| | | | 366m 1s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.hdfs.server.namenode.TestAuditLogger | | | hadoop.hdfs.server.namenode.TestFSNamesystemLockReport | | | hadoop.hdfs.server.datanode.TestDirectoryScanner | | | hadoop.hdfs.server.namenode.TestAuditLogs | | | hadoop.hdfs.server.namenode.TestFsck | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5396/3/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/5396 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint | | uname | Linux 3a3903bc636b 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 91e7a726426d754ab4bfc548d1e9d51
[jira] [Commented] (HDFS-16898) Remove write lock for processCommandFromActor of DataNode to reduce impact on heartbeat
[ https://issues.apache.org/jira/browse/HDFS-16898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689634#comment-17689634 ] ASF GitHub Bot commented on HDFS-16898: --- hadoop-yetus commented on PR #5408: URL: https://github.com/apache/hadoop/pull/5408#issuecomment-1432771976 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 51s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | _ branch-3.3 Compile Tests _ | | +1 :green_heart: | mvninstall | 43m 11s | | branch-3.3 passed | | +1 :green_heart: | compile | 1m 13s | | branch-3.3 passed | | +1 :green_heart: | checkstyle | 0m 53s | | branch-3.3 passed | | +1 :green_heart: | mvnsite | 1m 22s | | branch-3.3 passed | | +1 :green_heart: | javadoc | 1m 33s | | branch-3.3 passed | | +1 :green_heart: | spotbugs | 3m 21s | | branch-3.3 passed | | +1 :green_heart: | shadedclient | 30m 41s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 1m 21s | | the patch passed | | +1 :green_heart: | compile | 1m 9s | | the patch passed | | +1 :green_heart: | javac | 1m 9s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | +1 :green_heart: | checkstyle | 0m 41s | | the patch passed | | +1 :green_heart: | mvnsite | 1m 18s | | the patch passed | | +1 :green_heart: | javadoc | 1m 20s | | the patch passed | | +1 :green_heart: | spotbugs | 3m 19s | | the patch passed | | +1 :green_heart: | shadedclient | 30m 13s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | -1 :x: | unit | 221m 19s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5408/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. | | +1 :green_heart: | asflicense | 0m 41s | | The patch does not generate ASF License warnings. 
| | | | 342m 3s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.hdfs.server.namenode.TestAuditLogs | | | hadoop.hdfs.TestFileCreation | | | hadoop.hdfs.server.balancer.TestBalancer | | | hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes | | | hadoop.hdfs.TestRollingUpgrade | | | hadoop.hdfs.server.namenode.TestFsck | | | hadoop.hdfs.server.namenode.TestAuditLogger | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5408/1/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/5408 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux 2e9e1e2e533d 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | branch-3.3 / db600f1a3ef142390860ef96e2ae1a8f017031b2 | | Default Java | Private Build-1.8.0_352-8u352-ga-1~18.04-b08 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5408/1/testReport/ | | Max. process+thread count | 2360 (vs. ulimit of 5500) | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5408/1/console | | versions | git=2.17.1 maven=3.6.0 spotbugs=4.2.2 | | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org | This message was automatically generated. > Remove write lock for processCommandFromActor of DataNode to reduce impact on > heartbeat > --- > > Key: HDFS-16898 > URL: https://issues.apache.org/jira/browse/HD
[jira] [Commented] (HDFS-16917) Add transfer rate quantile metrics for DataNode reads
[ https://issues.apache.org/jira/browse/HDFS-16917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689621#comment-17689621 ] ASF GitHub Bot commented on HDFS-16917: --- hadoop-yetus commented on PR #5397: URL: https://github.com/apache/hadoop/pull/5397#issuecomment-1432717633 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 50s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +0 :ok: | markdownlint | 0m 0s | | markdownlint was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 15m 29s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 33m 57s | | trunk passed | | +1 :green_heart: | compile | 25m 21s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | compile | 21m 50s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | checkstyle | 4m 5s | | trunk passed | | +1 :green_heart: | mvnsite | 3m 23s | | trunk passed | | +1 :green_heart: | javadoc | 2m 15s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javadoc | 2m 23s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 6m 15s | | trunk passed | | +1 :green_heart: | shadedclient | 29m 13s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +0 :ok: | mvndep | 0m 23s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 2m 37s | | the patch passed | | +1 :green_heart: | compile | 24m 32s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javac | 24m 32s | | the patch passed | | +1 :green_heart: | compile | 21m 50s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | javac | 21m 50s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 3m 55s | [/results-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5397/7/artifact/out/results-checkstyle-root.txt) | root: The patch generated 2 new + 130 unchanged - 0 fixed = 132 total (was 130) | | +1 :green_heart: | mvnsite | 3m 17s | | the patch passed | | +1 :green_heart: | javadoc | 2m 9s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javadoc | 2m 29s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 6m 25s | | the patch passed | | +1 :green_heart: | shadedclient | 29m 43s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 18m 22s | | hadoop-common in the patch passed. 
| | -1 :x: | unit | 228m 44s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5397/7/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. | | +1 :green_heart: | asflicense | 0m 58s | | The patch does not generate ASF License warnings. | | | | 488m 54s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.hdfs.server.namenode.TestAuditLogs | | | hadoop.hdfs.server.namenode.TestFSNamesystemLockReport | | | hadoop.hdfs.server.namenode.TestFsck | | | hadoop.hdfs.server.namenode.ha.TestObserverNode | | | hadoop.hdfs.TestRollingUpgrade | | | hadoop.hdfs.server.namenode.TestAuditLogger | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5397/7/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/5397 | | Optional Tests | dupname asflicense mvnsite codespell detsecrets
[jira] [Commented] (HDFS-16896) HDFS Client hedged read has increased failure rate than without hedged read
[ https://issues.apache.org/jira/browse/HDFS-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17689611#comment-17689611 ] ASF GitHub Bot commented on HDFS-16896: --- hadoop-yetus commented on PR #5322: URL: https://github.com/apache/hadoop/pull/5322#issuecomment-1432703908 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 52s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 15m 19s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 38m 29s | | trunk passed | | +1 :green_heart: | compile | 6m 43s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | compile | 6m 12s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | checkstyle | 1m 18s | | trunk passed | | +1 :green_heart: | mvnsite | 2m 26s | | trunk passed | | +1 :green_heart: | javadoc | 1m 47s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javadoc | 2m 4s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 6m 11s | | trunk passed | | +1 :green_heart: | shadedclient | 28m 55s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +0 :ok: | mvndep | 0m 22s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 2m 22s | | the patch passed | | +1 :green_heart: | compile | 6m 32s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javac | 6m 32s | | the patch passed | | +1 :green_heart: | compile | 6m 26s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | javac | 6m 26s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 1m 12s | [/results-checkstyle-hadoop-hdfs-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5322/5/artifact/out/results-checkstyle-hadoop-hdfs-project.txt) | hadoop-hdfs-project: The patch generated 1 new + 31 unchanged - 0 fixed = 32 total (was 31) | | +1 :green_heart: | mvnsite | 2m 17s | | the patch passed | | +1 :green_heart: | javadoc | 1m 30s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javadoc | 2m 0s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 6m 52s | | the patch passed | | +1 :green_heart: | shadedclient | 30m 40s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 2m 20s | | hadoop-hdfs-client in the patch passed. | | -1 :x: | unit | 241m 59s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5322/5/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. 
| | +1 :green_heart: | asflicense | 0m 59s | | The patch does not generate ASF License warnings. | | | | 413m 31s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.hdfs.server.namenode.TestAuditLogger | | | hadoop.hdfs.server.namenode.TestFSNamesystemLockReport | | | hadoop.hdfs.server.namenode.TestFsck | | | hadoop.hdfs.server.datanode.TestDirectoryScanner | | | hadoop.hdfs.server.namenode.TestAuditLogs | | | hadoop.hdfs.TestPread | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5322/5/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/5322 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux 62ba32aa466f 4.15.0-197-generic #208-Ubuntu SMP Tue Nov 1 17:23:37 UTC 2022