[jira] [Commented] (HDFS-16322) The NameNode implementation of ClientProtocol.truncate(...) can cause data loss.

2021-11-14 Thread Tsz-wo Sze (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17443570#comment-17443570
 ] 

Tsz-wo Sze commented on HDFS-16322:
---

> ... (NOTE: t0 < t1 < t2 < t3.)

It is correct if c1 succeeds but c0 gets FileAlreadyExistsException.


> The NameNode implementation of ClientProtocol.truncate(...) can cause data 
> loss.
> 
>
> Key: HDFS-16322
> URL: https://issues.apache.org/jira/browse/HDFS-16322
> Project: Hadoop HDFS
>  Issue Type: Bug
> Environment: The runtime environment is Ubuntu 18.04, Java 1.8.0_222 
> and Apache Maven 3.6.0. 
> The bug can be reproduced by the testMultipleTruncate() in the 
> attachment. First, replace the file TestFileTruncate.java under the directory 
> "hadoop-3.3.1-src/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/"
>  with the attachment. Then run "mvn test 
> -Dtest=org.apache.hadoop.hdfs.server.namenode.TestFileTruncate#testMultipleTruncate"
>  to run the test case. Finally, the "assertFileLength(p, n+newLength)" at line 
> 199 of TestFileTruncate.java will fail, because the retry of truncate() 
> changes the file size and causes data loss.
>Reporter: nhaorand
>Priority: Major
> Attachments: TestFileTruncate.java
>
>
> The NameNode implementation of ClientProtocol.truncate(...) can cause data 
> loss. If the DFSClient drops the first response of a truncate RPC call, the 
> retry, despite the retry cache, will truncate the file again and cause data 
> loss.
> HDFS-7926 avoids repeated execution of truncate(...) by checking whether the 
> file has already been truncated to the same length. However, under 
> concurrency, after the first execution of truncate(...), concurrent requests 
> from other clients may append new data and change the file length. When 
> truncate(...) is retried after that, it will find that the file has not been 
> truncated to the same length and will truncate it again, which causes data 
> loss.
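
A minimal sketch of the reported interleaving, written against the public 
FileSystem API (it assumes a running MiniDFSCluster named "cluster"; the path 
and lengths are illustrative, not taken from the attached test):

    // Client A truncates a 100-byte file to 50, but its RPC response is lost.
    DistributedFileSystem dfs = cluster.getFileSystem();
    Path p = new Path("/foo");
    DFSTestUtil.createFile(dfs, p, 100, (short) 1, 0);
    dfs.truncate(p, 50);                 // executed on the NN; reply dropped
    try (FSDataOutputStream out = dfs.append(p)) {
      out.write(new byte[20]);           // client B appends; length is now 70
    }
    dfs.truncate(p, 50);                 // A's retry: the HDFS-7926 check sees
                                         // 70 != 50, truncates again, and
                                         // silently discards B's append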



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16322) The NameNode implementation of ClientProtocol.truncate(...) can cause data loss.

2021-11-14 Thread Tsz-wo Sze (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17443566#comment-17443566
 ] 

Tsz-wo Sze commented on HDFS-16322:
---

[~hexiaoqiao], the given case is not data loss.  As long as the NN has not 
responded to Client A, the NN may insert any operations before the truncate.  
This is correct concurrent behavior.

> The NameNode implementation of ClientProtocol.truncate(...) can cause data 
> loss.
> 
>
> Key: HDFS-16322
> URL: https://issues.apache.org/jira/browse/HDFS-16322
> Project: Hadoop HDFS
>  Issue Type: Bug
> Environment: The runtime environment is Ubuntu 18.04, Java 1.8.0_222 
> and Apache Maven 3.6.0. 
> The bug can be reproduced by the testMultipleTruncate() in the 
> attachment. First, replace the file TestFileTruncate.java under the directory 
> "hadoop-3.3.1-src/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/"
>  with the attachment. Then run "mvn test 
> -Dtest=org.apache.hadoop.hdfs.server.namenode.TestFileTruncate#testMultipleTruncate"
>  to run the test case. Finally, the "assertFileLength(p, n+newLength)" at line 
> 199 of TestFileTruncate.java will fail, because the retry of truncate() 
> changes the file size and causes data loss.
>Reporter: nhaorand
>Priority: Major
> Attachments: TestFileTruncate.java
>
>
> The NameNode implementation of ClientProtocol.truncate(...) can cause data 
> loss. If the DFSClient drops the first response of a truncate RPC call, the 
> retry, despite the retry cache, will truncate the file again and cause data 
> loss.
> HDFS-7926 avoids repeated execution of truncate(...) by checking whether the 
> file has already been truncated to the same length. However, under 
> concurrency, after the first execution of truncate(...), concurrent requests 
> from other clients may append new data and change the file length. When 
> truncate(...) is retried after that, it will find that the file has not been 
> truncated to the same length and will truncate it again, which causes data 
> loss.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16322) The NameNode implementation of ClientProtocol.truncate(...) can cause data loss.

2021-11-14 Thread Xiaoqiao He (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17443550#comment-17443550
 ] 

Xiaoqiao He commented on HDFS-16322:


Thanks [~szetszwo] for your response. Consider the following case:
A. Client A requests truncate for file foo at time t0.
B. The NameNode executes the truncate request and responds to Client A at time t1.
C. Client B requests truncate and append for file foo at time t2; the file 
length does not change, but the content actually does.
D. Client A does not receive the response and retries (the end user cannot 
control this); the NameNode re-executes the request because there is no 
RetryCache hit for the truncate request, and responds at t3.
After t3, the file content will not be as expected. (NOTE: t0 < t1 < t2 < t3.)

> The NameNode implementation of ClientProtocol.truncate(...) can cause data 
> loss.
> 
>
> Key: HDFS-16322
> URL: https://issues.apache.org/jira/browse/HDFS-16322
> Project: Hadoop HDFS
>  Issue Type: Bug
> Environment: The runtime environment is Ubuntu 18.04, Java 1.8.0_222 
> and Apache Maven 3.6.0. 
> The bug can be reproduced by the testMultipleTruncate() in the 
> attachment. First, replace the file TestFileTruncate.java under the directory 
> "hadoop-3.3.1-src/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/"
>  with the attachment. Then run "mvn test 
> -Dtest=org.apache.hadoop.hdfs.server.namenode.TestFileTruncate#testMultipleTruncate"
>  to run the test case. Finally, the "assertFileLength(p, n+newLength)" at line 
> 199 of TestFileTruncate.java will fail, because the retry of truncate() 
> changes the file size and causes data loss.
>Reporter: nhaorand
>Priority: Major
> Attachments: TestFileTruncate.java
>
>
> The NameNode implementation of ClientProtocol.truncate(...) can cause data 
> loss. If the DFSClient drops the first response of a truncate RPC call, the 
> retry, despite the retry cache, will truncate the file again and cause data 
> loss.
> HDFS-7926 avoids repeated execution of truncate(...) by checking whether the 
> file has already been truncated to the same length. However, under 
> concurrency, after the first execution of truncate(...), concurrent requests 
> from other clients may append new data and change the file length. When 
> truncate(...) is retried after that, it will find that the file has not been 
> truncated to the same length and will truncate it again, which causes data 
> loss.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16315) Add metrics related to Transfer and NativeCopy for DataNode

2021-11-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16315?focusedWorklogId=681327&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-681327
 ]

ASF GitHub Bot logged work on HDFS-16315:
-

Author: ASF GitHub Bot
Created on: 15/Nov/21 04:59
Start Date: 15/Nov/21 04:59
Worklog Time Spent: 10m 
  Work Description: tomscut commented on pull request #3643:
URL: https://github.com/apache/hadoop/pull/3643#issuecomment-968539533


   > Changes LGTM.
   
   Thanks @ayushtkn for your review.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 681327)
Time Spent: 2h 40m  (was: 2.5h)

> Add metrics related to Transfer and NativeCopy for DataNode
> ---
>
> Key: HDFS-16315
> URL: https://issues.apache.org/jira/browse/HDFS-16315
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2021-11-11-08-26-33-074.png
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Datanodes already have Read, Write, Sync and Flush metrics. We should add 
> NativeCopy and Transfer as well.
> Here is a partial look after the change:
> !image-2021-11-11-08-26-33-074.png|width=205,height=235!
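
A rough sketch of registering such per-volume quantile metrics with the Hadoop 
metrics2 library, mirroring the existing Read/Write/Sync/Flush pattern (field 
and metric names here are illustrative, not the exact ones in 
DataNodeVolumeMetrics):

    import org.apache.hadoop.metrics2.lib.MetricsRegistry;
    import org.apache.hadoop.metrics2.lib.MutableQuantiles;

    MetricsRegistry registry = new MetricsRegistry("DataNodeVolume");
    // One estimator per interval configured via
    // dfs.metrics.percentiles.intervals, e.g. "60,300,1500".
    int[] intervals = {60, 300, 1500};
    MutableQuantiles[] transferIoQuantiles = new MutableQuantiles[intervals.length];
    for (int i = 0; i < intervals.length; i++) {
      transferIoQuantiles[i] = registry.newQuantiles(
          "transferIo" + intervals[i] + "s",
          "Transfer I/O latency", "ops", "latency", intervals[i]);
    }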



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16322) The NameNode implementation of ClientProtocol.truncate(...) can cause data loss.

2021-11-14 Thread Tsz-wo Sze (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17443527#comment-17443527
 ] 

Tsz-wo Sze commented on HDFS-16322:
---

> ... However, under concurrency, after the first execution of truncate(...), 
> concurrent requests from other clients may append new data and change the 
> file length. When truncate(...) is retried after that, it will find the file 
> has not been truncated with the same length and truncate it again, which 
> causes data loss.

This is not a data loss case.  For concurrent truncate and append, the NN can 
execute them in any order.  This case simply becomes append first and then 
truncate.
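
A tiny worked example of the two legal serializations (lengths are 
illustrative only):

    // 100-byte file; concurrent truncate-to-50 and 20-byte append.
    long len = 100;
    long truncateThenAppend = Math.min(len, 50) + 20;  // = 70
    long appendThenTruncate = Math.min(len + 20, 50);  // = 50
    // Both 70 and 50 are valid final lengths; the NN may pick either order.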

> The NameNode implementation of ClientProtocol.truncate(...) can cause data 
> loss.
> 
>
> Key: HDFS-16322
> URL: https://issues.apache.org/jira/browse/HDFS-16322
> Project: Hadoop HDFS
>  Issue Type: Bug
> Environment: The runtime environment is Ubuntu 18.04, Java 1.8.0_222 
> and Apache Maven 3.6.0. 
> The bug can be reproduced by the testMultipleTruncate() in the 
> attachment. First, replace the file TestFileTruncate.java under the directory 
> "hadoop-3.3.1-src/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/"
>  with the attachment. Then run "mvn test 
> -Dtest=org.apache.hadoop.hdfs.server.namenode.TestFileTruncate#testMultipleTruncate"
>  to run the test case. Finally, the "assertFileLength(p, n+newLength)" at line 
> 199 of TestFileTruncate.java will fail, because the retry of truncate() 
> changes the file size and causes data loss.
>Reporter: nhaorand
>Priority: Major
> Attachments: TestFileTruncate.java
>
>
> The NameNode implementation of ClientProtocol.truncate(...) can cause data 
> loss. If the DFSClient drops the first response of a truncate RPC call, the 
> retry, despite the retry cache, will truncate the file again and cause data 
> loss.
> HDFS-7926 avoids repeated execution of truncate(...) by checking whether the 
> file has already been truncated to the same length. However, under 
> concurrency, after the first execution of truncate(...), concurrent requests 
> from other clients may append new data and change the file length. When 
> truncate(...) is retried after that, it will find that the file has not been 
> truncated to the same length and will truncate it again, which causes data 
> loss.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16323) DatanodeHttpServer doesn't require handler state map while retrieving filter handlers

2021-11-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16323?focusedWorklogId=681306&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-681306
 ]

ASF GitHub Bot logged work on HDFS-16323:
-

Author: ASF GitHub Bot
Created on: 15/Nov/21 02:39
Start Date: 15/Nov/21 02:39
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3659:
URL: https://github.com/apache/hadoop/pull/3659#issuecomment-968464431


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   1m 12s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  35m 14s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 21s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   1m 14s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   0m 58s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 21s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 56s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 24s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   3m 22s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  25m 18s |  |  branch has no errors 
when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 17s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 17s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |   1m 17s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m  7s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |   1m  7s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 52s | 
[/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3659/1/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 0 unchanged - 
0 fixed = 1 total (was 0)  |
   | +1 :green_heart: |  mvnsite  |   1m 15s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 51s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 22s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   3m 21s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  25m 25s |  |  patch has no errors 
when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | -1 :x: |  unit  | 363m 29s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3659/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 45s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 470m 44s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | 
hadoop.fs.viewfs.TestViewFileSystemOverloadSchemeHdfsFileSystemContract |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3659/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3659 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux e8a6e1732a52 4.15.0-143-generic #147-Ubuntu SMP Wed Apr 14 
16:10:11 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 4ce8882038f2e4a06b17ccc8ae65eac0dd277bb9 |
   | Default Java | Private 

[jira] [Work logged] (HDFS-16315) Add metrics related to Transfer and NativeCopy for DataNode

2021-11-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16315?focusedWorklogId=681296&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-681296
 ]

ASF GitHub Bot logged work on HDFS-16315:
-

Author: ASF GitHub Bot
Created on: 15/Nov/21 01:40
Start Date: 15/Nov/21 01:40
Worklog Time Spent: 10m 
  Work Description: tomscut commented on pull request #3643:
URL: https://github.com/apache/hadoop/pull/3643#issuecomment-968433719


   The failed unit test is unrelated to the change and works fine locally.
   
   @tasanuma @ayushtkn Please take a look. Thank you very much.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 681296)
Time Spent: 2.5h  (was: 2h 20m)

> Add metrics related to Transfer and NativeCopy for DataNode
> ---
>
> Key: HDFS-16315
> URL: https://issues.apache.org/jira/browse/HDFS-16315
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2021-11-11-08-26-33-074.png
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Datanodes already have Read, Write, Sync and Flush metrics. We should add 
> NativeCopy and Transfer as well.
> Here is a partial look after the change:
> !image-2021-11-11-08-26-33-074.png|width=205,height=235!



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16315) Add metrics related to Transfer and NativeCopy for DataNode

2021-11-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16315?focusedWorklogId=681294&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-681294
 ]

ASF GitHub Bot logged work on HDFS-16315:
-

Author: ASF GitHub Bot
Created on: 15/Nov/21 01:26
Start Date: 15/Nov/21 01:26
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3643:
URL: https://github.com/apache/hadoop/pull/3643#issuecomment-968428495


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |  16m 47s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  markdownlint  |   0m  0s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 2 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  12m 53s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  21m 23s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  21m 43s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |  18m 58s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   3m 42s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   3m 10s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   2m 17s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   3m 19s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   5m 42s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  22m 38s |  |  branch has no errors 
when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 27s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   2m  7s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  21m  0s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |  21m  0s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  18m 56s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |  18m 56s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   3m 33s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   3m 12s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   2m 16s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   3m 20s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   6m  6s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  22m 47s |  |  patch has no errors 
when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  17m 21s |  |  hadoop-common in the patch 
passed.  |
   | -1 :x: |  unit  | 376m  9s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3643/6/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   1m  7s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 610m 38s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.web.TestWebHdfsFileSystemContract |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3643/6/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3643 |
   | Optional Tests | dupname asflicense mvnsite codespell markdownlint compile 
javac javadoc mvninstall unit shadedclient spotbugs checkstyle |
   | uname | Linux af9395139069 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 
23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / cdad4d553862c1b0e31a3a470c313c11f44df1d5 |
   | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   | Multi-JDK versions | 

[jira] [Work logged] (HDFS-16323) DatanodeHttpServer doesn't require handler state map while retrieving filter handlers

2021-11-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16323?focusedWorklogId=681255&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-681255
 ]

ASF GitHub Bot logged work on HDFS-16323:
-

Author: ASF GitHub Bot
Created on: 14/Nov/21 18:46
Start Date: 14/Nov/21 18:46
Worklog Time Spent: 10m 
  Work Description: virajjasani opened a new pull request #3659:
URL: https://github.com/apache/hadoop/pull/3659


   ### Description of PR
   DatanodeHttpServer#getFilterHandlers uses the handler state map just to 
query whether the given DataNode HttpServer filter handler class exists in the 
map and, if not, initializes the channel handler by invoking a specific 
parameterized constructor of the class. However, this handler state map is 
never used to upsert any data.
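   
   A minimal sketch of the simplified construction (assumed shape and names; 
the real code is DatanodeHttpServer#getFilterHandlers):
   
       import java.lang.reflect.Constructor;
       import io.netty.channel.ChannelHandler;
       import org.apache.hadoop.conf.Configuration;
   
       static ChannelHandler newFilterHandler(
           Class<? extends ChannelHandler> clazz, Configuration conf)
           throws Exception {
         // No handler state map is consulted or updated; the handler is built
         // directly via its parameterized (Configuration) constructor.
         Constructor<? extends ChannelHandler> ctor =
             clazz.getDeclaredConstructor(Configuration.class);
         return ctor.newInstance(conf);
       }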
   
   
   ### How was this patch tested?
   Local testing with mini cluster.
   
   ### For code changes:
   
   - [X] Does the title or this PR starts with the corresponding JIRA issue id 
(e.g. 'HADOOP-17799. Your PR title ...')?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 681255)
Remaining Estimate: 0h
Time Spent: 10m

> DatanodeHttpServer doesn't require handler state map while retrieving filter 
> handlers
> -
>
> Key: HDFS-16323
> URL: https://issues.apache.org/jira/browse/HDFS-16323
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> DatanodeHttpServer#getFilterHandlers uses the handler state map just to query 
> whether the given DataNode HttpServer filter handler class exists in the map 
> and, if not, initializes the channel handler by invoking a specific 
> parameterized constructor of the class. However, this handler state map is 
> never used to upsert any data.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16323) DatanodeHttpServer doesn't require handler state map while retrieving filter handlers

2021-11-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDFS-16323:
--
Labels: pull-request-available  (was: )

> DatanodeHttpServer doesn't require handler state map while retrieving filter 
> handlers
> -
>
> Key: HDFS-16323
> URL: https://issues.apache.org/jira/browse/HDFS-16323
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> DatanodeHttpServer#getFilterHandlers uses the handler state map just to query 
> whether the given DataNode HttpServer filter handler class exists in the map 
> and, if not, initializes the channel handler by invoking a specific 
> parameterized constructor of the class. However, this handler state map is 
> never used to upsert any data.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-16323) DatanodeHttpServer doesn't require handler state map while retrieving filter handlers

2021-11-14 Thread Viraj Jasani (Jira)
Viraj Jasani created HDFS-16323:
---

 Summary: DatanodeHttpServer doesn't require handler state map 
while retrieving filter handlers
 Key: HDFS-16323
 URL: https://issues.apache.org/jira/browse/HDFS-16323
 Project: Hadoop HDFS
  Issue Type: Task
Reporter: Viraj Jasani
Assignee: Viraj Jasani


DatanodeHttpServer#getFilterHandlers uses the handler state map just to query 
whether the given DataNode HttpServer filter handler class exists in the map 
and, if not, initializes the channel handler by invoking a specific 
parameterized constructor of the class. However, this handler state map is 
never used to upsert any data.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16318) Add exception blockinfo

2021-11-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16318?focusedWorklogId=681238&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-681238
 ]

ASF GitHub Bot logged work on HDFS-16318:
-

Author: ASF GitHub Bot
Created on: 14/Nov/21 15:15
Start Date: 14/Nov/21 15:15
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3649:
URL: https://github.com/apache/hadoop/pull/3649#issuecomment-968309570


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   1m 21s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  35m  8s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m  0s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   0m 53s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   0m 26s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 56s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 42s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 35s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   2m 36s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  24m 16s |  |  branch has no errors 
when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 49s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 52s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |   0m 52s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 43s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |   0m 43s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 18s | 
[/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs-client.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3649/2/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs-client.txt)
 |  hadoop-hdfs-project/hadoop-hdfs-client: The patch generated 8 new + 31 
unchanged - 0 fixed = 39 total (was 31)  |
   | +1 :green_heart: |  mvnsite  |   0m 48s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 32s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 29s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | -1 :x: |  spotbugs  |   2m 40s | 
[/new-spotbugs-hadoop-hdfs-project_hadoop-hdfs-client.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3649/2/artifact/out/new-spotbugs-hadoop-hdfs-project_hadoop-hdfs-client.html)
 |  hadoop-hdfs-project/hadoop-hdfs-client generated 1 new + 0 unchanged - 0 
fixed = 1 total (was 0)  |
   | +1 :green_heart: |  shadedclient  |  24m 32s |  |  patch has no errors 
when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   2m 15s |  |  hadoop-hdfs-client in the patch 
passed.  |
   | -1 :x: |  asflicense  |   0m 30s | 
[/results-asflicense.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3649/2/artifact/out/results-asflicense.txt)
 |  The patch generated 1 ASF License warnings.  |
   |  |   | 101m  7s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | SpotBugs | module:hadoop-hdfs-project/hadoop-hdfs-client |
   |  |  Inconsistent synchronization of 
org.apache.hadoop.hdfs.DFSInputStream.currentNode; locked 94% of time  
Unsynchronized access at DFSInputStream.java:94% of time  Unsynchronized access 
at DFSInputStream.java:[line 260] |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3649/2/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3649 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell |
   | 

[jira] [Work logged] (HDFS-16315) Add metrics related to Transfer and NativeCopy for DataNode

2021-11-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16315?focusedWorklogId=681237&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-681237
 ]

ASF GitHub Bot logged work on HDFS-16315:
-

Author: ASF GitHub Bot
Created on: 14/Nov/21 15:10
Start Date: 14/Nov/21 15:10
Worklog Time Spent: 10m 
  Work Description: tomscut commented on a change in pull request #3643:
URL: https://github.com/apache/hadoop/pull/3643#discussion_r748867878



##
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestFsDatasetImpl.java
##
@@ -1830,4 +1830,58 @@ public void testReleaseVolumeRefIfExceptionThrown() throws IOException {
       cluster.shutdown();
     }
   }
+
+  @Test(timeout = 30000)
+  public void testTransferAndNativeCopyMetrics() {
+    Configuration config = new HdfsConfiguration();
+    config.setInt(
+        DFSConfigKeys.DFS_DATANODE_FILEIO_PROFILING_SAMPLING_PERCENTAGE_KEY,
+        100);
+    config.set(DFSConfigKeys.DFS_METRICS_PERCENTILES_INTERVALS_KEY,
+        "60,300,1500");
+    MiniDFSCluster cluster = null;
+    try {
+      cluster = new MiniDFSCluster.Builder(config)
+          .numDataNodes(1)
+          .storageTypes(new StorageType[]{StorageType.DISK, StorageType.DISK})
+          .storagesPerDatanode(2)
+          .build();
+      FileSystem fs = cluster.getFileSystem();
+      DataNode dataNode = cluster.getDataNodes().get(0);
+
+      // Create file that has one block with one replica.
+      Path filePath = new Path(name.getMethodName());
+      DFSTestUtil.createFile(fs, filePath, 100, (short) 1, 0);
+      ExtendedBlock block = DFSTestUtil.getFirstBlock(fs, filePath);
+
+      // Copy a new replica to other volume.
+      FsDatasetImpl fsDataSetImpl = (FsDatasetImpl) dataNode.getFSDataset();
+      ReplicaInfo newReplicaInfo = createNewReplicaObj(block, fsDataSetImpl);
+      fsDataSetImpl.finalizeNewReplica(newReplicaInfo, block);
+
+      // Get the volume where the original replica resides.
+      FsVolumeSpi volume = null;
+      for (FsVolumeSpi fsVolumeReference :
+          fsDataSetImpl.getFsVolumeReferences()) {
+        if (!fsVolumeReference.getStorageID()
+            .equals(newReplicaInfo.getStorageUuid())) {
+          volume = fsVolumeReference;
+        }
+      }
+
+      // Assert metrics.
+      DataNodeVolumeMetrics metrics = volume.getMetrics();
+      assertEquals(2, metrics.getTransferIoSampleCount());
+      assertEquals(3, metrics.getTransferIoQuantiles().length);
+      assertEquals(2, metrics.getNativeCopyIoSampleCount());
+      assertEquals(3, metrics.getNativeCopyIoQuantiles().length);
+    } catch (Exception ex) {
+      LOG.info("Exception in testTransferAndNativeCopyMetrics ", ex);
+      fail("MoveBlock operation should succeed");

Review comment:
   Thanks @ayushtkn for your advice, I will fix it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 681237)
Time Spent: 2h 10m  (was: 2h)

> Add metrics related to Transfer and NativeCopy for DataNode
> ---
>
> Key: HDFS-16315
> URL: https://issues.apache.org/jira/browse/HDFS-16315
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2021-11-11-08-26-33-074.png
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Datanodes already have Read, Write, Sync and Flush metrics. We should add 
> NativeCopy and Transfer as well.
> Here is a partial look after the change:
> !image-2021-11-11-08-26-33-074.png|width=205,height=235!



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16315) Add metrics related to Transfer and NativeCopy for DataNode

2021-11-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16315?focusedWorklogId=681236&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-681236
 ]

ASF GitHub Bot logged work on HDFS-16315:
-

Author: ASF GitHub Bot
Created on: 14/Nov/21 15:09
Start Date: 14/Nov/21 15:09
Worklog Time Spent: 10m 
  Work Description: tomscut commented on a change in pull request #3643:
URL: https://github.com/apache/hadoop/pull/3643#discussion_r748867659



##
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestFsDatasetImpl.java
##
@@ -1237,7 +1238,6 @@ public void testMoveBlockSuccess() {
       FsDatasetImpl fsDataSetImpl = (FsDatasetImpl) dataNode.getFSDataset();
       ReplicaInfo newReplicaInfo = createNewReplicaObj(block, fsDataSetImpl);
       fsDataSetImpl.finalizeNewReplica(newReplicaInfo, block);
-

Review comment:
   I'm sorry. It's my bad.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 681236)
Time Spent: 2h  (was: 1h 50m)

> Add metrics related to Transfer and NativeCopy for DataNode
> ---
>
> Key: HDFS-16315
> URL: https://issues.apache.org/jira/browse/HDFS-16315
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2021-11-11-08-26-33-074.png
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Datanodes already have Read, Write, Sync and Flush metrics. We should add 
> NativeCopy and Transfer as well.
> Here is a partial look after the change:
> !image-2021-11-11-08-26-33-074.png|width=205,height=235!



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16319) Add metrics doc for ReadLockLongHoldCount and WriteLockLongHoldCount

2021-11-14 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17443357#comment-17443357
 ] 

Ayush Saxena commented on HDFS-16319:
-

Committed to trunk and branch-3.3.

Thanx [~tomscut] for the contribution!!!

> Add metrics doc for ReadLockLongHoldCount and WriteLockLongHoldCount
> 
>
> Key: HDFS-16319
> URL: https://issues.apache.org/jira/browse/HDFS-16319
> Project: Hadoop HDFS
>  Issue Type: Wish
>Reporter: tomscut
>Assignee: tomscut
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Add metrics doc for ReadLockLongHoldCount and WriteLockLongHoldCount. See 
> [HDFS-15808|https://issues.apache.org/jira/browse/HDFS-15808].



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16315) Add metrics related to Transfer and NativeCopy for DataNode

2021-11-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16315?focusedWorklogId=681226&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-681226
 ]

ASF GitHub Bot logged work on HDFS-16315:
-

Author: ASF GitHub Bot
Created on: 14/Nov/21 14:51
Start Date: 14/Nov/21 14:51
Worklog Time Spent: 10m 
  Work Description: ayushtkn commented on a change in pull request #3643:
URL: https://github.com/apache/hadoop/pull/3643#discussion_r748865034



##
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestFsDatasetImpl.java
##
@@ -1237,7 +1238,6 @@ public void testMoveBlockSuccess() {
       FsDatasetImpl fsDataSetImpl = (FsDatasetImpl) dataNode.getFSDataset();
       ReplicaInfo newReplicaInfo = createNewReplicaObj(block, fsDataSetImpl);
       fsDataSetImpl.finalizeNewReplica(newReplicaInfo, block);
-

Review comment:
   nit: avoid this

##
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestFsDatasetImpl.java
##
@@ -1830,4 +1830,58 @@ public void testReleaseVolumeRefIfExceptionThrown() throws IOException {
       cluster.shutdown();
     }
   }
+
+  @Test(timeout = 30000)
+  public void testTransferAndNativeCopyMetrics() {
+    Configuration config = new HdfsConfiguration();
+    config.setInt(
+        DFSConfigKeys.DFS_DATANODE_FILEIO_PROFILING_SAMPLING_PERCENTAGE_KEY,
+        100);
+    config.set(DFSConfigKeys.DFS_METRICS_PERCENTILES_INTERVALS_KEY,
+        "60,300,1500");
+    MiniDFSCluster cluster = null;
+    try {
+      cluster = new MiniDFSCluster.Builder(config)
+          .numDataNodes(1)
+          .storageTypes(new StorageType[]{StorageType.DISK, StorageType.DISK})
+          .storagesPerDatanode(2)
+          .build();
+      FileSystem fs = cluster.getFileSystem();
+      DataNode dataNode = cluster.getDataNodes().get(0);
+
+      // Create file that has one block with one replica.
+      Path filePath = new Path(name.getMethodName());
+      DFSTestUtil.createFile(fs, filePath, 100, (short) 1, 0);
+      ExtendedBlock block = DFSTestUtil.getFirstBlock(fs, filePath);
+
+      // Copy a new replica to other volume.
+      FsDatasetImpl fsDataSetImpl = (FsDatasetImpl) dataNode.getFSDataset();
+      ReplicaInfo newReplicaInfo = createNewReplicaObj(block, fsDataSetImpl);
+      fsDataSetImpl.finalizeNewReplica(newReplicaInfo, block);
+
+      // Get the volume where the original replica resides.
+      FsVolumeSpi volume = null;
+      for (FsVolumeSpi fsVolumeReference :
+          fsDataSetImpl.getFsVolumeReferences()) {
+        if (!fsVolumeReference.getStorageID()
+            .equals(newReplicaInfo.getStorageUuid())) {
+          volume = fsVolumeReference;
+        }
+      }
+
+      // Assert metrics.
+      DataNodeVolumeMetrics metrics = volume.getMetrics();
+      assertEquals(2, metrics.getTransferIoSampleCount());
+      assertEquals(3, metrics.getTransferIoQuantiles().length);
+      assertEquals(2, metrics.getNativeCopyIoSampleCount());
+      assertEquals(3, metrics.getNativeCopyIoQuantiles().length);
+    } catch (Exception ex) {
+      LOG.info("Exception in testTransferAndNativeCopyMetrics ", ex);
+      fail("MoveBlock operation should succeed");

Review comment:
   No need to have a catch block, let the exception raised be propagated.
   You can even consider using try with resources for ``cluster = new 
MiniDFSCluster.Builder(config)``
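   
   A sketch of the suggested shape (assuming MiniDFSCluster is AutoCloseable, 
as the try-with-resources suggestion implies; the test body is elided):
   
       @Test(timeout = 30000)
       public void testTransferAndNativeCopyMetrics() throws Exception {
         Configuration config = new HdfsConfiguration();
         try (MiniDFSCluster cluster =
             new MiniDFSCluster.Builder(config).numDataNodes(1).build()) {
           FileSystem fs = cluster.getFileSystem();
           // ... same setup and assertions as above; any exception now
           // propagates and fails the test with its real stack trace
           // instead of the generic fail() message.
         }
       }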




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 681226)
Time Spent: 1h 50m  (was: 1h 40m)

> Add metrics related to Transfer and NativeCopy for DataNode
> ---
>
> Key: HDFS-16315
> URL: https://issues.apache.org/jira/browse/HDFS-16315
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2021-11-11-08-26-33-074.png
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Datanodes already have Read, Write, Sync and Flush metrics. We should add 
> NativeCopy and Transfer as well.
> Here is a partial look after the change:
> !image-2021-11-11-08-26-33-074.png|width=205,height=235!



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: 

[jira] [Work logged] (HDFS-16319) Add metrics doc for ReadLockLongHoldCount and WriteLockLongHoldCount

2021-11-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16319?focusedWorklogId=681225&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-681225
 ]

ASF GitHub Bot logged work on HDFS-16319:
-

Author: ASF GitHub Bot
Created on: 14/Nov/21 14:48
Start Date: 14/Nov/21 14:48
Worklog Time Spent: 10m 
  Work Description: tomscut commented on pull request #3653:
URL: https://github.com/apache/hadoop/pull/3653#issuecomment-968304600


   > Thanx @tomscut for the contribution!
   
   Thanks @ayushtkn for the merge.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 681225)
Time Spent: 1h 10m  (was: 1h)

> Add metrics doc for ReadLockLongHoldCount and WriteLockLongHoldCount
> 
>
> Key: HDFS-16319
> URL: https://issues.apache.org/jira/browse/HDFS-16319
> Project: Hadoop HDFS
>  Issue Type: Wish
>Reporter: tomscut
>Assignee: tomscut
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Add metrics doc for ReadLockLongHoldCount and WriteLockLongHoldCount. See 
> [HDFS-15808|https://issues.apache.org/jira/browse/HDFS-15808].



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-16319) Add metrics doc for ReadLockLongHoldCount and WriteLockLongHoldCount

2021-11-14 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena resolved HDFS-16319.
-
Fix Version/s: 3.4.0
   3.3.2
 Hadoop Flags: Reviewed
   Resolution: Fixed

> Add metrics doc for ReadLockLongHoldCount and WriteLockLongHoldCount
> 
>
> Key: HDFS-16319
> URL: https://issues.apache.org/jira/browse/HDFS-16319
> Project: Hadoop HDFS
>  Issue Type: Wish
>Reporter: tomscut
>Assignee: tomscut
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Add metrics doc for ReadLockLongHoldCount and WriteLockLongHoldCount. See 
> [HDFS-15808|https://issues.apache.org/jira/browse/HDFS-15808].



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16319) Add metrics doc for ReadLockLongHoldCount and WriteLockLongHoldCount

2021-11-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16319?focusedWorklogId=681223&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-681223
 ]

ASF GitHub Bot logged work on HDFS-16319:
-

Author: ASF GitHub Bot
Created on: 14/Nov/21 14:40
Start Date: 14/Nov/21 14:40
Worklog Time Spent: 10m 
  Work Description: ayushtkn merged pull request #3653:
URL: https://github.com/apache/hadoop/pull/3653


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 681223)
Time Spent: 50m  (was: 40m)

> Add metrics doc for ReadLockLongHoldCount and WriteLockLongHoldCount
> 
>
> Key: HDFS-16319
> URL: https://issues.apache.org/jira/browse/HDFS-16319
> Project: Hadoop HDFS
>  Issue Type: Wish
>Reporter: tomscut
>Assignee: tomscut
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Add metrics doc for ReadLockLongHoldCount and WriteLockLongHoldCount. See 
> [HDFS-15808|https://issues.apache.org/jira/browse/HDFS-15808].



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16319) Add metrics doc for ReadLockLongHoldCount and WriteLockLongHoldCount

2021-11-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16319?focusedWorklogId=681224&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-681224
 ]

ASF GitHub Bot logged work on HDFS-16319:
-

Author: ASF GitHub Bot
Created on: 14/Nov/21 14:40
Start Date: 14/Nov/21 14:40
Worklog Time Spent: 10m 
  Work Description: ayushtkn commented on pull request #3653:
URL: https://github.com/apache/hadoop/pull/3653#issuecomment-968303352


   Thanx @tomscut for the contribution!!!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 681224)
Time Spent: 1h  (was: 50m)

> Add metrics doc for ReadLockLongHoldCount and WriteLockLongHoldCount
> 
>
> Key: HDFS-16319
> URL: https://issues.apache.org/jira/browse/HDFS-16319
> Project: Hadoop HDFS
>  Issue Type: Wish
>Reporter: tomscut
>Assignee: tomscut
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Add metrics doc for ReadLockLongHoldCount and WriteLockLongHoldCount. See 
> [HDFS-15808|https://issues.apache.org/jira/browse/HDFS-15808].



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-16321) Fix invalid config in TestAvailableSpaceRackFaultTolerantBPP

2021-11-14 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena resolved HDFS-16321.
-
Fix Version/s: 3.4.0
   3.3.2
 Hadoop Flags: Reviewed
   Resolution: Fixed

> Fix invalid config in TestAvailableSpaceRackFaultTolerantBPP 
> -
>
> Key: HDFS-16321
> URL: https://issues.apache.org/jira/browse/HDFS-16321
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 3.3.1
>Reporter: guo
>Assignee: guo
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> `TestAvailableSpaceRackFaultTolerantBPP` seems to set an invalid param (one 
> valid only in `TestAvailableSpaceBlockPlacementPolicy`); we can fix it to 
> avoid further trouble.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16321) Fix invalid config in TestAvailableSpaceRackFaultTolerantBPP

2021-11-14 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17443355#comment-17443355
 ] 

Ayush Saxena commented on HDFS-16321:
-

Committed to trunk and branch-3.3.

Thanx [~philipse] for the contribution!!!

> Fix invalid config in TestAvailableSpaceRackFaultTolerantBPP 
> -
>
> Key: HDFS-16321
> URL: https://issues.apache.org/jira/browse/HDFS-16321
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 3.3.1
>Reporter: guo
>Assignee: guo
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> `TestAvailableSpaceRackFaultTolerantBPP` seems to set an invalid param (one 
> valid only in `TestAvailableSpaceBlockPlacementPolicy`); we can fix it to 
> avoid further trouble.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16321) Fix invalid config in TestAvailableSpaceRackFaultTolerantBPP

2021-11-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16321?focusedWorklogId=681222&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-681222
 ]

ASF GitHub Bot logged work on HDFS-16321:
-

Author: ASF GitHub Bot
Created on: 14/Nov/21 14:29
Start Date: 14/Nov/21 14:29
Worklog Time Spent: 10m 
  Work Description: ayushtkn merged pull request #3655:
URL: https://github.com/apache/hadoop/pull/3655


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 681222)
Time Spent: 1h 20m  (was: 1h 10m)

> Fix invalid config in TestAvailableSpaceRackFaultTolerantBPP 
> -
>
> Key: HDFS-16321
> URL: https://issues.apache.org/jira/browse/HDFS-16321
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 3.3.1
>Reporter: guo
>Assignee: guo
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> `TestAvailableSpaceRackFaultTolerantBPP` seems to set an invalid param (one 
> valid only in `TestAvailableSpaceBlockPlacementPolicy`); we can fix it to 
> avoid further trouble.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-16321) Fix invalid config in TestAvailableSpaceRackFaultTolerantBPP

2021-11-14 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena reassigned HDFS-16321:
---

Assignee: guo

> Fix invalid config in TestAvailableSpaceRackFaultTolerantBPP 
> -
>
> Key: HDFS-16321
> URL: https://issues.apache.org/jira/browse/HDFS-16321
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 3.3.1
>Reporter: guo
>Assignee: guo
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> `TestAvailableSpaceRackFaultTolerantBPP` seems to set an invalid param (one 
> valid only in `TestAvailableSpaceBlockPlacementPolicy`); we can fix it to 
> avoid further trouble.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-16320) Datanode retrieve slownode information from NameNode

2021-11-14 Thread Janus Chow (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17443351#comment-17443351
 ] 

Janus Chow edited comment on HDFS-16320 at 11/14/21, 2:13 PM:
--

[~hexiaoqiao] 
{quote}I mean that the DataNode has all the information needed to decide 
whether it is slow, based on response time or throughput, rather than relying 
on a command from the NameNode. Furthermore, there can be false positives on 
the NameNode side.
{quote}
In my opinion, the slownode information on the NameNode is effectively a union 
of the choices made by DataNodes. All slownodes are reported by other 
DataNodes (calculated from statistics), and the NameNode aggregates the 
reports and picks the most frequently reported DataNodes. Up to this point, 
the slownode data should be fairly reliable.
{quote}I am not against the idea, but we should find a more proper way to 
solve this problem.
{quote}
I tried to find other ways to detect slownodes, especially from the DataNodes 
themselves. But after checking the implementation of "OutlierDetector.java" 
and "DataNodePeerMetrics.java", I think the current calculation is already 
very good at spotting slownodes.
{quote}client and DataNode/Pipeline communication could estimate whether there 
are slow nodes and which one is slow
{quote}
Since the client only talks to the first DataNode in a pipeline, it could be 
difficult for it to track slowness between the three DataNodes. I think that 
is why slowness is only calculated for the penultimate node and the last node.

 

Another point: this ticket is more of a slowness statement. So far, the 
DataNode only exposes, in its metrics, the slowness state tagged by each 
NameNode; it is a real-time status updated via heartbeat.


was (Author: symious):
{quote}I mean that the DataNode has all the information needed to decide 
whether it is slow, based on response time or throughput, rather than relying 
on a command from the NameNode. Furthermore, there can be false positives on 
the NameNode side.
{quote}
In my opinion, the slownode information on the NameNode is effectively a union 
of the choices made by DataNodes. All slownodes are reported by other 
DataNodes (calculated from statistics), and the NameNode aggregates the 
reports and picks the most frequently reported DataNodes. Up to this point, 
the slownode data should be fairly reliable.
{quote}I am not against the idea, but we should find a more proper way to 
solve this problem.
{quote}
I tried to find other ways to detect slownodes, especially from the DataNodes 
themselves. But after checking the implementation of "OutlierDetector.java" 
and "DataNodePeerMetrics.java", I think the current calculation is already 
very good at spotting slownodes.
{quote}client and DataNode/Pipeline communication could estimate whether there 
are slow nodes and which one is slow
{quote}
Since the client only talks to the first DataNode in a pipeline, it could be 
difficult for it to track slowness between the three DataNodes. I think that 
is why slowness is only calculated for the penultimate node and the last node.

 

Another point: this ticket is more of a slowness statement. So far, the 
DataNode only exposes, in its metrics, the slowness state tagged by each 
NameNode; it is a real-time status updated via heartbeat.

> Datanode retrieve slownode information from NameNode
> 
>
> Key: HDFS-16320
> URL: https://issues.apache.org/jira/browse/HDFS-16320
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Janus Chow
>Assignee: Janus Chow
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The current information of slownode is reported by reportingNode, and stored 
> in NameNode.
> This ticket is to let the slownode retrieve the information from NameNode, so 
> that it can do other performance improvement actions based on this 
> information.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16320) Datanode retrieve slownode information from NameNode

2021-11-14 Thread Janus Chow (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17443351#comment-17443351
 ] 

Janus Chow commented on HDFS-16320:
---

{quote}I mean that the DataNode has all the information needed to decide 
whether it is slow, based on response time or throughput, rather than relying 
on a command from the NameNode. Furthermore, there can be false positives on 
the NameNode side.
{quote}
In my opinion, the slownode information on the NameNode is effectively a union 
of the choices made by DataNodes. All slownodes are reported by other 
DataNodes (calculated from statistics), and the NameNode aggregates the 
reports and picks the most frequently reported DataNodes. Up to this point, 
the slownode data should be fairly reliable.
{quote}I am not against the idea, but we should find a more proper way to 
solve this problem.
{quote}
I tried to find other ways to detect slownodes, especially from the DataNodes 
themselves. But after checking the implementation of "OutlierDetector.java" 
and "DataNodePeerMetrics.java", I think the current calculation is already 
very good at spotting slownodes.
{quote}client and DataNode/Pipeline communication could estimate whether there 
are slow nodes and which one is slow
{quote}
Since the client only talks to the first DataNode in a pipeline, it could be 
difficult for it to track slowness between the three DataNodes. I think that 
is why slowness is only calculated for the penultimate node and the last node.

 

Another point: this ticket is more of a slowness statement. So far, the 
DataNode only exposes, in its metrics, the slowness state tagged by each 
NameNode; it is a real-time status updated via heartbeat.
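
For context, a compact illustration of the kind of median/MAD screening that 
OutlierDetector performs on peer latencies; the minimum sample size and the 
deviation multiplier below are placeholder assumptions, not the constants 
shipped in Hadoop:

    import java.util.*;

    // Flags peers whose average latency sits far above the median, in the
    // spirit of OutlierDetector.java (constants here are assumptions).
    static Set<String> findSlowPeers(Map<String, Double> avgLatencyMs) {
      final int minPeers = 10;          // assumed minimum sample size
      final double madMultiplier = 3.0; // assumed deviation threshold
      if (avgLatencyMs.size() < minPeers) {
        return Collections.emptySet();
      }
      double[] sorted = avgLatencyMs.values().stream()
          .mapToDouble(Double::doubleValue).sorted().toArray();
      double median = sorted[sorted.length / 2];
      double[] deviations = Arrays.stream(sorted)
          .map(v -> Math.abs(v - median)).sorted().toArray();
      double mad = deviations[deviations.length / 2];
      double threshold = median + madMultiplier * mad;
      Set<String> slow = new HashSet<>();
      for (Map.Entry<String, Double> e : avgLatencyMs.entrySet()) {
        if (e.getValue() > threshold) {
          slow.add(e.getKey());
        }
      }
      return slow;
    }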

> Datanode retrieve slownode information from NameNode
> 
>
> Key: HDFS-16320
> URL: https://issues.apache.org/jira/browse/HDFS-16320
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Janus Chow
>Assignee: Janus Chow
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The current information of slownode is reported by reportingNode, and stored 
> in NameNode.
> This ticket is to let the slownode retrieve the information from NameNode, so 
> that it can do other performance improvement actions based on this 
> information.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16318) Add exception blockinfo

2021-11-14 Thread guo (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17443341#comment-17443341
 ] 

guo commented on HDFS-16318:


Thanks [~hexiaoqiao] for your note; I have just updated it.

> Add exception blockinfo
> ---
>
> Key: HDFS-16318
> URL: https://issues.apache.org/jira/browse/HDFS-16318
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.3.1
>Reporter: guo
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> We may hit a `Could not obtain the last block location` exception, but since 
> we may be reading more than one file, the following exception cannot guide 
> us to the problematic block or DataNode info. We can add more info to the 
> log to help with this.
> `2021-11-12 14:01:59,633 WARN [main] org.apache.hadoop.hdfs.DFSClient: Last 
> block locations not available. Datanodes might not have reported blocks 
> completely. Will retry for 3 times`
> `2021-11-12 14:02:03,724 WARN [main] org.apache.hadoop.hdfs.DFSClient: Last 
> block locations not available. Datanodes might not have reported blocks 
> completely. Will retry for 2 times`
> `2021-11-12 14:02:07,726 WARN [main] org.apache.hadoop.hdfs.DFSClient: Last 
> block locations not available. Datanodes might not have reported blocks 
> completely. Will retry for 1 times`
> `Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.GeneratedConstructorAccessor19.newInstance(Unknown Source)
>     at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>     at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:251)
>     ... 11 more`
> `Caused by: java.io.IOException: Could not obtain the last block locations.
>     at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:291)
>     at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:264)
>     at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1535)
>     at 
> org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:304)
>     at 
> org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:299)
>     at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>     at 
> org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:312)
>     at org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:162)
>     at 
> org.apache.hadoop.fs.viewfs.ChRootedFileSystem.open(ChRootedFileSystem.java:261)
>     at 
> org.apache.hadoop.fs.viewfs.ViewFileSystem.open(ViewFileSystem.java:463)
>     at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:768)
>     at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:109)
>     at 
> org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
>     at 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:66)
>     ... 15 more`



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16318) Add exception blockinfo

2021-11-14 Thread guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

guo updated HDFS-16318:
---
Description: 
We may hit a `Could not obtain the last block location` exception, but since 
we may be reading more than one file, the following exception cannot guide us 
to the problematic block or DataNode info. We can add more info to the log to 
help with this.

`2021-11-12 14:01:59,633 WARN [main] org.apache.hadoop.hdfs.DFSClient: Last 
block locations not available. Datanodes might not have reported blocks 
completely. Will retry for 3 times`
`2021-11-12 14:02:03,724 WARN [main] org.apache.hadoop.hdfs.DFSClient: Last 
block locations not available. Datanodes might not have reported blocks 
completely. Will retry for 2 times`
`2021-11-12 14:02:07,726 WARN [main] org.apache.hadoop.hdfs.DFSClient: Last 
block locations not available. Datanodes might not have reported blocks 
completely. Will retry for 1 times`


`Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.GeneratedConstructorAccessor19.newInstance(Unknown Source)
    at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at 
org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:251)
    ... 11 more`
`Caused by: java.io.IOException: Could not obtain the last block locations.
    at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:291)
    at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:264)
    at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1535)
    at 
org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:304)
    at 
org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:299)
    at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at 
org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:312)
    at org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:162)
    at 
org.apache.hadoop.fs.viewfs.ChRootedFileSystem.open(ChRootedFileSystem.java:261)
    at org.apache.hadoop.fs.viewfs.ViewFileSystem.open(ViewFileSystem.java:463)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:768)
    at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:109)
    at 
org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
    at 
org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:66)
    ... 15 more`
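
A minimal sketch of the enhanced warning being discussed in the pull request, 
assuming the `src` path and `getCurrentBlock()` accessor mentioned in the 
review thread and the `retriesForLastBlockLength` counter from the surrounding 
code; the exact wording of the merged change may differ:

    // Sketch only: include the file path and the last block in the warning so
    // multi-file jobs can tell which block triggered the retries.
    if (lastBlockBeingWrittenLength == -1) {
      DFSClient.LOG.warn("Last block locations of " + src + " (block "
          + getCurrentBlock() + ") not available. Datanodes might not have"
          + " reported blocks completely. Will retry for "
          + retriesForLastBlockLength + " times");
    }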

> Add exception blockinfo
> ---
>
> Key: HDFS-16318
> URL: https://issues.apache.org/jira/browse/HDFS-16318
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.3.1
>Reporter: guo
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> We may hit a `Could not obtain the last block location` exception, but since 
> we may be reading more than one file, the following exception cannot guide 
> us to the problematic block or DataNode info. We can add more info to the 
> log to help with this.
> `2021-11-12 14:01:59,633 WARN [main] org.apache.hadoop.hdfs.DFSClient: Last 
> block locations not available. Datanodes might not have reported blocks 
> completely. Will retry for 3 times`
> `2021-11-12 14:02:03,724 WARN [main] org.apache.hadoop.hdfs.DFSClient: Last 
> block locations not available. Datanodes might not have reported blocks 
> completely. Will retry for 2 times`
> `2021-11-12 14:02:07,726 WARN [main] org.apache.hadoop.hdfs.DFSClient: Last 
> block locations not available. Datanodes might not have reported blocks 
> completely. Will retry for 1 times`
> `Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.GeneratedConstructorAccessor19.newInstance(Unknown Source)
>     at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>     at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:251)
>     ... 11 more`
> `Caused by: java.io.IOException: Could not obtain the last block locations.
>     at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:291)
>     at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:264)
>     at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1535)
>     at 
> org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:304)
>     at 
> org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:299)
>     at 
> 

[jira] [Work logged] (HDFS-16318) Add exception blockinfo

2021-11-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16318?focusedWorklogId=681218&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-681218
 ]

ASF GitHub Bot logged work on HDFS-16318:
-

Author: ASF GitHub Bot
Created on: 14/Nov/21 13:36
Start Date: 14/Nov/21 13:36
Worklog Time Spent: 10m 
  Work Description: GuoPhilipse commented on a change in pull request #3649:
URL: https://github.com/apache/hadoop/pull/3649#discussion_r748855728



##
File path: 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
##
@@ -257,7 +257,7 @@ void openInfo(boolean refreshLocatedBlocks) throws 
IOException {
 // locations will not be available with NN for getting the length. Lets
 // retry for 3 times to get the length.
 if (lastBlockBeingWrittenLength == -1) {
-  DFSClient.LOG.warn("Last block locations not available. "
+  DFSClient.LOG.warn("Last block locations " + getCurrentBlock() + " 
not available. "

Review comment:
   Thanks @Hexiaoqiao for your review; I have just added the `src` info to the 
log message, but I have no idea how to write test cases for the log. Could you 
give me some advice?
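
One possible approach, sketched here as an assumption rather than project 
guidance: hadoop-common's GenericTestUtils.LogCapturer can capture the 
DFSClient logger so a test can assert on the new message text:

    import org.apache.hadoop.hdfs.DFSClient;
    import org.apache.hadoop.test.GenericTestUtils;
    import static org.junit.Assert.assertTrue;

    // Capture DFSClient logs, trigger the retry path, then assert on the text.
    GenericTestUtils.LogCapturer logs =
        GenericTestUtils.LogCapturer.captureLogs(DFSClient.LOG);
    try {
      // ... open a file whose last block is still unreported ...
      assertTrue(logs.getOutput().contains("Last block locations"));
    } finally {
      logs.stopCapturing();
    }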




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 681218)
Time Spent: 40m  (was: 0.5h)

> Add exception blockinfo
> ---
>
> Key: HDFS-16318
> URL: https://issues.apache.org/jira/browse/HDFS-16318
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.3.1
>Reporter: guo
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16321) Fix invalid config in TestAvailableSpaceRackFaultTolerantBPP

2021-11-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16321?focusedWorklogId=681216&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-681216
 ]

ASF GitHub Bot logged work on HDFS-16321:
-

Author: ASF GitHub Bot
Created on: 14/Nov/21 13:10
Start Date: 14/Nov/21 13:10
Worklog Time Spent: 10m 
  Work Description: GuoPhilipse commented on pull request #3655:
URL: https://github.com/apache/hadoop/pull/3655#issuecomment-968288277


   @ayushtkn  Could you kindly help check the timed-out cases? I have tried 
twice; the errors do not seem related to this patch, and they occur in 
different methods:
   `[ERROR] 
testResponseCode(org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract)  
Time elapsed: 30.014 s  <<< ERROR!
   org.junit.runners.model.TestTimedOutException: test timed out after 30000 
milliseconds
at sun.misc.Unsafe.park(Native Method)
at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:426)
at java.util.concurrent.FutureTask.get(FutureTask.java:204)
at 
org.junit.internal.runners.statements.FailOnTimeout.getResult(FailOnTimeout.java:167)
at 
org.junit.internal.runners.statements.FailOnTimeout.evaluate(FailOnTimeout.java:128)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.lang.Thread.run(Thread.java:748)`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 681216)
Time Spent: 1h 10m  (was: 1h)

> Fix invalid config in TestAvailableSpaceRackFaultTolerantBPP 
> -
>
> Key: HDFS-16321
> URL: https://issues.apache.org/jira/browse/HDFS-16321
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 3.3.1
>Reporter: guo
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> `TestAvailableSpaceRackFaultTolerantBPP` seems setting invalid param(valid in 
> `TestAvailableSpaceBlockPlacementPolicy`), we can fix it to avoid further 
> trouble.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16320) Datanode retrieve slownode information from NameNode

2021-11-14 Thread Xiaoqiao He (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17443283#comment-17443283
 ] 

Xiaoqiao He commented on HDFS-16320:


Thanks [~Symious] for your quick response. That is a reasonable case to me. I 
mean that the DataNode has all the information needed to decide whether it is 
slow, based on response time or throughput, rather than relying on a command 
from the NameNode. Furthermore, there can be false positives on the NameNode 
side.
I am not against the idea, but we should find a more proper way to solve this 
problem. IMO, client and DataNode/Pipeline communication could estimate 
whether there are slow nodes and which one is slow. FYI. Thanks.

> Datanode retrieve slownode information from NameNode
> 
>
> Key: HDFS-16320
> URL: https://issues.apache.org/jira/browse/HDFS-16320
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Janus Chow
>Assignee: Janus Chow
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The current information of slownode is reported by reportingNode, and stored 
> in NameNode.
> This ticket is to let the slownode retrieve the information from NameNode, so 
> that it can do other performance improvement actions based on this 
> information.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16321) Fix invalid config in TestAvailableSpaceRackFaultTolerantBPP

2021-11-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16321?focusedWorklogId=681207&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-681207
 ]

ASF GitHub Bot logged work on HDFS-16321:
-

Author: ASF GitHub Bot
Created on: 14/Nov/21 09:42
Start Date: 14/Nov/21 09:42
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3655:
URL: https://github.com/apache/hadoop/pull/3655#issuecomment-968257217


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|:--------|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 55s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  37m 36s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 22s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   1m 13s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   1m  0s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 21s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 56s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 28s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   3m 18s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  25m  6s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 13s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 17s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |   1m 17s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m  7s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |   1m  7s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 52s | 
[/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3655/3/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 1 unchanged - 
1 fixed = 2 total (was 2)  |
   | +1 :green_heart: |  mvnsite  |   1m 14s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 47s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 18s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   3m 21s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  25m  6s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 354m 40s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3655/3/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 41s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 462m 56s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.web.TestWebHdfsFileSystemContract |
   |   | hadoop.hdfs.TestViewDistributedFileSystemContract |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3655/3/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3655 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux b1ad24666443 4.15.0-147-generic #151-Ubuntu SMP Fri Jun 18 
19:21:19 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / dcc21b479d735ebf5f8c4a07b4e6f6156d026f9f |
   | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   | Multi-JDK versions | 

[jira] [Commented] (HDFS-16320) Datanode retrieve slownode information from NameNode

2021-11-14 Thread Janus Chow (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17443276#comment-17443276
 ] 

Janus Chow commented on HDFS-16320:
---

[~hexiaoqiao]  Thank you for the review.

The issue we met is that clients writing through a slownode took a very long 
time to finish writing a normal file.

After we checked the metrics, we found we can avoid creating pipelines on the 
slownodes by setting 
"dfs.namenode.block-placement-policy.exclude-slow-nodes.enabled" to true. This 
works fine for new clients, but clients already using the slownode in their 
pipeline still have to suffer it. (The slownode may even have been reported by 
this very pipeline.)

Since only clients and DataNodes communicate while data is being written, even 
though the NameNode knows that a DataNode in the pipeline is slow, clients can 
do too much to avoid it. Our proposal is to let DataNodes get the information 
from heartbeat reports; then, during writing, DataNodes can report it to 
clients, and clients can choose to rebuild the pipeline to improve write 
performance.
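
A rough sketch of the shape this proposal could take on the DataNode side; the 
class and method names here are illustrative assumptions, not the actual 
patch:

    // Hypothetical: cache the slownode flag carried back in the heartbeat
    // response so the write path can surface it to clients in pipeline acks.
    class SlownessState {
      private volatile boolean taggedSlowByNameNode;

      // Called by the heartbeat handler when a response arrives.
      void updateFromHeartbeat(boolean slow) {
        taggedSlowByNameNode = slow;
      }

      // Read on the data transfer path; a client seeing this flag could
      // choose to rebuild its pipeline around the slow node.
      boolean isSlow() {
        return taggedSlowByNameNode;
      }
    }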

> Datanode retrieve slownode information from NameNode
> 
>
> Key: HDFS-16320
> URL: https://issues.apache.org/jira/browse/HDFS-16320
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Janus Chow
>Assignee: Janus Chow
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The current information of slownode is reported by reportingNode, and stored 
> in NameNode.
> This ticket is to let the slownode retrieve the information from NameNode, so 
> that it can do other performance improvement actions based on this 
> information.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-16320) Datanode retrieve slownode information from NameNode

2021-11-14 Thread Janus Chow (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17443276#comment-17443276
 ] 

Janus Chow edited comment on HDFS-16320 at 11/14/21, 9:41 AM:
--

[~hexiaoqiao]  Thank you for the review.

The issue we met is that clients writing through a slownode took a very long 
time to finish writing a normal file.

After we checked the metrics, we found we can avoid creating pipelines on the 
slownodes by setting 
"dfs.namenode.block-placement-policy.exclude-slow-nodes.enabled" to true. This 
works fine for new clients, but clients already using the slownode in their 
pipeline still have to suffer it. (The slownode may even have been reported by 
this very pipeline.)

Since only clients and DataNodes communicate while data is being written, even 
though the NameNode knows that a DataNode in the pipeline is slow, clients 
cannot do much to avoid it. Our proposal is to let DataNodes get the 
information from heartbeat reports; then, during writing, DataNodes can report 
it to clients, and clients can choose to rebuild the pipeline to improve write 
performance.


was (Author: symious):
[~hexiaoqiao]  Thank you for the review.

The issue we met is that clients writing through a slownode took a very long 
time to finish writing a normal file.

After we checked the metrics, we found we can avoid creating pipelines on the 
slownodes by setting 
"dfs.namenode.block-placement-policy.exclude-slow-nodes.enabled" to true. This 
works fine for new clients, but clients already using the slownode in their 
pipeline still have to suffer it. (The slownode may even have been reported by 
this very pipeline.)

Since only clients and DataNodes communicate while data is being written, even 
though the NameNode knows that a DataNode in the pipeline is slow, clients can 
do too much to avoid it. Our proposal is to let DataNodes get the information 
from heartbeat reports; then, during writing, DataNodes can report it to 
clients, and clients can choose to rebuild the pipeline to improve write 
performance.

> Datanode retrieve slownode information from NameNode
> 
>
> Key: HDFS-16320
> URL: https://issues.apache.org/jira/browse/HDFS-16320
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Janus Chow
>Assignee: Janus Chow
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The current information of slownode is reported by reportingNode, and stored 
> in NameNode.
> This ticket is to let the slownode retrieve the information from NameNode, so 
> that it can do other performance improvement actions based on this 
> information.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16318) Add exception blockinfo

2021-11-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16318?focusedWorklogId=681204&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-681204
 ]

ASF GitHub Bot logged work on HDFS-16318:
-

Author: ASF GitHub Bot
Created on: 14/Nov/21 09:18
Start Date: 14/Nov/21 09:18
Worklog Time Spent: 10m 
  Work Description: Hexiaoqiao commented on a change in pull request #3649:
URL: https://github.com/apache/hadoop/pull/3649#discussion_r748825616



##
File path: 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
##
@@ -257,7 +257,7 @@ void openInfo(boolean refreshLocatedBlocks) throws 
IOException {
 // locations will not be available with NN for getting the length. Lets
 // retry for 3 times to get the length.
 if (lastBlockBeingWrittenLength == -1) {
-  DFSClient.LOG.warn("Last block locations not available. "
+  DFSClient.LOG.warn("Last block locations " + getCurrentBlock() + " 
not available. "

Review comment:
   Also adding `src` would be more helpful IMO. Thanks.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 681204)
Time Spent: 0.5h  (was: 20m)

> Add exception blockinfo
> ---
>
> Key: HDFS-16318
> URL: https://issues.apache.org/jira/browse/HDFS-16318
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.3.1
>Reporter: guo
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16318) Add exception blockinfo

2021-11-14 Thread Xiaoqiao He (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17443273#comment-17443273
 ] 

Xiaoqiao He commented on HDFS-16318:


[~philipse] would you mind adding some information to the `Description` about 
what the issue is and how it is fixed? Thanks.

> Add exception blockinfo
> ---
>
> Key: HDFS-16318
> URL: https://issues.apache.org/jira/browse/HDFS-16318
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.3.1
>Reporter: guo
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16320) Datanode retrieve slownode information from NameNode

2021-11-14 Thread Xiaoqiao He (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17443268#comment-17443268
 ] 

Xiaoqiao He commented on HDFS-16320:


Thanks [~Symious] for your report and patch. IMO it is a little tricky for the 
DataNode to get its slownode status from the NameNode. In theory, the DataNode 
has all the information needed to decide whether it is slow by itself, rather 
than following a NameNode command. Would you mind offering some more 
information about how you plan to use this status? Thanks.

> Datanode retrieve slownode information from NameNode
> 
>
> Key: HDFS-16320
> URL: https://issues.apache.org/jira/browse/HDFS-16320
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Janus Chow
>Assignee: Janus Chow
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The current information of slownode is reported by reportingNode, and stored 
> in NameNode.
> This ticket is to let the slownode retrieve the information from NameNode, so 
> that it can do other performance improvement actions based on this 
> information.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16322) The NameNode implementation of ClientProtocol.truncate(...) can cause data loss.

2021-11-14 Thread Xiaoqiao He (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17443263#comment-17443263
 ] 

Xiaoqiao He commented on HDFS-16322:


Thanks [~Nsupyq] for your report. It is an interesting case. I am not sure 
whether the `idempotent` handling chosen in HDFS-7926 is reasonable, as it can 
cause data loss when a client retries a request due to network or other 
issues. IMO, it could be fixed by revisiting the `idempotent` handling of 
truncate.

{quote}"idempotent" means applying the same operations multiple times will get 
the same result. If there is an append in the middle, the retry could get 
different results.

E.g. getPermission is idempotent. However, if there is a setPermission (or 
delete, rename, etc.) in the middle, the retry of getPermission could get a 
different result.{quote}
I just noticed that [~szetszwo] left this comment at HDFS-7926; I am not sure 
whether this explanation still applies. For example, suppose Client A requests 
`create` with overwrite and it executes successfully on the NameNode side, but 
the response does not reach Client A, so it retries. Before the retry reaches 
the NameNode, another Client B deletes the file. The retried request is then 
served the previous result from the retry cache. It is the same situation as 
with `truncate`. 
cc [~shv], [~szetszwo], would you mind giving some suggestions? Thanks.
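
To make the hazard concrete, here is the timeline plus the usual retry-cache 
guard, sketched with the RetryCache API that FSNamesystem uses for 
at-most-once operations; the surrounding NameNode code is simplified and 
assumed:

    import org.apache.hadoop.ipc.RetryCache;
    import org.apache.hadoop.ipc.RetryCache.CacheEntry;

    // t0: Client A calls truncate(file, 100); it executes, the response is lost.
    // t1: Client B appends to file; the length grows past 100.
    // t2: Client A retries truncate(file, 100). Without a retry-cache hit the
    //     NameNode truncates again and the bytes appended at t1 are lost.
    CacheEntry cacheEntry = RetryCache.waitForCompletion(retryCache);
    if (cacheEntry != null && cacheEntry.isSuccess()) {
      return; // duplicate of an already-applied truncate; do not re-execute
    }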

> The NameNode implementation of ClientProtocol.truncate(...) can cause data 
> loss.
> 
>
> Key: HDFS-16322
> URL: https://issues.apache.org/jira/browse/HDFS-16322
> Project: Hadoop HDFS
>  Issue Type: Bug
> Environment: The runtime environment is Ubuntu 18.04, Java 1.8.0_222 
> and Apache Maven 3.6.0. 
> The bug can be reproduced by the testMultipleTruncate() in the 
> attachment. First, replace the file TestFileTruncate.java under the directory 
> "hadoop-3.3.1-src/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/"
>  with the attachment. Then run "mvn test 
> -Dtest=org.apache.hadoop.hdfs.server.namenode.TestFileTruncate#testMultipleTruncate"
>  to run the test case. Finally, the "assertFileLength(p, n+newLength)" at 
> line 199 of TestFileTruncate.java will fail, because the retry of truncate() 
> changes the file size and causes data loss.
>Reporter: nhaorand
>Priority: Major
> Attachments: TestFileTruncate.java
>
>
> The NameNode implementation of ClientProtocol.truncate(...) can cause data 
> loss. If the DFS client drops the first response of a truncate RPC call, the 
> retried call will truncate the file again and cause data loss.
> HDFS-7926 avoids repeated execution of truncate(...) by checking whether the 
> file is already being truncated to the same length. However, under 
> concurrency, after the first execution of truncate(...), concurrent requests 
> from other clients may append new data and change the file length. When 
> truncate(...) is retried after that, it will find that the file has not been 
> truncated to the same length and will truncate it again, which causes data 
> loss.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org