[jira] [Commented] (HDFS-16657) Changing pool-level lock to volume-level lock for invalidation of blocks

2022-07-12 Thread Yuanbo Liu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17566143#comment-17566143
 ] 

Yuanbo Liu commented on HDFS-16657:
---

Thanks for your comment.
Sounds reasonable, I'll try it.

> Changing pool-level lock to volume-level lock for invalidation of blocks
> 
>
> Key: HDFS-16657
> URL: https://issues.apache.org/jira/browse/HDFS-16657
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Yuanbo Liu
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2022-07-13-10-25-37-383.png, 
> image-2022-07-13-10-27-01-386.png, image-2022-07-13-10-27-44-258.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Recently we saw that DataNode (DN) heartbeats became slow in a very busy 
> cluster. Here is the chart:
> !image-2022-07-13-10-25-37-383.png|width=665,height=245!
>  
> After taking a jstack of the DN, we found that the DN heartbeat was stuck in 
> the invalidation of blocks:
> !image-2022-07-13-10-27-01-386.png|width=658,height=308!
> !image-2022-07-13-10-27-44-258.png|width=502,height=325!
> The key code is:
> {code:java}
> // Runs for each block to invalidate, while the pool-level lock is held.
> try {
>   File blockFile = new File(info.getBlockURI());
>   if (blockFile != null && blockFile.getParentFile() == null) {
>     errors.add("Failed to delete replica " + invalidBlks[i]
>         + ". Parent not found for block file: " + blockFile);
>     continue;
>   }
> } catch(IllegalArgumentException e) {
>   LOG.warn("Parent directory check failed; replica " + info
>       + " is not backed by a local file");
> } {code}
> The DN tries to locate the parent path of the block file, so disk I/O happens 
> while the pool-level lock is held. When the disk becomes very busy with high 
> I/O wait, all pending threads are blocked by the pool-level lock, and the 
> heartbeat time becomes high. We propose changing the pool-level lock to a 
> volume-level lock for block invalidation.
> cc: [~hexiaoqiao] [~Aiphag0] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16657) Changing pool-level lock to volume-level lock for invalidation of blocks

2022-07-12 Thread Xiaoqiao He (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17566132#comment-17566132
 ] 

Xiaoqiao He commented on HDFS-16657:


[~yuanbo] Thanks for your proposal. IIRC, we have discussed this issue for a 
while. I think it is time to improve it.
As a further improvement, we should consider the cost of acquiring the 
volume-level lock. For example, when processing a command to invalidate blocks, 
it is possible to batch them and reduce how often the lock is acquired. I am not 
sure whether there are other cases that block the heartbeat or other flows.
Anyway, would you like to contribute and improve it? Thanks again.
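
The batching idea in this comment can be sketched roughly as follows. This is a minimal illustration, not the DataNode implementation: `BatchedInvalidation`, `BlockRef`, the string volume IDs, and the `removed` list (a stand-in for removing a replica from a volume's map) are all hypothetical names. It assumes each block to invalidate is already tagged with its volume, groups the blocks by volume, and takes each volume's lock once per batch instead of once per block.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: none of these names come from the Hadoop code base.
final class BatchedInvalidation {

  /** A block to invalidate, tagged with the (hypothetical) volume that stores it. */
  static final class BlockRef {
    final String volumeId;
    final long blockId;

    BlockRef(String volumeId, long blockId) {
      this.volumeId = volumeId;
      this.blockId = blockId;
    }
  }

  /** Groups blocks by volume, preserving the order in which volumes first appear. */
  static Map<String, List<BlockRef>> groupByVolume(List<BlockRef> blocks) {
    Map<String, List<BlockRef>> byVolume = new LinkedHashMap<>();
    for (BlockRef b : blocks) {
      byVolume.computeIfAbsent(b.volumeId, k -> new ArrayList<>()).add(b);
    }
    return byVolume;
  }

  /**
   * Invalidates all blocks, acquiring each volume's lock once per batch.
   * Returns the number of lock acquisitions performed.
   */
  static int invalidate(List<BlockRef> blocks,
                        Map<String, ReentrantLock> volumeLocks,
                        List<Long> removed) {
    int acquisitions = 0;
    for (Map.Entry<String, List<BlockRef>> e : groupByVolume(blocks).entrySet()) {
      ReentrantLock lock =
          volumeLocks.computeIfAbsent(e.getKey(), k -> new ReentrantLock());
      lock.lock();
      acquisitions++;
      try {
        for (BlockRef b : e.getValue()) {
          // Stand-in for removing the replica from this volume's in-memory map.
          removed.add(b.blockId);
        }
      } finally {
        lock.unlock();
      }
    }
    return acquisitions;
  }
}
```

With three blocks spread over two volumes, this takes two lock acquisitions rather than three; the saving grows with the number of blocks per volume in one invalidate command.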




[jira] [Updated] (HDFS-16656) Fix some incorrect descriptions in SPS

2022-07-12 Thread Hongbing Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hongbing Wang updated HDFS-16656:
-
Summary: Fix some incorrect descriptions in SPS  (was: Fixed some incorrect 
descriptions in SPS)

> Fix some incorrect descriptions in SPS
> --
>
> Key: HDFS-16656
> URL: https://issues.apache.org/jira/browse/HDFS-16656
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Hongbing Wang
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> There are some incorrect descriptions of the SPS module on the website, on 
> the following pages: 
> [ArchivalStorage.md|https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html]
>  and 
> [hdfs-default.xml|https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml].
>  Fix them in `ArchivalStorage.md` and `hdfs-default.xml`.






[jira] [Work logged] (HDFS-16657) Changing pool-level lock to volume-level lock for invalidation of blocks

2022-07-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16657?focusedWorklogId=790241=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-790241
 ]

ASF GitHub Bot logged work on HDFS-16657:
-

Author: ASF GitHub Bot
Created on: 13/Jul/22 03:41
Start Date: 13/Jul/22 03:41
Worklog Time Spent: 10m 
  Work Description: yuanboliu opened a new pull request, #4558:
URL: https://github.com/apache/hadoop/pull/4558

   The key code is:
   
   // Runs for each block to invalidate, while the pool-level lock is held.
   try {
     File blockFile = new File(info.getBlockURI());
     if (blockFile != null && blockFile.getParentFile() == null) {
       errors.add("Failed to delete replica " + invalidBlks[i]
           + ". Parent not found for block file: " + blockFile);
       continue;
     }
   } catch(IllegalArgumentException e) {
     LOG.warn("Parent directory check failed; replica " + info
         + " is not backed by a local file");
   } 
   The DN tries to locate the parent path of the block file, so disk I/O 
happens while the pool-level lock is held. When the disk becomes very busy with 
high I/O wait, all pending threads are blocked by the pool-level lock, and the 
heartbeat time becomes high. We propose changing the pool-level lock to a 
volume-level lock for block invalidation.




Issue Time Tracking
---

Worklog Id: (was: 790241)
Remaining Estimate: 0h
Time Spent: 10m




[jira] [Updated] (HDFS-16657) Changing pool-level lock to volume-level lock for invalidation of blocks

2022-07-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDFS-16657:
--
Labels: pull-request-available  (was: )




[jira] [Updated] (HDFS-16657) Changing pool-level lock to volume-level lock for invalidation of blocks

2022-07-12 Thread Yuanbo Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuanbo Liu updated HDFS-16657:
--
Description: 
Recently we saw that DataNode (DN) heartbeats became slow in a very busy 
cluster. Here is the chart:

!image-2022-07-13-10-25-37-383.png|width=665,height=245!

 

After taking a jstack of the DN, we found that the DN heartbeat was stuck in 
the invalidation of blocks:

!image-2022-07-13-10-27-01-386.png|width=658,height=308!

!image-2022-07-13-10-27-44-258.png|width=502,height=325!

The key code is:
{code:java}
// Runs for each block to invalidate, while the pool-level lock is held.
try {
  File blockFile = new File(info.getBlockURI());
  if (blockFile != null && blockFile.getParentFile() == null) {
    errors.add("Failed to delete replica " + invalidBlks[i]
        + ". Parent not found for block file: " + blockFile);
    continue;
  }
} catch(IllegalArgumentException e) {
  LOG.warn("Parent directory check failed; replica " + info
      + " is not backed by a local file");
} {code}
The DN tries to locate the parent path of the block file, so disk I/O happens 
while the pool-level lock is held. When the disk becomes very busy with high 
I/O wait, all pending threads are blocked by the pool-level lock, and the 
heartbeat time becomes high. We propose changing the pool-level lock to a 
volume-level lock for block invalidation.

cc: [~hexiaoqiao] [~Aiphag0] 

  was:
Recently we saw that DataNode (DN) heartbeats became slow in a very busy 
cluster. Here is the chart:

!image-2022-07-13-10-25-37-383.png!

 

After taking a jstack of the DN, we found that the DN heartbeat was stuck in 
the invalidation of blocks:

!image-2022-07-13-10-27-01-386.png!

!image-2022-07-13-10-27-44-258.png!

The key code is:
{code:java}
// Runs for each block to invalidate, while the pool-level lock is held.
try {
  File blockFile = new File(info.getBlockURI());
  if (blockFile != null && blockFile.getParentFile() == null) {
    errors.add("Failed to delete replica " + invalidBlks[i]
        + ". Parent not found for block file: " + blockFile);
    continue;
  }
} catch(IllegalArgumentException e) {
  LOG.warn("Parent directory check failed; replica " + info
      + " is not backed by a local file");
} {code}
The DN tries to locate the parent path of the block file, so disk I/O happens 
while the pool-level lock is held. When the disk becomes very busy with high 
I/O wait, all pending threads are blocked by the pool-level lock, and the 
heartbeat time becomes high. We propose changing the pool-level lock to a 
volume-level lock for block invalidation.

cc: [~hexiaoqiao] [~Aiphag0] 





[jira] [Created] (HDFS-16657) Changing pool-level lock to volume-level lock for invalidation of blocks

2022-07-12 Thread Yuanbo Liu (Jira)
Yuanbo Liu created HDFS-16657:
-

 Summary: Changing pool-level lock to volume-level lock for 
invalidation of blocks
 Key: HDFS-16657
 URL: https://issues.apache.org/jira/browse/HDFS-16657
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Yuanbo Liu
 Attachments: image-2022-07-13-10-25-37-383.png, 
image-2022-07-13-10-27-01-386.png, image-2022-07-13-10-27-44-258.png

Recently we saw that DataNode (DN) heartbeats became slow in a very busy 
cluster. Here is the chart:

!image-2022-07-13-10-25-37-383.png!

 

After taking a jstack of the DN, we found that the DN heartbeat was stuck in 
the invalidation of blocks:

!image-2022-07-13-10-27-01-386.png!

!image-2022-07-13-10-27-44-258.png!

The key code is:
{code:java}
// Runs for each block to invalidate, while the pool-level lock is held.
try {
  File blockFile = new File(info.getBlockURI());
  if (blockFile != null && blockFile.getParentFile() == null) {
    errors.add("Failed to delete replica " + invalidBlks[i]
        + ". Parent not found for block file: " + blockFile);
    continue;
  }
} catch(IllegalArgumentException e) {
  LOG.warn("Parent directory check failed; replica " + info
      + " is not backed by a local file");
} {code}
The DN tries to locate the parent path of the block file, so disk I/O happens 
while the pool-level lock is held. When the disk becomes very busy with high 
I/O wait, all pending threads are blocked by the pool-level lock, and the 
heartbeat time becomes high. We propose changing the pool-level lock to a 
volume-level lock for block invalidation.

cc: [~hexiaoqiao] [~Aiphag0] 
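
To illustrate the difference between the two lock granularities, here is a minimal sketch, assuming one lock per (block pool, volume) pair. `VolumeLocks`, `lockFor`, `withVolumeLock`, and the string IDs are hypothetical names chosen for this example, not the DataNode's actual locking API. The point is only that a thread stalled on I/O for one volume holds that volume's lock, so work on other volumes in the same pool can still proceed.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: these names are illustrative, not Hadoop APIs.
final class VolumeLocks {
  // One lock per (block pool, volume) pair instead of one lock per block pool.
  private final ConcurrentMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

  /** Returns the lock guarding a single volume within a block pool. */
  ReentrantLock lockFor(String poolId, String volumeId) {
    return locks.computeIfAbsent(poolId + "/" + volumeId, k -> new ReentrantLock());
  }

  /** Runs the action while holding only the lock of the given volume. */
  void withVolumeLock(String poolId, String volumeId, Runnable action) {
    ReentrantLock lock = lockFor(poolId, volumeId);
    lock.lock();
    try {
      action.run();
    } finally {
      lock.unlock();
    }
  }
}
```

For example, while one thread is inside `withVolumeLock("bp-1", "vol-a", ...)`, another thread calling `lockFor("bp-1", "vol-b").tryLock()` succeeds, whereas with a single pool-level lock it would have to wait.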






[jira] [Work logged] (HDFS-16656) Fixed some incorrect descriptions in SPS

2022-07-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16656?focusedWorklogId=790205=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-790205
 ]

ASF GitHub Bot logged work on HDFS-16656:
-

Author: ASF GitHub Bot
Created on: 12/Jul/22 22:04
Start Date: 12/Jul/22 22:04
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on PR #4556:
URL: https://github.com/apache/hadoop/pull/4556#issuecomment-1182541740

   :broken_heart: **-1 overall**

   | Vote | Subsystem | Runtime | Logfile | Comment |
   |:----:|----------:|--------:|:-------:|:-------:|
   | +0 :ok: | reexec | 0m 37s | | Docker mode activated. |
   |||| _ Prechecks _ |
   | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
   | +0 :ok: | codespell | 0m 0s | | codespell was not available. |
   | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
   | +0 :ok: | xmllint | 0m 0s | | xmllint was not available. |
   | +0 :ok: | markdownlint | 0m 0s | | markdownlint was not available. |
   | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
   | -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: | mvninstall | 37m 37s | | trunk passed |
   | +1 :green_heart: | compile | 1m 43s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
   | +1 :green_heart: | compile | 1m 37s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | +1 :green_heart: | mvnsite | 1m 46s | | trunk passed |
   | +1 :green_heart: | javadoc | 1m 26s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
   | +1 :green_heart: | javadoc | 1m 46s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | +1 :green_heart: | shadedclient | 65m 41s | | branch has no errors when building and testing our client artifacts. |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: | mvninstall | 1m 23s | | the patch passed |
   | +1 :green_heart: | compile | 1m 29s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
   | +1 :green_heart: | javac | 1m 29s | | the patch passed |
   | +1 :green_heart: | compile | 1m 17s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | +1 :green_heart: | javac | 1m 17s | | the patch passed |
   | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
   | +1 :green_heart: | mvnsite | 1m 23s | | the patch passed |
   | +1 :green_heart: | javadoc | 0m 58s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
   | +1 :green_heart: | javadoc | 1m 32s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | +1 :green_heart: | shadedclient | 24m 46s | | patch has no errors when building and testing our client artifacts. |
   |||| _ Other Tests _ |
   | +1 :green_heart: | unit | 242m 9s | | hadoop-hdfs in the patch passed. |
   | +1 :green_heart: | asflicense | 1m 12s | | The patch does not generate ASF License warnings. |
   | | | 341m 24s | | |

   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4556/1/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4556 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient codespell detsecrets xmllint markdownlint |
   | uname | Linux a5f8022214df 4.15.0-156-generic #163-Ubuntu SMP Thu Aug 19 23:31:58 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / cd382208945129096c443751b1fa652f1660404d |
   | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4556/1/testReport/ |
   | Max. process+thread count | 3336 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4556/1/console |
   | versions | 

[jira] [Work logged] (HDFS-16656) Fixed some incorrect descriptions in SPS

2022-07-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16656?focusedWorklogId=790084=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-790084
 ]

ASF GitHub Bot logged work on HDFS-16656:
-

Author: ASF GitHub Bot
Created on: 12/Jul/22 16:20
Start Date: 12/Jul/22 16:20
Worklog Time Spent: 10m 
  Work Description: whbing opened a new pull request, #4556:
URL: https://github.com/apache/hadoop/pull/4556

   jira: https://issues.apache.org/jira/browse/HDFS-16656: Fixed some incorrect 
descriptions in SPS.




Issue Time Tracking
---

Worklog Id: (was: 790084)
Remaining Estimate: 0h
Time Spent: 10m




[jira] [Updated] (HDFS-16656) Fixed some incorrect descriptions in SPS

2022-07-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDFS-16656:
--
Labels: pull-request-available  (was: )




[jira] [Updated] (HDFS-16656) Fixed some incorrect descriptions in SPS

2022-07-12 Thread Hongbing Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hongbing Wang updated HDFS-16656:
-
Description: There are some incorrect descriptions of the SPS module on the 
website, on the following pages: 
[ArchivalStorage.md|https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html]
 and 
[hdfs-default.xml|https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml].
 Fix them in `ArchivalStorage.md` and `hdfs-default.xml`.  (was: There are some 
incorrect descriptions of the SPS module on the website, on the following pages: 
[ArchivalStorage.md|https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html]
 and 
[hdfs-default.xml|[https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml].]
 Fix them in `ArchivalStorage.md` and `hdfs-default.xml`.)




[jira] [Created] (HDFS-16656) Fixed some incorrect descriptions in SPS

2022-07-12 Thread Hongbing Wang (Jira)
Hongbing Wang created HDFS-16656:


 Summary: Fixed some incorrect descriptions in SPS
 Key: HDFS-16656
 URL: https://issues.apache.org/jira/browse/HDFS-16656
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: documentation
Reporter: Hongbing Wang


There are some incorrect descriptions of the SPS module on the website, on the 
following pages: 
[ArchivalStorage.md|https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html]
 and 
[hdfs-default.xml|[https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml].]
 Fix them in `ArchivalStorage.md` and `hdfs-default.xml`.






[jira] [Commented] (HDFS-14084) Need for more stats in DFSClient

2022-07-12 Thread Masatake Iwasaki (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17565906#comment-17565906
 ] 

Masatake Iwasaki commented on HDFS-14084:
-

Updated the target version to 3.2.5 in preparation for the 3.2.4 release.

> Need for more stats in DFSClient
> 
>
> Key: HDFS-14084
> URL: https://issues.apache.org/jira/browse/HDFS-14084
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Pranay Singh
>Priority: Minor
> Attachments: HDFS-14084.001.patch, HDFS-14084.002.patch, 
> HDFS-14084.003.patch, HDFS-14084.004.patch, HDFS-14084.005.patch, 
> HDFS-14084.006.patch, HDFS-14084.007.patch, HDFS-14084.008.patch, 
> HDFS-14084.009.patch, HDFS-14084.010.patch, HDFS-14084.011.patch, 
> HDFS-14084.012.patch, HDFS-14084.013.patch, HDFS-14084.014.patch, 
> HDFS-14084.015.patch, HDFS-14084.016.patch, HDFS-14084.017.patch, 
> HDFS-14084.018.patch
>
>
> HDFS usage has changed: it was once used mainly as a MapReduce filesystem, 
> but it is now becoming more of a general-purpose filesystem. In most cases 
> the issues are with the NameNode, so we have metrics to measure the workload 
> or stress on the NameNode.
> However, there is a need to collect more statistics for different 
> operations/RPCs in DFSClient, to know which RPC operations take longer or 
> how frequently an operation occurs. These statistics can be exposed to the 
> users of DFSClient, who can periodically log them or apply some sort of flow 
> control if the response is slow. This will also help to isolate HDFS issues 
> in a mixed environment where, say, Spark, HBase and Impala run together on 
> one node. We can check the throughput of different operations across clients 
> and isolate problems caused by a noisy neighbor, network congestion, or a 
> shared JVM.
> We have dealt with several problems from the field for which there is no 
> conclusive evidence as to what caused the problem. If we had metrics or stats 
> in DFSClient we would be better equipped to solve such complex problems.
> List of jiras for reference:
> -
>  HADOOP-15538 HADOOP-15530 ( client side deadlock)
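
As a rough illustration of the kind of per-operation statistics the description asks for, the sketch below counts calls and accumulates latency per operation name. It is an assumption-laden example, not the patch attached to this issue: `ClientOpStats`, `record`, `calls`, and `meanNanos` are hypothetical names, and the operation names passed in are arbitrary strings.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: not part of DFSClient or any attached patch.
final class ClientOpStats {

  /** Call count and total latency for one operation. */
  private static final class Counters {
    final LongAdder calls = new LongAdder();
    final LongAdder totalNanos = new LongAdder();
  }

  private final Map<String, Counters> byOp = new ConcurrentHashMap<>();

  /** Records one completed call of the named operation. */
  void record(String op, long elapsedNanos) {
    Counters c = byOp.computeIfAbsent(op, k -> new Counters());
    c.calls.increment();
    c.totalNanos.add(elapsedNanos);
  }

  /** Number of recorded calls for the operation (0 if never seen). */
  long calls(String op) {
    Counters c = byOp.get(op);
    return c == null ? 0L : c.calls.sum();
  }

  /** Mean latency in nanoseconds for the operation (0.0 if never seen). */
  double meanNanos(String op) {
    Counters c = byOp.get(op);
    if (c == null) {
      return 0.0;
    }
    long n = c.calls.sum();
    return n == 0 ? 0.0 : (double) c.totalNanos.sum() / n;
  }
}
```

A caller would wrap each RPC with `System.nanoTime()` before and after and pass the difference to `record`; the frequency and mean latency per operation can then be logged periodically or used for flow control.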






[jira] [Updated] (HDFS-14084) Need for more stats in DFSClient

2022-07-12 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-14084:

Target Version/s: 3.2.5  (was: 3.2.4)

> Need for more stats in DFSClient
> 
>
> Key: HDFS-14084
> URL: https://issues.apache.org/jira/browse/HDFS-14084
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Pranay Singh
>Priority: Minor
> Attachments: HDFS-14084.001.patch, HDFS-14084.002.patch, 
> HDFS-14084.003.patch, HDFS-14084.004.patch, HDFS-14084.005.patch, 
> HDFS-14084.006.patch, HDFS-14084.007.patch, HDFS-14084.008.patch, 
> HDFS-14084.009.patch, HDFS-14084.010.patch, HDFS-14084.011.patch, 
> HDFS-14084.012.patch, HDFS-14084.013.patch, HDFS-14084.014.patch, 
> HDFS-14084.015.patch, HDFS-14084.016.patch, HDFS-14084.017.patch, 
> HDFS-14084.018.patch
>
>
> The usage of HDFS has changed: once used mainly as a map-reduce filesystem, 
> it is now becoming more of a general purpose filesystem. In most cases 
> the issues are with the Namenode, so we have metrics to know the workload or 
> stress on the Namenode.
> However, there is a need to collect more statistics for different 
> operations/RPCs in DFSClient, to know which RPC operations take longer 
> or how frequent each operation is. These statistics can 
> be exposed to the users of DFS Client and they can periodically log or do 
> some sort of flow control if the response is slow. This will also help to 
> isolate HDFS issues in a mixed environment where, say, Spark, HBase and 
> Impala run together on one node. We can check the throughput of different 
> operations across clients and isolate problems caused by a noisy 
> neighbor, network congestion, or a shared JVM.
> We have dealt with several problems from the field for which there is no 
> conclusive evidence as to what caused the problem. If we had metrics or stats 
> in DFSClient we would be better equipped to solve such complex problems.
> List of jiras for reference:
> -
>  HADOOP-15538 HADOOP-15530 ( client side deadlock)
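
A minimal sketch, in Java, of the kind of per-operation statistics the description asks for. The class OpStats and its methods are hypothetical and are not part of the DFSClient API; the sketch only shows how call counts and accumulated latency per RPC name could be kept so that a caller can log them periodically or apply flow control when responses are slow.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

/** Hypothetical per-operation statistics holder; not an existing Hadoop class. */
class OpStats {
  /** Call count and total elapsed nanoseconds for one operation name. */
  static final class Counter {
    final LongAdder calls = new LongAdder();
    final LongAdder totalNanos = new LongAdder();
  }

  private final Map<String, Counter> counters = new ConcurrentHashMap<>();

  /** Runs the call and records its latency under the given operation name. */
  <T> T time(String op, Supplier<T> call) {
    long start = System.nanoTime();
    try {
      return call.get();
    } finally {
      record(op, System.nanoTime() - start);
    }
  }

  /** Adds one observation for the operation. */
  void record(String op, long elapsedNanos) {
    Counter c = counters.computeIfAbsent(op, k -> new Counter());
    c.calls.increment();
    c.totalNanos.add(elapsedNanos);
  }

  /** Number of recorded calls, or 0 if the operation was never seen. */
  long calls(String op) {
    Counter c = counters.get(op);
    return c == null ? 0 : c.calls.sum();
  }

  /** Mean latency in nanoseconds, or 0 if the operation was never seen. */
  long meanNanos(String op) {
    Counter c = counters.get(op);
    if (c == null) {
      return 0;
    }
    long n = c.calls.sum();
    return n == 0 ? 0 : c.totalNanos.sum() / n;
  }
}
```

A caller would wrap each RPC in time("getFileInfo", ...) and compare meanNanos across operations or clients. Keeping only a count and a sum is the cheapest option; percentiles would need a histogram instead.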






[jira] [Commented] (HDFS-14571) Command line to force volume failures

2022-07-12 Thread Masatake Iwasaki (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17565904#comment-17565904
 ] 

Masatake Iwasaki commented on HDFS-14571:
-

Updated the target version to 3.2.5 in preparation for the 3.2.4 release.

> Command line to force volume failures
> -
>
> Key: HDFS-14571
> URL: https://issues.apache.org/jira/browse/HDFS-14571
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, hdfs
> Environment: Linux
>Reporter: Scott A. Wehner
>Priority: Major
>  Labels: disks, volumes
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Datanodes that have failed hard drives report to the namenode that they have a 
> failed volume. In line with enabling slow datanode detection, when we have a 
> failing drive that has not yet failed, or that has uncorrectable sectors, I want to 
> be able to run a command to force-fail a datanode volume based on storageID 
> or target storage location (a.k.a. mount point).






[jira] [Commented] (HDFS-14349) Edit log may be rolled more frequently than necessary with multiple Standby nodes

2022-07-12 Thread Masatake Iwasaki (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17565905#comment-17565905
 ] 

Masatake Iwasaki commented on HDFS-14349:
-

Updated the target version to 3.2.5 in preparation for the 3.2.4 release.

> Edit log may be rolled more frequently than necessary with multiple Standby 
> nodes
> -
>
> Key: HDFS-14349
> URL: https://issues.apache.org/jira/browse/HDFS-14349
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, hdfs, qjm
>Reporter: Erik Krogen
>Assignee: Ekanth Sethuramalingam
>Priority: Major
>  Labels: multi-sbnn
>
> When HDFS-14317 was fixed, we tackled the problem that in a cluster with 
> in-progress edit log tailing enabled, a Standby NameNode may _never_ roll the 
> edit logs, which can eventually cause data loss.
> Unfortunately, in the process, it was made so that if there are multiple 
> Standby NameNodes, they will all roll the edit logs at their specified 
> frequency, so the edit log will be rolled X times more frequently than it 
> should be (where X is the number of Standby NNs). This is not as bad as the 
> original bug since rolling frequently does not affect correctness or data 
> availability, but may degrade performance by creating more edit log segments 
> than necessary.
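
The "X times more frequently" effect can be worked through with a small calculation. This is only an illustrative sketch: the class and method names are hypothetical, and the 120 second roll period in the usage note is an example value, not something stated in the issue.

```java
/** Hypothetical helper that works out how often the edit log gets rolled. */
class EditLogRollRate {
  /**
   * Rolls per hour when each of standbyCount Standby NameNodes independently
   * triggers a roll once every rollPeriodSeconds.
   */
  static double rollsPerHour(int standbyCount, long rollPeriodSeconds) {
    if (standbyCount < 0 || rollPeriodSeconds <= 0) {
      throw new IllegalArgumentException("invalid standby count or period");
    }
    return standbyCount * (3600.0 / rollPeriodSeconds);
  }
}
```

With an example period of 120 seconds, one Standby gives 30 rolls per hour, while three Standbys give 90, which is three times as many edit log segments for the same workload.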






[jira] [Updated] (HDFS-15289) Allow viewfs mounts with HDFS/HCFS scheme and centralized mount table

2022-07-12 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-15289:

Target Version/s: 3.4.0, 3.3.9, 3.2.5  (was: 3.4.0, 3.2.4, 3.3.9)

> Allow viewfs mounts with HDFS/HCFS scheme and centralized mount table
> -
>
> Key: HDFS-15289
> URL: https://issues.apache.org/jira/browse/HDFS-15289
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: fs
>Affects Versions: 3.2.0
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Major
> Attachments: ViewFSOverloadScheme - V1.0.pdf, ViewFSOverloadScheme.png
>
>
> ViewFS provides the flexibility to mount different filesystem types through a mount 
> point configuration table. This approach solves the scalability 
> problems, but users need to reconfigure the filesystem to ViewFS and to its 
> scheme. This is problematic when paths are persisted in meta 
> stores such as Hive, which stores URIs in its meta store. So, 
> changing the filesystem scheme creates a burden to upgrade/recreate meta 
> stores. In our experience many users are not ready to change that.  
> Router based federation is another implementation that provides coordinated 
> mount points for HDFS federation clusters. Even though this provides the 
> flexibility to handle mount points easily, it does not allow 
> other (non-HDFS) file systems to be mounted. So, it does not serve the purpose 
> when users want to mount external (non-HDFS) filesystems.
> So, the problem here is: even though many users want to adopt the scalable 
> fs options available, the technical challenges of changing schemes (e.g. in meta 
> stores) in deployments are obstructing them. 
> So, we propose to allow the hdfs scheme in a ViewFS-like client side mount system 
> and to let users create mount links without changing URI paths. 
> I will upload a detailed design doc shortly.
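
To illustrate what a client side mount table does, here is a minimal Java sketch of longest-prefix mount link resolution. It is not the ViewFS implementation; the class MountTableSketch, its methods, and the example URIs are all hypothetical. The point is that the caller keeps using the same path while the link decides which target filesystem serves it.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical client side mount table; not the ViewFS code. */
class MountTableSketch {
  // Mount point (for example "/data") mapped to the target URI prefix.
  private final Map<String, String> links = new HashMap<>();

  /** Registers a mount link from a path prefix to a target URI. */
  void addLink(String mountPoint, String targetUri) {
    links.put(mountPoint, targetUri);
  }

  /**
   * Resolves a path through the longest matching mount point.
   * Returns null when no mount link covers the path.
   */
  String resolve(String path) {
    String best = null;
    for (String mountPoint : links.keySet()) {
      boolean matches = path.equals(mountPoint) || path.startsWith(mountPoint + "/");
      if (matches && (best == null || mountPoint.length() > best.length())) {
        best = mountPoint;
      }
    }
    if (best == null) {
      return null;
    }
    // Keep the remainder of the path below the mount point unchanged.
    return links.get(best) + path.substring(best.length());
  }
}
```

For example, with links "/user" to "hdfs://nn1/user" and "/data" to "s3a://bucket/data", the path "/data/a.csv" resolves to "s3a://bucket/data/a.csv" while stored paths under "/user" continue to reach HDFS.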






[jira] [Updated] (HDFS-14571) Command line to force volume failures

2022-07-12 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-14571:

Target Version/s: 3.2.5  (was: 3.2.4)

> Command line to force volume failures
> -
>
> Key: HDFS-14571
> URL: https://issues.apache.org/jira/browse/HDFS-14571
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, hdfs
> Environment: Linux
>Reporter: Scott A. Wehner
>Priority: Major
>  Labels: disks, volumes
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Datanodes that have failed hard drives report to the namenode that they have a 
> failed volume. In line with enabling slow datanode detection, when we have a 
> failing drive that has not yet failed, or that has uncorrectable sectors, I want to 
> be able to run a command to force-fail a datanode volume based on storageID 
> or target storage location (a.k.a. mount point).






[jira] [Commented] (HDFS-15289) Allow viewfs mounts with HDFS/HCFS scheme and centralized mount table

2022-07-12 Thread Masatake Iwasaki (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17565902#comment-17565902
 ] 

Masatake Iwasaki commented on HDFS-15289:
-

Updated the target version to 3.2.5 in preparation for the 3.2.4 release.

> Allow viewfs mounts with HDFS/HCFS scheme and centralized mount table
> -
>
> Key: HDFS-15289
> URL: https://issues.apache.org/jira/browse/HDFS-15289
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: fs
>Affects Versions: 3.2.0
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Major
> Attachments: ViewFSOverloadScheme - V1.0.pdf, ViewFSOverloadScheme.png
>
>
> ViewFS provides the flexibility to mount different filesystem types through a mount 
> point configuration table. This approach solves the scalability 
> problems, but users need to reconfigure the filesystem to ViewFS and to its 
> scheme. This is problematic when paths are persisted in meta 
> stores such as Hive, which stores URIs in its meta store. So, 
> changing the filesystem scheme creates a burden to upgrade/recreate meta 
> stores. In our experience many users are not ready to change that.  
> Router based federation is another implementation that provides coordinated 
> mount points for HDFS federation clusters. Even though this provides the 
> flexibility to handle mount points easily, it does not allow 
> other (non-HDFS) file systems to be mounted. So, it does not serve the purpose 
> when users want to mount external (non-HDFS) filesystems.
> So, the problem here is: even though many users want to adopt the scalable 
> fs options available, the technical challenges of changing schemes (e.g. in meta 
> stores) in deployments are obstructing them. 
> So, we propose to allow the hdfs scheme in a ViewFS-like client side mount system 
> and to let users create mount links without changing URI paths. 
> I will upload a detailed design doc shortly.






[jira] [Updated] (HDFS-14349) Edit log may be rolled more frequently than necessary with multiple Standby nodes

2022-07-12 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-14349:

Target Version/s: 3.2.5  (was: 3.2.4)

> Edit log may be rolled more frequently than necessary with multiple Standby 
> nodes
> -
>
> Key: HDFS-14349
> URL: https://issues.apache.org/jira/browse/HDFS-14349
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, hdfs, qjm
>Reporter: Erik Krogen
>Assignee: Ekanth Sethuramalingam
>Priority: Major
>  Labels: multi-sbnn
>
> When HDFS-14317 was fixed, we tackled the problem that in a cluster with 
> in-progress edit log tailing enabled, a Standby NameNode may _never_ roll the 
> edit logs, which can eventually cause data loss.
> Unfortunately, in the process, it was made so that if there are multiple 
> Standby NameNodes, they will all roll the edit logs at their specified 
> frequency, so the edit log will be rolled X times more frequently than it 
> should be (where X is the number of Standby NNs). This is not as bad as the 
> original bug since rolling frequently does not affect correctness or data 
> availability, but may degrade performance by creating more edit log segments 
> than necessary.






[jira] [Updated] (HDFS-16022) matlab mapreduce v95 demos can't run hadoop-3.2.2 run time

2022-07-12 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-16022:

Target Version/s: 3.2.5  (was: 3.2.4)

> matlab mapreduce v95 demos can't run hadoop-3.2.2 run time
> --
>
> Key: HDFS-16022
> URL: https://issues.apache.org/jira/browse/HDFS-16022
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: dfsclient
>Affects Versions: 3.2.2
> Environment: hadoop-3.2.2 + MATLAB Runtime + CentOS 7. The 
> maxArrivalDelay.ctf file was generated on Win10 + MATLAB 2018b (v95) by the Hadoop 
> compiler tools. airlinesmall.csv was uploaded to HDFS. Hadoop runs the 
> wordcount demo from hadoop-mapreduce-examples-3.2.2.jar without problems, even with a jar 
> compiled from the source code in a Win10 + Eclipse environment. Please help, I have 
> no idea what is causing this.
>Reporter: cathonxiong
>Priority: Blocker
> Attachments: matlab_errorlog
>
>
>  hadoop \ hadoop \> jar /usr/local/MATLAB/MATLAB_Runtime/v95/toolbox/mlhadoop/jar/a2.2.0/mwmapreduce.jar
>  \> com.mathworks.hadoop.MWMapReduceDriver
>  \> -D mw.mcrroot=/usr/local/MATLAB/MATLAB_Runtime/v95
>  \> /usr/local/MATLAB/MATLAB_Runtime/v95/maxArrivalDelay.ctf
>  \> hdfs://hadoop.namenode:50070/user/matlab/datasets/airlinesmall.csv
>  \> hdfs://hadoop.namenode:50070/user/matlab/results
> java.library.path: /usr/local/hadoop-3.2.2/lib/native
> HDFSCTFPath=hdfs://hadoop.namenode:8020/user/root/maxArrivalDelay/maxArrivalDelay.ctf
> Uploading CTF into distributed cache completed.
> mapred.child.env: MCR_CACHE_ROOT=/tmp,LD_LIBRARY_PATH=/usr/local/MATLAB/MATLAB_Runtime/v95/runtime/glnxa64:/usr/local/MATLAB/MATLAB_Runtime/v95/bin/glnxa64:/usr/local/MATLAB/MATLAB_Runtime/v95/sys/os/glnxa64:/usr/local/MATLAB/MATLAB_Runtime/v95/sys/opengl/lib/glnxa64
> mapred.child.java.opts: -Djava.library.path=/usr/local/MATLAB/MATLAB_Runtime/v95/runtime/glnxa64:/usr/local/MATLAB/MATLAB_Runtime/v95/bin/glnxa64:/usr/local/MATLAB/MATLAB_Runtime/v95/sys/os/glnxa64:/usr/local/MATLAB/MATLAB_Runtime/v95/sys/opengl/lib/glnxa64
> New java.library.path: /usr/local/hadoop-3.2.2/lib/native:/usr/local/MATLAB/MATLAB_Runtime/v95/runtime/glnxa64:/usr/local/MATLAB/MATLAB_Runtime/v95/bin/glnxa64:/usr/local/MATLAB/MATLAB_Runtime/v95/sys/os/glnxa64:/usr/local/MATLAB/MATLAB_Runtime/v95/sys/opengl/lib/glnxa64
> Using MATLAB mapper.
> Set input format class to: ChunkFileRecordReader.
> Using MATLAB reducer.
> Set outputformat class to: class org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
> Set map output key class to: class com.mathworks.hadoop.MxArrayWritable2
> Set map output value class to: class com.mathworks.hadoop.MxArrayWritable2
> Set reduce output key class to: class com.mathworks.hadoop.MxArrayWritable2
> Set reduce output value class to: class com.mathworks.hadoop.MxArrayWritable2
> *** run **
> 2021-05-11 14:58:47,043 INFO client.RMProxy: Connecting to ResourceManager at hadoop.namenode/192.168.0.25:8032
> 2021-05-11 14:58:47,139 WARN net.NetUtils: Unable to wrap exception of type class org.apache.hadoop.ipc.RpcException: it has no (String) constructor
> java.lang.NoSuchMethodException: org.apache.hadoop.ipc.RpcException.<init>(java.lang.String)
>  at java.lang.Class.getConstructor0(Class.java:3082)
>  at java.lang.Class.getConstructor(Class.java:1825)
>  at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:835)
>  at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:811)
>  at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1566)
>  at org.apache.hadoop.ipc.Client.call(Client.java:1508)
>  at org.apache.hadoop.ipc.Client.call(Client.java:1405)
>  at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
>  at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
>  at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
>  at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:910)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
>  at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
>  at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
>  at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
>  at 
> 

[jira] [Commented] (HDFS-16022) matlab mapreduce v95 demos can't run hadoop-3.2.2 run time

2022-07-12 Thread Masatake Iwasaki (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17565900#comment-17565900
 ] 

Masatake Iwasaki commented on HDFS-16022:
-

Updated the target version to 3.2.5 in preparation for the 3.2.4 release.

> matlab mapreduce v95 demos can't run hadoop-3.2.2 run time
> --
>
> Key: HDFS-16022
> URL: https://issues.apache.org/jira/browse/HDFS-16022
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: dfsclient
>Affects Versions: 3.2.2
> Environment: hadoop-3.2.2 + MATLAB Runtime + CentOS 7. The 
> maxArrivalDelay.ctf file was generated on Win10 + MATLAB 2018b (v95) by the Hadoop 
> compiler tools. airlinesmall.csv was uploaded to HDFS. Hadoop runs the 
> wordcount demo from hadoop-mapreduce-examples-3.2.2.jar without problems, even with a jar 
> compiled from the source code in a Win10 + Eclipse environment. Please help, I have 
> no idea what is causing this.
>Reporter: cathonxiong
>Priority: Blocker
> Attachments: matlab_errorlog
>
>

[jira] [Updated] (HDFS-16022) matlab mapreduce v95 demos can't run hadoop-3.2.2 run time

2022-07-12 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-16022:

Priority: Major  (was: Blocker)

> matlab mapreduce v95 demos can't run hadoop-3.2.2 run time
> --
>
> Key: HDFS-16022
> URL: https://issues.apache.org/jira/browse/HDFS-16022
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: dfsclient
>Affects Versions: 3.2.2
> Environment: hadoop-3.2.2 + MATLAB Runtime + CentOS 7. The 
> maxArrivalDelay.ctf file was generated on Win10 + MATLAB 2018b (v95) by the Hadoop 
> compiler tools. airlinesmall.csv was uploaded to HDFS. Hadoop runs the 
> wordcount demo from hadoop-mapreduce-examples-3.2.2.jar without problems, even with a jar 
> compiled from the source code in a Win10 + Eclipse environment. Please help, I have 
> no idea what is causing this.
>Reporter: cathonxiong
>Priority: Major
> Attachments: matlab_errorlog
>
>

[jira] [Work logged] (HDFS-16655) OIV: print out erasure coding policy name in oiv Delimited output

2022-07-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16655?focusedWorklogId=790003=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-790003
 ]

ASF GitHub Bot logged work on HDFS-16655:
-

Author: ASF GitHub Bot
Created on: 12/Jul/22 11:41
Start Date: 12/Jul/22 11:41
Worklog Time Spent: 10m 
  Work Description: Neilxzn commented on PR #4541:
URL: https://github.com/apache/hadoop/pull/4541#issuecomment-1181655505

   ping @Hexiaoqiao. Would you have time to take a look at this patch? 
Thank you!
   
   




Issue Time Tracking
---

Worklog Id: (was: 790003)
Time Spent: 50m  (was: 40m)

> OIV: print out erasure coding policy name in oiv Delimited output
> -
>
> Key: HDFS-16655
> URL: https://issues.apache.org/jira/browse/HDFS-16655
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 3.4.0
>Reporter: Max  Xie
>Assignee: Max  Xie
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Adding the erasure coding policy name to the oiv output will help oiv 
> post-analysis get an overview of all folders/files with a specified ec 
> policy and apply internal regulation based on this information. In 
> particular, it will be convenient for the platform to calculate the real 
> storage size of the ec file.
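
As a rough illustration of how the policy name could feed a storage calculation, here is a minimal Java sketch. It is an approximation under stated assumptions, not the oiv tool's logic: it assumes policy names of the form codec-dataUnits-parityUnits-cellSize (for example RS-6-3-1024k), the class and method names are hypothetical, and it ignores partial-stripe effects, so it is only meaningful for files much larger than one stripe.

```java
/** Hypothetical estimator of raw storage for an erasure coded file. */
class EcStorageEstimate {
  /**
   * Estimates raw bytes as logicalBytes * (data + parity) / data, rounded up.
   * Assumes the last three dash-separated fields of the policy name are
   * data units, parity units and cell size.
   */
  static long estimateRawBytes(String policyName, long logicalBytes) {
    String[] parts = policyName.split("-");
    if (parts.length < 4) {
      throw new IllegalArgumentException("unexpected policy name: " + policyName);
    }
    int data = Integer.parseInt(parts[parts.length - 3]);
    int parity = Integer.parseInt(parts[parts.length - 2]);
    if (data <= 0 || parity < 0 || logicalBytes < 0) {
      throw new IllegalArgumentException("invalid units or size");
    }
    // Integer ceiling division; may overflow for sizes near Long.MAX_VALUE.
    long total = logicalBytes * (data + parity);
    return (total + data - 1) / data;
  }
}
```

Under these assumptions a 600 byte logical size with RS-6-3-1024k is estimated at 900 raw bytes, that is 1.5 times the logical size, compared with 3 times for three-way replication.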






[jira] [Updated] (HDFS-16177) Bug fix for Util#receiveFile

2022-07-12 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-16177:

Fix Version/s: 3.2.4
   (was: 3.2.3)

> Bug fix for Util#receiveFile
> 
>
> Key: HDFS-16177
> URL: https://issues.apache.org/jira/browse/HDFS-16177
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Tao Li
>Assignee: Tao Li
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2, 3.2.4
>
> Attachments: download-fsimage.jpg
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The time to write file was miscalculated in Util#receiveFile.
> !download-fsimage.jpg|width=578,height=134!






[jira] [Updated] (HDFS-16198) Short circuit read leaks Slot objects when InvalidToken exception is thrown

2022-07-12 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-16198:

Fix Version/s: 3.2.4
   (was: 3.2.3)

> Short circuit read leaks Slot objects when InvalidToken exception is thrown
> ---
>
> Key: HDFS-16198
> URL: https://issues.apache.org/jira/browse/HDFS-16198
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eungsop Yoo
>Assignee: Eungsop Yoo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 2.10.2, 3.3.2, 3.2.4
>
> Attachments: HDFS-16198.patch, screenshot-2.png
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> In secure mode, 'dfs.block.access.token.enable' should be set to 'true'. With 
> this configuration, a SecretManager.InvalidToken exception may be thrown if the 
> access token expires during a short circuit read. This does not affect the 
> reads themselves, because the failed reads will be retried, but it leaks 
> ShortCircuitShm.Slot objects. 
>  
> We found this problem in our secure HBase clusters. The number of open file 
> descriptors of RegionServers using short circuit reads kept increasing. 
> !screenshot-2.png!
>  
> It was caused by the leakage of shared memory segments used by short circuit 
> reading.
> {code:java}
> [root ~]# lsof -p $(ps -ef | grep proc_regionserver | grep -v grep | awk '{print $2}') | grep /dev/shm | wc -l
> 3925
> [root ~]# lsof -p $(ps -ef | grep proc_regionserver | grep -v grep | awk '{print $2}') | grep /dev/shm | head -5
> java 86309 hbase DEL REG 0,19 2308279984 /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-1107866286_1_743473959
> java 86309 hbase DEL REG 0,19 2306359893 /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-1107866286_1_1594162967
> java 86309 hbase DEL REG 0,19 2305496758 /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-1107866286_1_2043027439
> java 86309 hbase DEL REG 0,19 2304784261 /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-1107866286_1_689571088
> java 86309 hbase DEL REG 0,19 2302621988 /dev/shm/HadoopShortCircuitShm_DFSClient_NONMAPREDUCE_-1107866286_1_347008590 
> {code}
>  
> We finally found that the root cause of this is the leakage of 
> ShortCircuitShm.Slot.
>  
> The fix is trivial. Just free the slot when InvalidToken exception is thrown.
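
The shape of such a fix can be sketched in Java as follows. This is not the actual patch: SlotPool, Slot, BlockRequest and InvalidTokenException are hypothetical stand-ins for the real short circuit read classes, used only to show a slot being released on the failure path before the read is retried.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of freeing a slot when the token check fails. */
class SlotLeakSketch {
  /** Stand-in for SecretManager.InvalidToken. */
  static class InvalidTokenException extends Exception {
    InvalidTokenException(String message) {
      super(message);
    }
  }

  /** Stand-in for a shared memory slot. */
  static class Slot {
  }

  /** Tracks which slots are currently allocated. */
  static class SlotPool {
    private final Set<Slot> live = new HashSet<>();

    Slot allocate() {
      Slot slot = new Slot();
      live.add(slot);
      return slot;
    }

    void free(Slot slot) {
      live.remove(slot);
    }

    int liveCount() {
      return live.size();
    }
  }

  /** A request that may be rejected because the access token expired. */
  interface BlockRequest {
    void send(Slot slot) throws InvalidTokenException;
  }

  /** Returns true on success; on InvalidToken frees the slot and returns false. */
  static boolean requestWithSlot(SlotPool pool, BlockRequest request) {
    Slot slot = pool.allocate();
    try {
      request.send(slot);
      return true; // the slot stays allocated for the ongoing read
    } catch (InvalidTokenException e) {
      pool.free(slot); // without this, every retried read leaks one slot
      return false;
    }
  }
}
```

In this model a failed request leaves liveCount() unchanged, so repeated retries after token expiry no longer accumulate slots.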






[jira] [Updated] (HDFS-16241) Standby close reconstruction thread

2022-07-12 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-16241:

Fix Version/s: 3.2.4
   (was: 3.2.3)

> Standby close reconstruction thread
> ---
>
> Key: HDFS-16241
> URL: https://issues.apache.org/jira/browse/HDFS-16241
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: zhanghuazong
>Assignee: zhanghuazong
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2, 3.2.4
>
> Attachments: HDFS-16241
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> If the active namenode transitions to standby while its "Reconstruction 
> Queue Initializer" thread is still running, that thread should be stopped.






[jira] [Updated] (HDFS-16187) SnapshotDiff behaviour with Xattrs and Acls is not consistent across NN restarts with checkpointing

2022-07-12 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-16187:

Fix Version/s: 3.2.4
   (was: 3.2.3)

> SnapshotDiff behaviour with Xattrs and Acls is not consistent across NN 
> restarts with checkpointing
> ---
>
> Key: HDFS-16187
> URL: https://issues.apache.org/jira/browse/HDFS-16187
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Srinivasu Majeti
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2, 3.2.4
>
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> The test below shows that the snapshot diff report is not consistent with 
> Xattrs (here the encryption zone sets the Xattr) across NN restarts with a 
> checkpointed FsImage.
> {code:java}
> @Test
> public void testEncryptionZonesWithSnapshots() throws Exception {
>   final Path snapshottable = new Path("/zones");
>   fsWrapper.mkdir(snapshottable, FsPermission.getDirDefault(),
>   true);
>   dfsAdmin.allowSnapshot(snapshottable);
>   dfsAdmin.createEncryptionZone(snapshottable, TEST_KEY, NO_TRASH);
>   fs.createSnapshot(snapshottable, "snap1");
>   SnapshotDiffReport report =
>   fs.getSnapshotDiffReport(snapshottable, "snap1", "");
>   Assert.assertEquals(0, report.getDiffList().size());
>   report =
>   fs.getSnapshotDiffReport(snapshottable, "snap1", "");
>   System.out.println(report);
>   Assert.assertEquals(0, report.getDiffList().size());
>   fs.setSafeMode(SafeModeAction.SAFEMODE_ENTER);
>   fs.saveNamespace();
>   fs.setSafeMode(SafeModeAction.SAFEMODE_LEAVE);
>   cluster.restartNameNode(true);
>   report =
>   fs.getSnapshotDiffReport(snapshottable, "snap1", "");
>   Assert.assertEquals(0, report.getDiffList().size());
> }{code}
> {code:java}
> Pre Restart:
> Difference between snapshot snap1 and current directory under directory 
> /zones:
> Post Restart:
> Difference between snapshot snap1 and current directory under directory 
> /zones:
> M .{code}
> The side effect of this behavior is that distcp with snapshot diff fails 
> with the error below, complaining that data on the target cluster has changed.
> {code:java}
> WARN tools.DistCp: The target has been modified since snapshot x
> {code}






[jira] [Updated] (HDFS-16182) numOfReplicas is given the wrong value in BlockPlacementPolicyDefault$chooseTarget can cause DataStreamer to fail with Heterogeneous Storage

2022-07-12 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-16182:

Fix Version/s: 3.2.4
   (was: 3.2.3)

> numOfReplicas is given the wrong value in  
> BlockPlacementPolicyDefault$chooseTarget can cause DataStreamer to fail with 
> Heterogeneous Storage  
> ---
>
> Key: HDFS-16182
> URL: https://issues.apache.org/jira/browse/HDFS-16182
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namanode
>Affects Versions: 3.4.0
>Reporter: Max  Xie
>Assignee: Max  Xie
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2, 3.2.4
>
> Attachments: HDFS-16182.patch
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> In our HDFS cluster, we use heterogeneous storage to keep data on SSD for 
> better performance. Sometimes, while the HDFS client is transferring data in 
> a pipeline, it throws an IOException and exits. The exception log is below: 
> ```
> java.io.IOException: Failed to replace a bad datanode on the existing 
> pipeline due to no more good datanodes being available to try. (Nodes: 
> current=[DatanodeInfoWithStorage[dn01_ip:5004,DS-ef7882e0-427d-4c1e-b9ba-a929fac44fb4,DISK],
>  
> DatanodeInfoWithStorage[dn02_ip:5004,DS-3871282a-ad45-4332-866a-f000f9361ecb,DISK],
>  
> DatanodeInfoWithStorage[dn03_ip:5004,DS-a388c067-76a4-4014-a16c-ccc49c8da77b,SSD],
>  
> DatanodeInfoWithStorage[dn04_ip:5004,DS-b81da262-0dd9-4567-a498-c516fab84fe0,SSD],
>  
> DatanodeInfoWithStorage[dn05_ip:5004,DS-34e3af2e-da80-46ac-938c-6a3218a646b9,SSD]],
>  
> original=[DatanodeInfoWithStorage[dn01_ip:5004,DS-ef7882e0-427d-4c1e-b9ba-a929fac44fb4,DISK],
>  
> DatanodeInfoWithStorage[dn02_ip:5004,DS-3871282a-ad45-4332-866a-f000f9361ecb,DISK]]).
>  The current failed datanode replacement policy is DEFAULT, and a client may 
> configure this via 
> 'dfs.client.block.write.replace-datanode-on-failure.policy' in its 
> configuration.
> ```
> After investigating, I found that when an existing pipeline needs a 
> replacement dn to transfer data, the client gets one additional dn from the 
> namenode and checks that the number of dns is the original number + 1.
> ```
> ## DataStreamer$findNewDatanode
> if (nodes.length != original.length + 1) {
>  throw new IOException(
>  "Failed to replace a bad datanode on the existing pipeline "
>  + "due to no more good datanodes being available to try. "
>  + "(Nodes: current=" + Arrays.asList(nodes)
>  + ", original=" + Arrays.asList(original) + "). "
>  + "The current failed datanode replacement policy is "
>  + dfsClient.dtpReplaceDatanodeOnFailure
>  + ", and a client may configure this via '"
>  + BlockWrite.ReplaceDatanodeOnFailure.POLICY_KEY
>  + "' in its configuration.");
> }
> ```
> The root cause is that Namenode$getAdditionalDatanode returns multiple 
> datanodes, not one, in DataStreamer.addDatanode2ExistingPipeline. 
>  
> Maybe we can fix it in BlockPlacementPolicyDefault$chooseTarget. I think 
> numOfReplicas should not be assigned by requiredStorageTypes.
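
To make the failing condition concrete, here is a small sketch of the length check quoted above, applied to the counts in the exception log (2 original nodes, 5 current nodes). `PipelineCheckSketch` and `exactlyOneAdded` are hypothetical names used only for illustration.

```java
class PipelineCheckSketch {
  // Mirrors the quoted check: the replacement must add exactly one node.
  static boolean exactlyOneAdded(String[] original, String[] nodes) {
    return nodes.length == original.length + 1;
  }

  public static void main(String[] args) {
    String[] original = new String[2]; // dn01, dn02 in the log above
    String[] current = new String[5];  // dn01..dn05 in the log above
    // 5 != 2 + 1, so the client takes the IOException branch.
    System.out.println(exactlyOneAdded(original, current));
  }
}
```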






[jira] [Updated] (HDFS-16350) Datanode start time should be set after RPC server starts successfully

2022-07-12 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-16350:

Fix Version/s: 3.2.4
   (was: 3.2.3)

> Datanode start time should be set after RPC server starts successfully
> --
>
> Key: HDFS-16350
> URL: https://issues.apache.org/jira/browse/HDFS-16350
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2, 3.2.4
>
> Attachments: Screenshot 2021-11-23 at 4.32.04 PM.png
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> We set the start time of the Datanode when the class is instantiated, but 
> ideally it should be set only after the RPC server starts and the RPC 
> handlers are initialized to serve client requests.
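
A minimal sketch of the ordering proposed above, under the assumption that the start time is a plain timestamp field. `DatanodeStartTimeSketch` and its methods are hypothetical and do not reflect the real DataNode class.

```java
class DatanodeStartTimeSketch {
  // 0 means "not yet serving requests" in this sketch.
  private volatile long startTime = 0L;

  private void startRpcServer() {
    // Placeholder for starting the RPC server and its handlers.
  }

  void start() {
    startRpcServer();
    // Record the start time only once the RPC server is up,
    // rather than in the constructor.
    startTime = System.currentTimeMillis();
  }

  long getStartTime() {
    return startTime;
  }
}
```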






[jira] [Updated] (HDFS-16337) Show start time of Datanode on Web

2022-07-12 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-16337:

Fix Version/s: 3.2.4
   (was: 3.2.3)

> Show start time of Datanode on Web
> --
>
> Key: HDFS-16337
> URL: https://issues.apache.org/jira/browse/HDFS-16337
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Tao Li
>Assignee: Tao Li
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2, 3.2.4
>
> Attachments: image-2021-11-19-08-55-58-343.png
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Show _start time_ of Datanode on Web.
> !image-2021-11-19-08-55-58-343.png|width=540,height=155!
>  






[jira] [Updated] (HDFS-16352) return the real datanode numBlocks in #getDatanodeStorageReport

2022-07-12 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-16352:

Fix Version/s: 3.2.4

> return the real datanode numBlocks in #getDatanodeStorageReport
> ---
>
> Key: HDFS-16352
> URL: https://issues.apache.org/jira/browse/HDFS-16352
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: qinyuren
>Assignee: qinyuren
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.9
>
> Attachments: image-2021-11-23-22-04-06-131.png
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> #getDatanodeStorageReport returns an array of DatanodeStorageReport, each 
> of which contains a DatanodeInfo, but the numBlocks in DatanodeInfo is 
> always zero, which is confusing.
> !image-2021-11-23-22-04-06-131.png|width=683,height=338!
> We could return the real numBlocks in DatanodeInfo when 
> #getDatanodeStorageReport is called.






[jira] [Updated] (HDFS-16430) Validate maximum blocks in EC group when adding an EC policy

2022-07-12 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-16430:

Fix Version/s: 3.2.4
   (was: 3.2.3)

> Validate maximum blocks in EC group when adding an EC policy
> 
>
> Key: HDFS-16430
> URL: https://issues.apache.org/jira/browse/HDFS-16430
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ec, erasure-coding
>Affects Versions: 3.3.0, 3.3.1
>Reporter: daimin
>Assignee: daimin
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.9
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> HDFS EC uses the last 4 bits of the block ID to store the block index within 
> an EC block group. The maximum number of blocks in an EC block group is 
> therefore 2^4=16, which is defined in HdfsServerConstants#MAX_BLOCKS_IN_GROUP.
> Currently there is no limit or warning when adding a bad EC policy with 
> numDataUnits + numParityUnits > 16. It only results in read/write errors on EC 
> files that use the bad policy, which is not obvious to users.
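
A minimal sketch of the kind of check described above, not the actual patch. The constant mirrors the 2^4=16 limit from the issue text; `EcPolicyCheckSketch` and `validate` are hypothetical names.

```java
class EcPolicyCheckSketch {
  // 4 bits of the block ID hold the block index, so at most 2^4 = 16 blocks.
  static final int MAX_BLOCKS_IN_GROUP = 1 << 4;

  // Rejects a policy whose data + parity units exceed the group limit.
  static void validate(int numDataUnits, int numParityUnits) {
    int total = numDataUnits + numParityUnits;
    if (total > MAX_BLOCKS_IN_GROUP) {
      throw new IllegalArgumentException(
          "numDataUnits + numParityUnits = " + total
              + " exceeds the maximum of " + MAX_BLOCKS_IN_GROUP);
    }
  }
}
```

With this check, a policy with 6 data and 3 parity units passes, while one with 12 data and 5 parity units is rejected when it is added instead of failing later on read or write.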






[jira] [Updated] (HDFS-16403) Improve FUSE IO performance by supporting FUSE parameter max_background

2022-07-12 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-16403:

Fix Version/s: 3.2.4
   (was: 3.2.3)

> Improve FUSE IO performance by supporting FUSE parameter max_background
> ---
>
> Key: HDFS-16403
> URL: https://issues.apache.org/jira/browse/HDFS-16403
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: fuse-dfs
>Affects Versions: 3.3.0, 3.3.1
>Reporter: daimin
>Assignee: daimin
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.9
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> While examining FUSE IO performance on HDFS, we found that the number of 
> simultaneous IO requests is limited to a fixed number, such as 12. This 
> limitation makes IO performance on the FUSE client quite unacceptable. We did 
> some research on this, inspired by the article [Performance and Resource 
> Utilization of FUSE User-Space File 
> Systems|https://dl.acm.org/doi/fullHtml/10.1145/3310148]. The FUSE 
> parameter '{{max_background}}' determines the number of simultaneous IO 
> requests, and it is 12 by default.
> We add 'max_background' to the fuse_dfs mount options; it takes effect in the 
> FUSE kernel when an option value is given.






[jira] [Updated] (HDFS-16437) ReverseXML processor doesn't accept XML files without the SnapshotDiffSection.

2022-07-12 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-16437:

Fix Version/s: 3.2.4
   (was: 3.2.3)

> ReverseXML processor doesn't accept XML files without the SnapshotDiffSection.
> --
>
> Key: HDFS-16437
> URL: https://issues.apache.org/jira/browse/HDFS-16437
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.1, 3.3.0
>Reporter: yanbin.zhang
>Assignee: yanbin.zhang
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.3
>
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> In a cluster without snapshots, converting the generated XML back to an 
> fsimage reports an error.
> {code:java}
> // code placeholder
> [test@test001 ~]$ hdfs oiv -p ReverseXML -i fsimage_0257220.xml 
> -o fsimage_0257220
> OfflineImageReconstructor failed: FSImage XML ended prematurely, without 
> including section(s) SnapshotDiffSection
> java.io.IOException: FSImage XML ended prematurely, without including 
> section(s) SnapshotDiffSection
>         at 
> org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageReconstructor.processXml(OfflineImageReconstructor.java:1765)
>         at 
> org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageReconstructor.run(OfflineImageReconstructor.java:1842)
>         at 
> org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageViewerPB.run(OfflineImageViewerPB.java:211)
>         at 
> org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageViewerPB.main(OfflineImageViewerPB.java:149)
> 22/01/25 15:56:52 INFO util.ExitUtil: Exiting with status 1: ExitException 
> {code}






[jira] [Updated] (HDFS-11041) Unable to unregister FsDatasetState MBean if DataNode is shutdown twice

2022-07-12 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-11041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-11041:

Fix Version/s: 3.2.4
   (was: 3.2.3)

> Unable to unregister FsDatasetState MBean if DataNode is shutdown twice
> ---
>
> Key: HDFS-11041
> URL: https://issues.apache.org/jira/browse/HDFS-11041
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Trivial
> Fix For: 3.4.0, 2.10.2, 3.2.4, 3.3.3
>
> Attachments: HDFS-11041.01.patch, HDFS-11041.02.patch, 
> HDFS-11041.03.patch
>
>
> I saw error messages like the following in some tests:
> {noformat}
> 2016-10-21 04:09:03,900 [main] WARN  util.MBeans 
> (MBeans.java:unregister(114)) - Error unregistering 
> Hadoop:service=DataNode,name=FSDatasetState-33cd714c-0b1a-471f-8efe-f431d7d874bc
> javax.management.InstanceNotFoundException: 
> Hadoop:service=DataNode,name=FSDatasetState-33cd714c-0b1a-471f-8efe-f431d7d874bc
>   at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1095)
>   at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.exclusiveUnregisterMBean(DefaultMBeanServerInterceptor.java:427)
>   at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.unregisterMBean(DefaultMBeanServerInterceptor.java:415)
>   at 
> com.sun.jmx.mbeanserver.JmxMBeanServer.unregisterMBean(JmxMBeanServer.java:546)
>   at org.apache.hadoop.metrics2.util.MBeans.unregister(MBeans.java:112)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.shutdown(FsDatasetImpl.java:2127)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:2016)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.shutdownDataNodes(MiniDFSCluster.java:1985)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1962)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1936)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1929)
>   at 
> org.apache.hadoop.hdfs.TestDatanodeReport.testDatanodeReport(TestDatanodeReport.java:144)
> {noformat}
> The test shuts down the datanode and then shuts down the cluster, which 
> shuts down the same datanode twice. Resetting the FsDatasetSpi reference in 
> DataNode to null resolves the issue.
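
A minimal sketch of the null-reset idea described above, not the actual patch. `DatasetSketch` and `DataNodeSketch` are hypothetical stand-ins for FsDatasetSpi and DataNode, and the counter stands in for the MBean unregistration.

```java
// Hypothetical stand-in for the dataset that owns the MBean registration.
class DatasetSketch {
  int unregisterCalls = 0;

  void shutdown() {
    // Stands in for MBeans.unregister(...), which warns if called twice.
    unregisterCalls++;
  }
}

class DataNodeSketch {
  private DatasetSketch data;

  DataNodeSketch(DatasetSketch data) {
    this.data = data;
  }

  synchronized void shutdown() {
    if (data != null) {
      data.shutdown();
      // Clearing the reference makes a second shutdown() a no-op.
      data = null;
    }
  }
}
```

Calling `shutdown()` twice on the same `DataNodeSketch` reaches the dataset only once, which is the property the test scenario needs.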






[jira] [Updated] (HDFS-16428) Source path with storagePolicy cause wrong typeConsumed while rename

2022-07-12 Thread Masatake Iwasaki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-16428:

Fix Version/s: 3.2.4
   (was: 3.2.3)

> Source path with storagePolicy cause wrong typeConsumed while rename
> 
>
> Key: HDFS-16428
> URL: https://issues.apache.org/jira/browse/HDFS-16428
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, namenode
>Reporter: lei w
>Assignee: lei w
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.3
>
> Attachments: example.txt
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> When computing quota in a rename operation, we use the storage policy of the 
> target directory to compute the source's quota usage. This yields a wrong 
> typeConsumed value when a storage policy has been set on the source path. I 
> provided a unit test to demonstrate this situation.


