date:20220112

[jira] [Work logged] (HDFS-16043) Add markedDeleteBlockScrubberThread to delete blocks asynchronously

2022-01-12 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16043?focusedWorklogId=708096&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-708096
 ]

ASF GitHub Bot logged work on HDFS-16043:
-

Author: ASF GitHub Bot
Created on: 13/Jan/22 07:20
Start Date: 13/Jan/22 07:20
Worklog Time Spent: 10m 
  Work Description: zhuxiangyi commented on pull request #3063:
URL: https://github.com/apache/hadoop/pull/3063#issuecomment-1011863747


   ok, I will submit it later.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 708096)
Time Spent: 9h 40m  (was: 9.5h)

> Add markedDeleteBlockScrubberThread to delete blocks asynchronously
> ---
>
> Key: HDFS-16043
> URL: https://issues.apache.org/jira/browse/HDFS-16043
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, namanode
>Affects Versions: 3.4.0
>Reporter: Xiangyi Zhu
>Assignee: Xiangyi Zhu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: 20210527-after.svg, 20210527-before.svg
>
>  Time Spent: 9h 40m
>  Remaining Estimate: 0h
>
> Deleting a large directory caused the NN to hold the lock for too long, 
> which caused our NameNode to be killed by ZKFC.
> The flame graph shows that most of the time is spent on the QuotaCount 
> calculation, during removeBlocks(toRemovedBlocks) and inode deletion, and 
> that removeBlocks(toRemovedBlocks) accounts for the larger share.
> h3. Solution:
> 1. Block removal (removeBlocks) is processed asynchronously. A thread is 
> started in the BlockManager to process the deleted blocks and limit how long 
> the lock is held.
> 2. Optimize the QuotaCount calculation, similar to the optimization in 
> HDFS-16000.
> h3. Comparison before and after optimization:
> Test: delete 10 million inodes and 10 million blocks ("1000w", where w = 10,000).
> *before:*
> remove inode elapsed time: 7691 ms
> remove block elapsed time: 11107 ms
> *after:*
> remove inode elapsed time: 4149 ms
> remove block elapsed time: 0 ms
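
To illustrate item 1 of the solution quoted above, here is a minimal Java sketch of the idea. It is not the code from the pull request. The delete path only queues the blocks, and a background thread removes them in bounded chunks, releasing the lock between chunks so that no single lock hold covers the whole deletion. Apart from the thread name markedDeleteBlockScrubberThread, which comes from the issue title, every name here (AsyncBlockScrubberSketch, markForDeletion, scrubOnce, blocksPerLockHold) is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Illustrative only: a toy block map, not the HDFS BlockManager. */
class AsyncBlockScrubberSketch {
  // Stand-in for the namesystem lock (hypothetical name).
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final Set<Long> blocksMap = ConcurrentHashMap.newKeySet();
  private final Queue<List<Long>> markedDeleteQueue = new ConcurrentLinkedQueue<>();
  private final int blocksPerLockHold;
  private int maxRemovedPerLockHold = 0;

  AsyncBlockScrubberSketch(int blocksPerLockHold) {
    this.blocksPerLockHold = blocksPerLockHold;
  }

  void addBlock(long blockId) {
    blocksMap.add(blockId);
  }

  /** Delete path: only queue the blocks; the expensive removal happens later. */
  void markForDeletion(List<Long> blocks) {
    markedDeleteQueue.add(new ArrayList<>(blocks));
  }

  /** One scrubber pass: remove queued blocks, re-taking the lock for each chunk. */
  void scrubOnce() {
    List<Long> batch;
    while ((batch = markedDeleteQueue.poll()) != null) {
      int next = 0;
      while (next < batch.size()) {
        int end = Math.min(next + blocksPerLockHold, batch.size());
        lock.writeLock().lock();
        try {
          for (int i = next; i < end; i++) {
            blocksMap.remove(batch.get(i));
          }
          maxRemovedPerLockHold = Math.max(maxRemovedPerLockHold, end - next);
        } finally {
          lock.writeLock().unlock(); // other operations can take the lock between chunks
        }
        next = end;
      }
    }
  }

  /** Background thread that keeps draining the queue until interrupted. */
  Thread startScrubberThread(long sleepMillis) {
    Thread t = new Thread(() -> {
      try {
        while (!Thread.currentThread().isInterrupted()) {
          scrubOnce();
          Thread.sleep(sleepMillis);
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }, "markedDeleteBlockScrubberThread");
    t.setDaemon(true);
    t.start();
    return t;
  }

  int remainingBlocks() {
    return blocksMap.size();
  }

  int maxRemovedPerLockHold() {
    return maxRemovedPerLockHold;
  }

  public static void main(String[] args) {
    AsyncBlockScrubberSketch s = new AsyncBlockScrubberSketch(1000);
    List<Long> doomed = new ArrayList<>();
    for (long id = 0; id < 5000; id++) {
      s.addBlock(id);
      doomed.add(id);
    }
    s.markForDeletion(doomed); // cheap: nothing is removed yet
    s.scrubOnce();             // normally driven by startScrubberThread
    System.out.println("remaining=" + s.remainingBlocks());
  }
}
```

In this sketch the chunk size is the knob that trades deletion throughput against how long other operations wait for the lock. The near-zero "remove block elapsed time" in the comparison above is consistent with the removal no longer happening on the delete path, although the sketch makes no claim about how the real patch measures it.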



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Updated] (HDFS-16411) RBF: RouterId is NULL when set dfs.federation.router.rpc.enable=false

2022-01-12 Thread YulongZ (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

YulongZ updated HDFS-16411:
---
Attachment: HDFS-16411.004.patch

> RBF: RouterId is NULL when set dfs.federation.router.rpc.enable=false
> -
>
> Key: HDFS-16411
> URL: https://issues.apache.org/jira/browse/HDFS-16411
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: YulongZ
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-16411.000.patch, HDFS-16411.001.patch, 
> HDFS-16411.002.patch, HDFS-16411.003.patch, HDFS-16411.004.patch
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> When dfs.federation.router.rpc.enable=false, routerId is null, but 
> RouterHeartbeatService needs the routerId to call updateStateStore().




[jira] [Work logged] (HDFS-16043) Add markedDeleteBlockScrubberThread to delete blocks asynchronously

2022-01-12 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16043?focusedWorklogId=708093&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-708093
 ]

ASF GitHub Bot logged work on HDFS-16043:
-

Author: ASF GitHub Bot
Created on: 13/Jan/22 07:12
Start Date: 13/Jan/22 07:12
Worklog Time Spent: 10m 
  Work Description: Hexiaoqiao commented on pull request #3063:
URL: https://github.com/apache/hadoop/pull/3063#issuecomment-1011858444


   @zhuxiangyi The cherry-pick to other active branches hits conflicts. Would 
you mind creating another PR for branch-3.3?




Issue Time Tracking
---

Worklog Id: (was: 708093)
Time Spent: 9.5h  (was: 9h 20m)

> Add markedDeleteBlockScrubberThread to delete blocks asynchronously
> ---
>
> Key: HDFS-16043
> URL: https://issues.apache.org/jira/browse/HDFS-16043
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, namanode
>Affects Versions: 3.4.0
>Reporter: Xiangyi Zhu
>Assignee: Xiangyi Zhu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: 20210527-after.svg, 20210527-before.svg
>
>  Time Spent: 9.5h
>  Remaining Estimate: 0h




[jira] [Work logged] (HDFS-16411) RBF: RouterId is NULL when set dfs.federation.router.rpc.enable=false

2022-01-12 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16411?focusedWorklogId=708091&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-708091
 ]

ASF GitHub Bot logged work on HDFS-16411:
-

Author: ASF GitHub Bot
Created on: 13/Jan/22 06:58
Start Date: 13/Jan/22 06:58
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3878:
URL: https://github.com/apache/hadoop/pull/3878#issuecomment-1011849467


   :broken_heart: **-1 overall**

   | Vote | Subsystem | Runtime | Logfile | Comment |
   |:---:|---:|---:|:---:|:---:|
   | +0 :ok: | reexec | 0m 55s | | Docker mode activated. |
   | | | | | _ Prechecks _ |
   | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
   | +0 :ok: | codespell | 0m 0s | | codespell was not available. |
   | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
   | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 2 new or modified test files. |
   | | | | | _ trunk Compile Tests _ |
   | +1 :green_heart: | mvninstall | 35m 41s | | trunk passed |
   | +1 :green_heart: | compile | 0m 40s | | trunk passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04 |
   | +1 :green_heart: | compile | 0m 35s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | +1 :green_heart: | checkstyle | 0m 24s | | trunk passed |
   | +1 :green_heart: | mvnsite | 0m 40s | | trunk passed |
   | +1 :green_heart: | javadoc | 0m 40s | | trunk passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04 |
   | +1 :green_heart: | javadoc | 0m 52s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | +1 :green_heart: | spotbugs | 1m 18s | | trunk passed |
   | +1 :green_heart: | shadedclient | 23m 38s | | branch has no errors when building and testing our client artifacts. |
   | | | | | _ Patch Compile Tests _ |
   | +1 :green_heart: | mvninstall | 0m 34s | | the patch passed |
   | +1 :green_heart: | compile | 0m 35s | | the patch passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04 |
   | +1 :green_heart: | javac | 0m 35s | | the patch passed |
   | +1 :green_heart: | compile | 0m 30s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | +1 :green_heart: | javac | 0m 30s | | the patch passed |
   | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
   | +1 :green_heart: | checkstyle | 0m 15s | | the patch passed |
   | +1 :green_heart: | mvnsite | 0m 32s | | the patch passed |
   | +1 :green_heart: | javadoc | 0m 31s | | the patch passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04 |
   | +1 :green_heart: | javadoc | 0m 49s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | +1 :green_heart: | spotbugs | 1m 20s | | the patch passed |
   | +1 :green_heart: | shadedclient | 23m 23s | | patch has no errors when building and testing our client artifacts. |
   | | | | | _ Other Tests _ |
   | -1 :x: | unit | 39m 28s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3878/4/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt) | hadoop-hdfs-rbf in the patch passed. |
   | +1 :green_heart: | asflicense | 0m 34s | | The patch does not generate ASF License warnings. |
   | | | 135m 9s | | |

   | Reason | Tests |
   |---:|:---|
   | Failed junit tests | hadoop.hdfs.rbfbalance.TestRouterDistCpProcedure |

   | Subsystem | Report/Notes |
   |---:|:---|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3878/4/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3878 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux 16703bbea967 4.15.0-142-generic #146-Ubuntu SMP Tue Apr 13 01:11:19 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / e208d71c30e8f1cdc50ba7953bb7f05637a8d2fe |
   | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3878/4/testReport/ |
   | Max. process+thread count | 2252 (vs. ulimit of 5500) |
   | modules | C:

[jira] [Work logged] (HDFS-16422) Fix thread safety of EC decoding during concurrent preads

2022-01-12 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16422?focusedWorklogId=708089&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-708089
 ]

ASF GitHub Bot logged work on HDFS-16422:
-

Author: ASF GitHub Bot
Created on: 13/Jan/22 06:38
Start Date: 13/Jan/22 06:38
Worklog Time Spent: 10m 
  Work Description: cndaimin opened a new pull request #3881:
URL: https://github.com/apache/hadoop/pull/3881


   Reading an erasure-coded file with missing replicas (internal blocks of a 
block group) triggers online reconstruction: `dataUnits` units of data are read 
and decoded into the missing target data. Each `DFSStripedInputStream` object 
has a `RawErasureDecoder` object, so when preads run concurrently, 
`RawErasureDecoder.decode` is invoked concurrently too. 
`RawErasureDecoder.decode` is not thread safe, and as a result pread 
occasionally returns wrong data.
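
The failure mode described above can be sketched with a toy decoder. This is a hedged illustration, not Hadoop's `RawErasureDecoder`: the class `XorDecoderSketch`, its shared scratch buffer, and the single-parity XOR scheme are all assumptions chosen to keep the example small. It shows why a decoder with shared mutable state is unsafe under concurrent calls, and one possible remedy (serializing `decode`), without claiming that this is the approach taken in the pull request.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Illustrative only: a toy XOR decoder, not Hadoop's RawErasureDecoder. */
class XorDecoderSketch {
  // Shared scratch buffer: the mutable state that makes decodeUnsafe racy.
  private final byte[] scratch;

  XorDecoderSketch(int cellSize) {
    this.scratch = new byte[cellSize];
  }

  /** Not thread safe: concurrent callers overwrite each other's scratch data. */
  byte[] decodeUnsafe(byte[] surviving, byte[] parity) {
    System.arraycopy(parity, 0, scratch, 0, scratch.length);
    for (int i = 0; i < scratch.length; i++) {
      scratch[i] ^= surviving[i]; // missing = parity XOR surviving
    }
    return Arrays.copyOf(scratch, scratch.length);
  }

  /** Serialized variant: only one caller touches the scratch buffer at a time. */
  synchronized byte[] decode(byte[] surviving, byte[] parity) {
    return decodeUnsafe(surviving, parity);
  }

  /** Runs concurrent decodes through the synchronized path and counts wrong results. */
  static int countWrongResults(int threads, int callsPerThread) {
    final int cellSize = 1024;
    XorDecoderSketch decoder = new XorDecoderSketch(cellSize);
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<Integer>> futures = new ArrayList<>();
      for (int t = 0; t < threads; t++) {
        final byte fill = (byte) (t + 1); // each thread decodes different data
        futures.add(pool.submit(() -> {
          byte[] missing = new byte[cellSize];
          Arrays.fill(missing, fill);
          byte[] surviving = new byte[cellSize];
          Arrays.fill(surviving, (byte) 0x5A);
          byte[] parity = new byte[cellSize];
          for (int i = 0; i < cellSize; i++) {
            parity[i] = (byte) (missing[i] ^ surviving[i]);
          }
          int wrong = 0;
          for (int c = 0; c < callsPerThread; c++) {
            if (!Arrays.equals(missing, decoder.decode(surviving, parity))) {
              wrong++;
            }
          }
          return wrong;
        }));
      }
      int total = 0;
      for (Future<Integer> f : futures) {
        total += f.get();
      }
      return total;
    } catch (Exception e) {
      throw new RuntimeException(e);
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println("wrong results: " + countWrongResults(4, 1000));
  }
}
```

Swapping `decoder.decode` for `decoder.decodeUnsafe` in the worker lambda removes the serialization, and the count may then become non-zero on some runs, which matches the "occasionally" in the report. Serializing calls is the simplest remedy to show; giving each call its own buffers would be another option if contention on the decoder mattered.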




Issue Time Tracking
---

Worklog Id: (was: 708089)
Remaining Estimate: 0h
Time Spent: 10m

> Fix thread safety of EC decoding during concurrent preads
> -
>
> Key: HDFS-16422
> URL: https://issues.apache.org/jira/browse/HDFS-16422
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: dfsclient, ec, erasure-coding
>Affects Versions: 3.3.0, 3.3.1
>Reporter: daimin
>Assignee: daimin
>Priority: Critical
>  Time Spent: 10m
>  Remaining Estimate: 0h




[jira] [Updated] (HDFS-16422) Fix thread safety of EC decoding during concurrent preads

2022-01-12 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDFS-16422:
--
Labels: pull-request-available  (was: )

> Fix thread safety of EC decoding during concurrent preads
> -
>
> Key: HDFS-16422
> URL: https://issues.apache.org/jira/browse/HDFS-16422
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: dfsclient, ec, erasure-coding
>Affects Versions: 3.3.0, 3.3.1
>Reporter: daimin
>Assignee: daimin
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h




[jira] [Created] (HDFS-16422) Fix thread safety of EC decoding during concurrent preads

2022-01-12 Thread daimin (Jira)

daimin created HDFS-16422:
-

 Summary: Fix thread safety of EC decoding during concurrent preads
 Key: HDFS-16422
 URL: https://issues.apache.org/jira/browse/HDFS-16422
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: dfsclient, ec, erasure-coding
Affects Versions: 3.3.1, 3.3.0
Reporter: daimin
Assignee: daimin


Reading an erasure-coded file with missing replicas (internal blocks of a 
block group) triggers online reconstruction: dataUnits units of data are read 
and decoded into the missing target data. Each DFSStripedInputStream object has 
a RawErasureDecoder object, so when preads run concurrently, 
RawErasureDecoder.decode is invoked concurrently too. 
RawErasureDecoder.decode is not thread safe, and as a result pread 
occasionally returns wrong data.




[jira] [Resolved] (HDFS-16043) Add markedDeleteBlockScrubberThread to delete blocks asynchronously

2022-01-12 Thread Xiaoqiao He (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoqiao He resolved HDFS-16043.

Fix Version/s: 3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

Committed to trunk. Will cherry-pick to other active branches if there are no 
conflicts.

> Add markedDeleteBlockScrubberThread to delete blocks asynchronously
> ---
>
> Key: HDFS-16043
> URL: https://issues.apache.org/jira/browse/HDFS-16043
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, namanode
>Affects Versions: 3.4.0
>Reporter: Xiangyi Zhu
>Assignee: Xiangyi Zhu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: 20210527-after.svg, 20210527-before.svg
>
>  Time Spent: 9h 20m
>  Remaining Estimate: 0h




[jira] [Work logged] (HDFS-16043) Add markedDeleteBlockScrubberThread to delete blocks asynchronously

2022-01-12 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16043?focusedWorklogId=708074&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-708074
 ]

ASF GitHub Bot logged work on HDFS-16043:
-

Author: ASF GitHub Bot
Created on: 13/Jan/22 04:58
Start Date: 13/Jan/22 04:58
Worklog Time Spent: 10m 
  Work Description: Hexiaoqiao commented on pull request #3063:
URL: https://github.com/apache/hadoop/pull/3063#issuecomment-1011797007


   Committed to trunk based on the build result 
(https://github.com/apache/hadoop/pull/3063#issuecomment-1011295026) and a 
checkstyle-only fix 
(https://github.com/apache/hadoop/pull/3063/commits/f6f2793310eff7c0678d027c912c95dcc3482972).
   
   Thanks @zhuxiangyi for your contribution. Thanks @jojochuang and @tomscut 
for your reviews.




Issue Time Tracking
---

Worklog Id: (was: 708074)
Time Spent: 9h 20m  (was: 9h 10m)

> Add markedDeleteBlockScrubberThread to delete blocks asynchronously
> ---
>
> Key: HDFS-16043
> URL: https://issues.apache.org/jira/browse/HDFS-16043
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, namanode
>Affects Versions: 3.4.0
>Reporter: Xiangyi Zhu
>Assignee: Xiangyi Zhu
>Priority: Major
>  Labels: pull-request-available
> Attachments: 20210527-after.svg, 20210527-before.svg
>
>  Time Spent: 9h 20m
>  Remaining Estimate: 0h




[jira] [Work logged] (HDFS-16043) Add markedDeleteBlockScrubberThread to delete blocks asynchronously

2022-01-12 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16043?focusedWorklogId=708073&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-708073
 ]

ASF GitHub Bot logged work on HDFS-16043:
-

Author: ASF GitHub Bot
Created on: 13/Jan/22 04:56
Start Date: 13/Jan/22 04:56
Worklog Time Spent: 10m 
  Work Description: Hexiaoqiao merged pull request #3063:
URL: https://github.com/apache/hadoop/pull/3063


   




Issue Time Tracking
---

Worklog Id: (was: 708073)
Time Spent: 9h 10m  (was: 9h)

> Add markedDeleteBlockScrubberThread to delete blocks asynchronously
> ---
>
> Key: HDFS-16043
> URL: https://issues.apache.org/jira/browse/HDFS-16043
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, namanode
>Affects Versions: 3.4.0
>Reporter: Xiangyi Zhu
>Assignee: Xiangyi Zhu
>Priority: Major
>  Labels: pull-request-available
> Attachments: 20210527-after.svg, 20210527-before.svg
>
>  Time Spent: 9h 10m
>  Remaining Estimate: 0h




[jira] [Updated] (HDFS-16421) Remove RouterRpcFairnessPolicyController ConcurrentNS to avoid renewLease being unavailable

2022-01-12 Thread Xiangyi Zhu (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangyi Zhu updated HDFS-16421:
---
Summary: Remove RouterRpcFairnessPolicyController ConcurrentNS to avoid 
renewLease being unavailable  (was: RouterRpcFairnessPolicyController remove 
ConcurrentNS )

> Remove RouterRpcFairnessPolicyController ConcurrentNS to avoid renewLease 
> being unavailable
> ---
>
> Key: HDFS-16421
> URL: https://issues.apache.org/jira/browse/HDFS-16421
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Affects Versions: 3.4.0
>Reporter: Xiangyi Zhu
>Assignee: Xiangyi Zhu
>Priority: Major
>
> When using the RouterRpcFairnessConstants strategy, if the NameNode RPC is 
> slow or does not respond, the available concurrent handlers are easily used 
> up, and the client can no longer renewLease normally.
> I think CONCURRENT_NS can be removed. When there is a CONCURRENT RPC, we 
> traverse each NS and apply for the corresponding Handler, instead of 
> applying for just one Handler as CONCURRENT does.




[jira] [Updated] (HDFS-16421) RouterRpcFairnessPolicyController remove ConcurrentNS

2022-01-12 Thread Xiangyi Zhu (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangyi Zhu updated HDFS-16421:
---
Summary: RouterRpcFairnessPolicyController remove ConcurrentNS   (was: 
RouterRpcFairnessConstants remove ConcurrentNS )

> RouterRpcFairnessPolicyController remove ConcurrentNS 
> --
>
> Key: HDFS-16421
> URL: https://issues.apache.org/jira/browse/HDFS-16421
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Affects Versions: 3.4.0
>Reporter: Xiangyi Zhu
>Assignee: Xiangyi Zhu
>Priority: Major




[jira] [Created] (HDFS-16421) RouterRpcFairnessConstants remove ConcurrentNS

2022-01-12 Thread Xiangyi Zhu (Jira)

Xiangyi Zhu created HDFS-16421:
--

 Summary: RouterRpcFairnessConstants remove ConcurrentNS 
 Key: HDFS-16421
 URL: https://issues.apache.org/jira/browse/HDFS-16421
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: rbf
Affects Versions: 3.4.0
Reporter: Xiangyi Zhu
Assignee: Xiangyi Zhu


When using the RouterRpcFairnessConstants strategy, if the NameNode RPC is slow 
or does not respond, the available concurrent handlers are easily used up, 
and the client can no longer renewLease normally.

I think CONCURRENT_NS can be removed. When there is a CONCURRENT RPC, we 
traverse each NS and apply for the corresponding Handler, instead of 
applying for just one Handler as CONCURRENT does.
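
One way to picture the proposal is a per-nameservice permit pool with no separate pool for concurrent calls. The Java sketch below is an assumption-laden illustration, not the RBF fairness controller. The class `PerNsPermitSketch`, its methods, and the roll-back-on-failure behaviour are hypothetical. A fan-out (CONCURRENT) call takes one handler permit from every nameservice, while a single-NS call takes one permit from its own nameservice only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Semaphore;

/** Illustrative only: per-nameservice handler permits, not the RBF controller. */
class PerNsPermitSketch {
  private final Map<String, Semaphore> permits = new LinkedHashMap<>();

  PerNsPermitSketch(Map<String, Integer> handlersPerNs) {
    for (Map.Entry<String, Integer> e : handlersPerNs.entrySet()) {
      permits.put(e.getKey(), new Semaphore(e.getValue()));
    }
  }

  /** Single-NS call: take one handler permit from that nameservice, without blocking. */
  boolean acquire(String ns) {
    Semaphore s = permits.get(ns);
    return s != null && s.tryAcquire();
  }

  void release(String ns) {
    permits.get(ns).release();
  }

  /** Fan-out call: take one permit from every NS, or roll back and take none. */
  boolean acquireForAll() {
    List<String> held = new ArrayList<>();
    for (String ns : permits.keySet()) {
      if (acquire(ns)) {
        held.add(ns);
      } else {
        for (String h : held) {
          release(h); // give back what was taken so far
        }
        return false;
      }
    }
    return true;
  }

  /** Returns the permits taken by a successful acquireForAll. */
  void releaseForAll() {
    for (String ns : permits.keySet()) {
      release(ns);
    }
  }

  int available(String ns) {
    return permits.get(ns).availablePermits();
  }

  public static void main(String[] args) {
    Map<String, Integer> cfg = new LinkedHashMap<>();
    cfg.put("ns0", 2);
    cfg.put("ns1", 1);
    PerNsPermitSketch controller = new PerNsPermitSketch(cfg);
    System.out.println("fan-out acquired: " + controller.acquireForAll());
    System.out.println("ns0 left: " + controller.available("ns0"));
    controller.releaseForAll();
  }
}
```

The all-or-nothing roll-back is a design choice of this sketch rather than something the issue specifies. It keeps a rejected fan-out call from holding permits on healthy nameservices while one slow nameservice is exhausted.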




[jira] [Work logged] (HDFS-16043) Add markedDeleteBlockScrubberThread to delete blocks asynchronously

2022-01-12 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16043?focusedWorklogId=708072&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-708072
 ]

ASF GitHub Bot logged work on HDFS-16043:
-

Author: ASF GitHub Bot
Created on: 13/Jan/22 04:35
Start Date: 13/Jan/22 04:35
Worklog Time Spent: 10m 
  Work Description: zhuxiangyi commented on pull request #3063:
URL: https://github.com/apache/hadoop/pull/3063#issuecomment-1011789268


   @Hexiaoqiao resubmitted




Issue Time Tracking
---

Worklog Id: (was: 708072)
Time Spent: 9h  (was: 8h 50m)

> Add markedDeleteBlockScrubberThread to delete blocks asynchronously
> ---
>
> Key: HDFS-16043
> URL: https://issues.apache.org/jira/browse/HDFS-16043
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, namanode
>Affects Versions: 3.4.0
>Reporter: Xiangyi Zhu
>Assignee: Xiangyi Zhu
>Priority: Major
>  Labels: pull-request-available
> Attachments: 20210527-after.svg, 20210527-before.svg
>
>  Time Spent: 9h
>  Remaining Estimate: 0h




[jira] [Work logged] (HDFS-16411) RBF: RouterId is NULL when set dfs.federation.router.rpc.enable=false

2022-01-12 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16411?focusedWorklogId=708059&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-708059
 ]

ASF GitHub Bot logged work on HDFS-16411:
-

Author: ASF GitHub Bot
Created on: 13/Jan/22 04:17
Start Date: 13/Jan/22 04:17
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3878:
URL: https://github.com/apache/hadoop/pull/3878#issuecomment-1011781410


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   1m 14s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 2 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  38m 52s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 49s |  |  trunk passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  compile  |   0m 38s |  |  trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   0m 27s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 45s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 45s |  |  trunk passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 58s |  |  trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   1m 40s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  26m 47s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 37s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 43s |  |  the patch passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javac  |   0m 43s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 38s |  |  the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   0m 38s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | -0 :warning: |  checkstyle  |   0m 20s | [/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs-rbf.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3878/3/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs-rbf.txt) |  hadoop-hdfs-project/hadoop-hdfs-rbf: The patch generated 9 new + 0 unchanged - 0 fixed = 9 total (was 0)  |
   | +1 :green_heart: |  mvnsite  |   0m 39s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 37s |  |  the patch passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 52s |  |  the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   1m 37s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  25m 31s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | -1 :x: |  unit  |  41m 44s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3878/3/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt) |  hadoop-hdfs-rbf in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 31s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 147m 40s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.rbfbalance.TestRouterDistCpProcedure |
   |   | hadoop.fs.contract.router.web.TestRouterWebHDFSContractCreate |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3878/3/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3878 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux 6995f491909d 4.15.0-142-generic #146-Ubuntu SMP Tue Apr 13 01:11:19 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 832bc38bcf720da049c6d0eaa12c2e695a5686f7 |
   | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions

[jira] [Work logged] (HDFS-16420) ec + balancer may cause missing block

2022-01-12 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16420?focusedWorklogId=708048&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-708048
 ]

ASF GitHub Bot logged work on HDFS-16420:
-

Author: ASF GitHub Bot
Created on: 13/Jan/22 03:33
Start Date: 13/Jan/22 03:33
Worklog Time Spent: 10m 
  Work Description: Jackson-Wang-7 commented on pull request #3880:
URL: https://github.com/apache/hadoop/pull/3880#issuecomment-1011754937


   @Hexiaoqiao @ayushtkn Please take a look at this PR. We have started testing 
in this way.




Issue Time Tracking
---

Worklog Id: (was: 708048)
Time Spent: 20m  (was: 10m)

> ec + balancer may cause missing block
> -
>
> Key: HDFS-16420
> URL: https://issues.apache.org/jira/browse/HDFS-16420
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: qinyuren
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2022-01-10-17-31-35-910.png, 
> image-2022-01-10-17-32-56-981.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We have a problem similar to the one described in HDFS-16297. 
> In our cluster, we used {color:#de350b}ec(6+3) + balancer with version 
> 3.1.0{color}, and a {color:#de350b}missing block{color} occurred. 
> We got the info for the block (blk_-9223372036824119008) from fsck: only 5 live 
> replicas and multiple redundant replicas. 
> {code:java}
> blk_-9223372036824119008_220037616 len=133370338 MISSING! Live_repl=5
> blk_-9223372036824119007:DatanodeInfoWithStorage,   
> blk_-9223372036824119002:DatanodeInfoWithStorage,    
> blk_-9223372036824119001:DatanodeInfoWithStorage,  
> blk_-9223372036824119000:DatanodeInfoWithStorage, 
> blk_-9223372036824119004:DatanodeInfoWithStorage,  
> blk_-9223372036824119004:DatanodeInfoWithStorage, 
> blk_-9223372036824119004:DatanodeInfoWithStorage, 
> blk_-9223372036824119004:DatanodeInfoWithStorage, 
> blk_-9223372036824119004:DatanodeInfoWithStorage, 
> blk_-9223372036824119004:DatanodeInfoWithStorage {code}
>    
> We searched the logs of all datanodes and found that the internal blocks of 
> blk_-9223372036824119008 were deleted at almost the same time.
>  
> {code:java}
> 08:15:58,550 INFO  impl.FsDatasetAsyncDiskService 
> (FsDatasetAsyncDiskService.java:run(333)) - Deleted 
> BP-1606066499--1606188026755 blk_-9223372036824119008_220037616 URI 
> file:/data15/hadoop/hdfs/data/current/BP-1606066499--1606188026755/current/finalized/subdir19/subdir9/blk_-9223372036824119008
> 08:16:21,214 INFO  impl.FsDatasetAsyncDiskService 
> (FsDatasetAsyncDiskService.java:run(333)) - Deleted 
> BP-1606066499--1606188026755 blk_-9223372036824119006_220037616 URI 
> file:/data4/hadoop/hdfs/data/current/BP-1606066499--1606188026755/current/finalized/subdir19/subdir9/blk_-9223372036824119006
> 08:16:55,737 INFO  impl.FsDatasetAsyncDiskService 
> (FsDatasetAsyncDiskService.java:run(333)) - Deleted 
> BP-1606066499--1606188026755 blk_-9223372036824119005_220037616 URI 
> file:/data2/hadoop/hdfs/data/current/BP-1606066499--1606188026755/current/finalized/subdir19/subdir9/blk_-9223372036824119005
> {code}
>  
> The number of internal blocks deleted between 08:15 and 08:17 is as follows:
> ||internal block||index||delete num||
> |blk_-9223372036824119008|0|1|
> |blk_-9223372036824119006|2|1|
> |blk_-9223372036824119005|3|1|
> |blk_-9223372036824119004|4|50|
> |blk_-9223372036824119003|5|1|
> |blk_-9223372036824119000|8|1|
>  
> {color:#ff}Between 08:15 and 08:17, we restarted 2 datanodes and triggered 
> a full block report immediately.{color}
>  
> This raises 2 questions: 
> 1. Why are there so many replicas of this block?
> 2. Why was an internal block with only one copy deleted?
> The reasons for the first problem may be as follows: 
> 1. We set the full block report period of some datanodes to 168 hours.
> 2. We performed a namenode HA operation.
> 3. After the namenode HA operation, the state of the storage became 
> {color:#ff}stale{color}, and the state did not change until the next full block 
> report.
> 4. The balancer copied the replica without deleting the replica from the source 
> node, because the source node had stale storage, and the request was put 
> into {color:#ff}postponedMisreplicatedBlocks{color}.
> 5. The balancer continued to copy the replica, eventually resulting in multiple 
> copies of a replica.
>

[jira] [Work logged] (HDFS-16420) ec + balancer may cause missing block

2022-01-12 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16420?focusedWorklogId=708046&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-708046
 ]

ASF GitHub Bot logged work on HDFS-16420:
-

Author: ASF GitHub Bot
Created on: 13/Jan/22 03:26
Start Date: 13/Jan/22 03:26
Worklog Time Spent: 10m 
  Work Description: Jackson-Wang-7 opened a new pull request #3880:
URL: https://github.com/apache/hadoop/pull/3880


   …y striped blocks.
   
   
   
   ### Description of PR
   If two or more blocks exist in the same rack, a unique data block may be 
added to the exactlyOne processing list when choosing a redundant striped 
block to delete.
   `storages.remove(cur);
   if (storages.isEmpty()) {
     rackMap.remove(rack);
   }
   if (moreThanOne.remove(cur)) {
     if (storages.size() == 1) {
       **final DatanodeStorageInfo remaining = storages.get(0);
       moreThanOne.remove(remaining);
       exactlyOne.add(remaining);**
     }
   } else {
     exactlyOne.remove(cur);
   }`
   
   In this case, the moreThanOne list may not contain the remaining block. The 
remaining block should not be deleted, but it is added to the exactlyOne list 
and will then be deleted.
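
The failure mode described above can be reproduced outside HDFS. The following is a minimal sketch, not the Hadoop implementation: storages are plain strings, the class and method names (`ExcessReplicaSketch`, `adjustSets`, `run`) are hypothetical, and it assumes that only the copies of the duplicated internal block were passed in as deletion candidates. With `guard=true` the remaining storage is moved to `exactlyOne` only if it was actually in `moreThanOne`; that guard is one possible fix and is not confirmed to be the change made in this PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the placement bookkeeping; storages are plain strings. */
final class ExcessReplicaSketch {

  /** Mirrors the quoted snippet; with guard=true, only a known candidate is promoted. */
  static void adjustSets(Map<String, List<String>> rackMap, List<String> moreThanOne,
      List<String> exactlyOne, String rack, String cur, boolean guard) {
    List<String> storages = rackMap.get(rack);
    storages.remove(cur);
    if (storages.isEmpty()) {
      rackMap.remove(rack);
    }
    if (moreThanOne.remove(cur)) {
      if (storages.size() == 1) {
        String remaining = storages.get(0);
        // remove() returns false when 'remaining' was never a deletion candidate.
        boolean wasCandidate = moreThanOne.remove(remaining);
        if (!guard || wasCandidate) {
          exactlyOne.add(remaining);
        }
      }
    } else {
      exactlyOne.remove(cur);
    }
  }

  /** Rack /d1/r1 holds the unique block 009 plus one of three copies of block 008. */
  static List<String> run(boolean guard) {
    Map<String, List<String>> rackMap = new HashMap<>();
    rackMap.put("/d1/r1", new ArrayList<>(Arrays.asList("009@r1", "008@r1")));
    rackMap.put("/d1/r2", new ArrayList<>(Arrays.asList("008@r2")));
    rackMap.put("/d1/r3", new ArrayList<>(Arrays.asList("008@r3")));
    // Assumption: only the duplicated internal block is a deletion candidate.
    List<String> moreThanOne = new ArrayList<>(Arrays.asList("008@r1"));
    List<String> exactlyOne = new ArrayList<>(Arrays.asList("008@r2", "008@r3"));
    adjustSets(rackMap, moreThanOne, exactlyOne, "/d1/r1", "008@r1", guard);
    return exactlyOne;
  }

  public static void main(String[] args) {
    System.out.println("unguarded: " + run(false));
    System.out.println("guarded:   " + run(true));
  }
}
```

Running `main` prints the `exactlyOne` list for both variants; only the unguarded one ends up containing the unique block `009@r1`.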
   
   ### How was this patch tested?
   The test case is as follows (EC 6+3):
   blk_-xxx009 in rack /d1/r1
   blk_-xxx008 in rack /d1/r1
   blk_-xxx008 in rack /d1/r2
   blk_-xxx008 in rack /d1/r3
   blk_-xxx007 in rack /d1/r4
   blk_-xxx006 in rack /d2/r1
   blk_-xxx005 in rack /d2/r2
   blk_-xxx004 in rack /d2/r3
   blk_-xxx003 in rack /d2/r4
   blk_-xxx002 in rack /d2/r5
   blk_-xxx001 in rack /d2/r6
   After the full block report (FBR) is triggered and the redundant data blocks 
are added to the invalidate list, blk_-xxx008 in rack /d1/r1 and blk_-xxx008 in 
rack /d1/r2 need to be deleted, and blk_-xxx009 is HEALTHY.
   
   ### For code changes:
   
   - [ ] Does the title of this PR start with the corresponding JIRA issue id 
(e.g. 'HADOOP-17799. Your PR title ...')?
   - [ ] Object storage: have the integration tests been executed and the 
endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, 
`NOTICE-binary` files?
   
   




Issue Time Tracking
---

Worklog Id: (was: 708046)
Remaining Estimate: 0h
Time Spent: 10m

> ec + balancer may cause missing block
> -
>
> Key: HDFS-16420
> URL: https://issues.apache.org/jira/browse/HDFS-16420
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: qinyuren
>Priority: Major
> Attachments: image-2022-01-10-17-31-35-910.png, 
> image-2022-01-10-17-32-56-981.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We have a problem similar to the one described in HDFS-16297. 
> In our cluster, we used {color:#de350b}ec(6+3) + balancer with version 
> 3.1.0{color}, and a {color:#de350b}missing block{color} occurred. 
> We got the info for the block (blk_-9223372036824119008) from fsck: only 5 live 
> replicas and multiple redundant replicas. 
> {code:java}
> blk_-9223372036824119008_220037616 len=133370338 MISSING! Live_repl=5
> blk_-9223372036824119007:DatanodeInfoWithStorage,   
> blk_-9223372036824119002:DatanodeInfoWithStorage,    
> blk_-9223372036824119001:DatanodeInfoWithStorage,  
> blk_-9223372036824119000:DatanodeInfoWithStorage, 
> blk_-9223372036824119004:DatanodeInfoWithStorage,  
> blk_-9223372036824119004:DatanodeInfoWithStorage, 
> blk_-9223372036824119004:DatanodeInfoWithStorage, 
> blk_-9223372036824119004:DatanodeInfoWithStorage, 
> blk_-9223372036824119004:DatanodeInfoWithStorage, 
> blk_-9223372036824119004:DatanodeInfoWithStorage {code}
>    
> We searched the logs of all datanodes and found that the internal blocks of 
> blk_-9223372036824119008 were deleted at almost the same time.
>  
> {code:java}
> 08:15:58,550 INFO  impl.FsDatasetAsyncDiskService 
> (FsDatasetAsyncDiskService.java:run(333)) - Deleted 
> BP-1606066499--1606188026755 blk_-9223372036824119008_220037616 URI 
> file:/data15/hadoop/hdfs/data/current/BP-1606066499--1606188026755/current/finalized/subdir19/subdir9/blk_-9223372036824119008
> 08:16:21,214 INFO  impl.FsDatasetAsyncDiskService 
> (FsDatasetAsyncDiskService.java:run(333))

[jira] [Updated] (HDFS-16420) ec + balancer may cause missing block

2022-01-12 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDFS-16420:
--
Labels: pull-request-available  (was: )

> ec + balancer may cause missing block
> -
>
> Key: HDFS-16420
> URL: https://issues.apache.org/jira/browse/HDFS-16420
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: qinyuren
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2022-01-10-17-31-35-910.png, 
> image-2022-01-10-17-32-56-981.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We have a problem similar to the one described in HDFS-16297. 
> In our cluster, we used {color:#de350b}ec(6+3) + balancer with version 
> 3.1.0{color}, and a {color:#de350b}missing block{color} occurred. 
> We got the info for the block (blk_-9223372036824119008) from fsck: only 5 live 
> replicas and multiple redundant replicas. 
> {code:java}
> blk_-9223372036824119008_220037616 len=133370338 MISSING! Live_repl=5
> blk_-9223372036824119007:DatanodeInfoWithStorage,   
> blk_-9223372036824119002:DatanodeInfoWithStorage,    
> blk_-9223372036824119001:DatanodeInfoWithStorage,  
> blk_-9223372036824119000:DatanodeInfoWithStorage, 
> blk_-9223372036824119004:DatanodeInfoWithStorage,  
> blk_-9223372036824119004:DatanodeInfoWithStorage, 
> blk_-9223372036824119004:DatanodeInfoWithStorage, 
> blk_-9223372036824119004:DatanodeInfoWithStorage, 
> blk_-9223372036824119004:DatanodeInfoWithStorage, 
> blk_-9223372036824119004:DatanodeInfoWithStorage {code}
>    
> We searched the logs of all datanodes and found that the internal blocks of 
> blk_-9223372036824119008 were deleted at almost the same time.
>  
> {code:java}
> 08:15:58,550 INFO  impl.FsDatasetAsyncDiskService 
> (FsDatasetAsyncDiskService.java:run(333)) - Deleted 
> BP-1606066499--1606188026755 blk_-9223372036824119008_220037616 URI 
> file:/data15/hadoop/hdfs/data/current/BP-1606066499--1606188026755/current/finalized/subdir19/subdir9/blk_-9223372036824119008
> 08:16:21,214 INFO  impl.FsDatasetAsyncDiskService 
> (FsDatasetAsyncDiskService.java:run(333)) - Deleted 
> BP-1606066499--1606188026755 blk_-9223372036824119006_220037616 URI 
> file:/data4/hadoop/hdfs/data/current/BP-1606066499--1606188026755/current/finalized/subdir19/subdir9/blk_-9223372036824119006
> 08:16:55,737 INFO  impl.FsDatasetAsyncDiskService 
> (FsDatasetAsyncDiskService.java:run(333)) - Deleted 
> BP-1606066499--1606188026755 blk_-9223372036824119005_220037616 URI 
> file:/data2/hadoop/hdfs/data/current/BP-1606066499--1606188026755/current/finalized/subdir19/subdir9/blk_-9223372036824119005
> {code}
>  
> The number of internal blocks deleted between 08:15 and 08:17 is as follows:
> ||internal block||index||delete num||
> |blk_-9223372036824119008|0|1|
> |blk_-9223372036824119006|2|1|
> |blk_-9223372036824119005|3|1|
> |blk_-9223372036824119004|4|50|
> |blk_-9223372036824119003|5|1|
> |blk_-9223372036824119000|8|1|
>  
> {color:#ff}Between 08:15 and 08:17, we restarted 2 datanodes and triggered 
> a full block report immediately.{color}
>  
> This raises 2 questions: 
> 1. Why are there so many replicas of this block?
> 2. Why was an internal block with only one copy deleted?
> The reasons for the first problem may be as follows: 
> 1. We set the full block report period of some datanodes to 168 hours.
> 2. We performed a namenode HA operation.
> 3. After the namenode HA operation, the state of the storage became 
> {color:#ff}stale{color}, and the state did not change until the next full block 
> report.
> 4. The balancer copied the replica without deleting the replica from the source 
> node, because the source node had stale storage, and the request was put 
> into {color:#ff}postponedMisreplicatedBlocks{color}.
> 5. The balancer continued to copy the replica, eventually resulting in multiple 
> copies of a replica.
> !image-2022-01-10-17-31-35-910.png|width=642,height=269!
> The set {color:#ff}rescannedMisreplicatedBlocks{color} had many 
> blocks to remove.
> !image-2022-01-10-17-32-56-981.png|width=745,height=124!
> As for the second question, we checked the code of 
> {color:#de350b}processExtraRedundancyBlock{color}, but didn't find any 
> problem.
>  




[jira] [Work logged] (HDFS-16043) Add markedDeleteBlockScrubberThread to delete blocks asynchronously

2022-01-12 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16043?focusedWorklogId=708045&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-708045
 ]

ASF GitHub Bot logged work on HDFS-16043:
-

Author: ASF GitHub Bot
Created on: 13/Jan/22 03:20
Start Date: 13/Jan/22 03:20
Worklog Time Spent: 10m 
  Work Description: Hexiaoqiao commented on pull request #3063:
URL: https://github.com/apache/hadoop/pull/3063#issuecomment-1011744871


   @zhuxiangyi Would you mind fixing the checkstyle issues? See 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3063/19/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
 Thanks.




Issue Time Tracking
---

Worklog Id: (was: 708045)
Time Spent: 8h 50m  (was: 8h 40m)

> Add markedDeleteBlockScrubberThread to delete blocks asynchronously
> ---
>
> Key: HDFS-16043
> URL: https://issues.apache.org/jira/browse/HDFS-16043
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, namanode
>Affects Versions: 3.4.0
>Reporter: Xiangyi Zhu
>Assignee: Xiangyi Zhu
>Priority: Major
>  Labels: pull-request-available
> Attachments: 20210527-after.svg, 20210527-before.svg
>
>  Time Spent: 8h 50m
>  Remaining Estimate: 0h
>
> Deleting a large directory caused the NN to hold the lock for too long, 
> which caused our NameNode to be killed by ZKFC.
>  The flame graph shows that the most time-consuming calculation is QuotaCount, 
> during removeBlocks(toRemovedBlocks) and inode deletion, and that 
> removeBlocks(toRemovedBlocks) takes the larger proportion of the time.
> h3. Solution:
> 1. RemoveBlocks is processed asynchronously. A thread is started in the 
> BlockManager to process the deleted blocks and control the lock time.
>  2. The QuotaCount calculation is optimized, similar to the optimization in 
> HDFS-16000.
> h3. Comparison before and after optimization:
> Test: delete 10 million (1000w) inodes and 10 million blocks.
>  *before:*
> remove inode elapsed time: 7691 ms
>  remove block elapsed time: 11107 ms
>  *after:*
>  remove inode elapsed time: 4149 ms
>  remove block elapsed time: 0 ms
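
The asynchronous removal described in the first item of the solution can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patch under review: only the thread name `markedDeleteBlockScrubberThread` comes from the issue title, while the class, the queue, the `ReentrantLock` standing in for the namesystem lock, and the per-lock-hold block limit used to "control the lock time" are all hypothetical.

```java
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch of batched, asynchronous block removal; not BlockManager code. */
final class MarkedDeleteScrubberSketch {
  // Blocks marked for deletion by the delete path (illustrative block IDs).
  private final Queue<Long> markedDeleteQueue = new ConcurrentLinkedQueue<>();
  // Stand-in for the namesystem write lock.
  private final ReentrantLock namesystemLock = new ReentrantLock();

  /** Delete path: only enqueue the blocks, do no per-block work here. */
  void markForDeletion(List<Long> blockIds) {
    markedDeleteQueue.addAll(blockIds);
  }

  /** Remove at most maxPerLockHold blocks under a single lock acquisition. */
  int scrubOnce(int maxPerLockHold) {
    int removed = 0;
    namesystemLock.lock();
    try {
      while (removed < maxPerLockHold && markedDeleteQueue.poll() != null) {
        removed++; // real code would drop the block from the blocks map here
      }
    } finally {
      namesystemLock.unlock();
    }
    return removed;
  }

  int pending() {
    return markedDeleteQueue.size();
  }

  /** Background loop: the lock is released between batches; sleeps only when idle. */
  Thread startScrubber(int maxPerLockHold, long idlePauseMillis) {
    Thread t = new Thread(() -> {
      try {
        while (!Thread.currentThread().isInterrupted()) {
          if (scrubOnce(maxPerLockHold) == 0) {
            Thread.sleep(idlePauseMillis);
          }
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }, "markedDeleteBlockScrubberThread");
    t.setDaemon(true);
    t.start();
    return t;
  }
}
```

Because each batch takes and releases the lock separately, other lock waiters get a chance to run between batches, and the delete call itself returns after enqueueing.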




[jira] [Updated] (HDFS-16411) RBF: RouterId is NULL when set dfs.federation.router.rpc.enable=false

2022-01-12 Thread YulongZ (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

YulongZ updated HDFS-16411:
---
Attachment: HDFS-16411.003.patch

> RBF: RouterId is NULL when set dfs.federation.router.rpc.enable=false
> -
>
> Key: HDFS-16411
> URL: https://issues.apache.org/jira/browse/HDFS-16411
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: YulongZ
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-16411.000.patch, HDFS-16411.001.patch, 
> HDFS-16411.002.patch, HDFS-16411.003.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> When dfs.federation.router.rpc.enable=false, routerId is null, but 
> RouterHeartbeatService needs to call updateStateStore() with routerId.




[jira] [Work logged] (HDFS-16411) RBF: RouterId is NULL when set dfs.federation.router.rpc.enable=false

2022-01-12 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16411?focusedWorklogId=708005&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-708005
 ]

ASF GitHub Bot logged work on HDFS-16411:
-

Author: ASF GitHub Bot
Created on: 13/Jan/22 01:49
Start Date: 13/Jan/22 01:49
Worklog Time Spent: 10m 
  Work Description: yulongz commented on a change in pull request #3878:
URL: https://github.com/apache/hadoop/pull/3878#discussion_r783569891



##
File path: 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/router/TestRouter.java
##
@@ -238,12 +238,16 @@ public void testSwitchRouter() throws IOException {
   private void assertRouterHeartbeater(boolean expectedRouterHeartbeat,
   boolean expectedNNHeartbeat) throws IOException {
 final Router router = new Router();
-Configuration baseCfg = new RouterConfigBuilder(conf).rpc().build();

Review comment:
   Done






Issue Time Tracking
---

Worklog Id: (was: 708005)
Time Spent: 50m  (was: 40m)

> RBF: RouterId is NULL when set dfs.federation.router.rpc.enable=false
> -
>
> Key: HDFS-16411
> URL: https://issues.apache.org/jira/browse/HDFS-16411
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: YulongZ
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-16411.000.patch, HDFS-16411.001.patch, 
> HDFS-16411.002.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> When dfs.federation.router.rpc.enable=false, routerId is null, but 
> RouterHeartbeatService needs to call updateStateStore() with routerId.




[jira] [Work logged] (HDFS-16262) Async refresh of cached locations in DFSInputStream

2022-01-12 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16262?focusedWorklogId=707956&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-707956
 ]

ASF GitHub Bot logged work on HDFS-16262:
-

Author: ASF GitHub Bot
Created on: 12/Jan/22 23:33
Start Date: 12/Jan/22 23:33
Worklog Time Spent: 10m 
  Work Description: bbeaudreault commented on pull request #3527:
URL: https://github.com/apache/hadoop/pull/3527#issuecomment-1011548807


   Hey @sodonnel , the build has finished. Unfortunately there was 1 failure,  
yet I don't think it's related. It's a timeout in RollingUpgradeTest. Otherwise 
the checkstyle issues are cleaned up as are all the other failures. 




Issue Time Tracking
---

Worklog Id: (was: 707956)
Time Spent: 4h 10m  (was: 4h)

> Async refresh of cached locations in DFSInputStream
> ---
>
> Key: HDFS-16262
> URL: https://issues.apache.org/jira/browse/HDFS-16262
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Bryan Beaudreault
>Assignee: Bryan Beaudreault
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> HDFS-15119 added the ability to invalidate cached block locations in 
> DFSInputStream. As written, the feature will affect all DFSInputStreams 
> regardless of whether they need it or not. The invalidation also only applies 
> on the next request, so the next request will pay the cost of calling 
> openInfo before reading the data.
> I'm working on a feature for HBase which enables efficient healing of 
> locality through Balancer-style low level block moves (HBASE-26250). I'd like 
> to utilize the idea started in HDFS-15119 in order to update DFSInputStreams 
> after blocks have been moved to local hosts.
> I was considering using the feature as is, but some of our clusters are quite 
> large and I'm concerned about the impact on the namenode:
>  * We have some clusters with over 350k StoreFiles, so that'd be 350k 
> DFSInputStreams. With such a large number and very active usage, having the 
> refresh be in-line makes it too hard to ensure we don't DDOS the NameNode.
>  * Currently we need to pay the price of openInfo the next time a 
> DFSInputStream is invoked. Moving that async would minimize the latency hit. 
> Also, some StoreFiles might be far less frequently accessed, so they may live 
> on for a long time before ever refreshing. We'd like to be able to know that 
> all DFSInputStreams are refreshed by a given time.
>  * We may have 350k files, but only a small percentage of them are ever 
> non-local at a given time. Refreshing only if necessary will save a lot of 
> work.
> In order to make this as painless to end users as possible, I'd like to:
>  * Update the implementation to utilize an async thread for managing 
> refreshes. This will give more control over rate limiting across all 
> DFSInputStreams in a DFSClient, and also ensure that all DFSInputStreams are 
> refreshed.
>  * Only refresh files which are lacking a local replica or have known 
> deadNodes to be cleaned up
>  
>  
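
The two goals listed at the end of the description (a background refresher with a rate limit, and refreshing only streams that lack a local replica or have known dead nodes) can be sketched as below. This is a hypothetical illustration, not the DFSInputStream or DFSClient API: `StreamState`, `needsRefresh`, `refreshPass`, and the per-pass cap used as a stand-in for rate limiting are all assumed names and choices.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of selective, rate-limited location refresh. */
final class LocationRefreshSketch {

  /** Illustrative view of one open stream; not the DFSInputStream API. */
  static final class StreamState {
    final String path;
    boolean hasLocalReplica;
    boolean hasDeadNodes;

    StreamState(String path, boolean hasLocalReplica, boolean hasDeadNodes) {
      this.path = path;
      this.hasLocalReplica = hasLocalReplica;
      this.hasDeadNodes = hasDeadNodes;
    }

    /** Refresh only when a local replica is missing or dead nodes are known. */
    boolean needsRefresh() {
      return !hasLocalReplica || hasDeadNodes;
    }

    /** Stands in for re-fetching block locations; here it only clears dead nodes. */
    void refresh() {
      hasDeadNodes = false;
    }
  }

  /** One pass of a background refresher, capped at maxPerPass refreshes. */
  static List<String> refreshPass(List<StreamState> streams, int maxPerPass) {
    List<String> refreshed = new ArrayList<>();
    for (StreamState s : streams) {
      if (refreshed.size() >= maxPerPass) {
        break; // cap reached: remaining streams wait for the next pass
      }
      if (s.needsRefresh()) {
        s.refresh();
        refreshed.add(s.path);
      }
    }
    return refreshed;
  }
}
```

A single background thread calling `refreshPass` on a schedule would bound the number of NameNode calls per interval across all streams, while streams that already have a local replica and no dead nodes are skipped entirely.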




[jira] [Work logged] (HDFS-16262) Async refresh of cached locations in DFSInputStream

2022-01-12 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16262?focusedWorklogId=707929&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-707929
 ]

ASF GitHub Bot logged work on HDFS-16262:
-

Author: ASF GitHub Bot
Created on: 12/Jan/22 22:27
Start Date: 12/Jan/22 22:27
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3527:
URL: https://github.com/apache/hadoop/pull/3527#issuecomment-1011507918


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 39s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 2 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  13m 28s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  25m 21s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   6m 16s |  |  trunk passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  compile  |   5m 47s |  |  trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   1m 22s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 30s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 55s |  |  trunk passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   2m 27s |  |  trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   6m 49s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  25m 56s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 28s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   2m 23s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   6m 21s |  |  the patch passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javac  |   6m 21s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   5m 36s |  |  the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   5m 36s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | +1 :green_heart: |  checkstyle  |   1m  4s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   2m 14s |  |  the patch passed  |
   | +1 :green_heart: |  xml  |   0m  2s |  |  The patch has no ill-formed XML file.  |
   | +1 :green_heart: |  javadoc  |   1m 31s |  |  the patch passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 59s |  |  the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   6m 32s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  26m 31s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   2m 30s |  |  hadoop-hdfs-client in the patch passed.  |
   | -1 :x: |  unit  | 232m  0s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3527/15/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 43s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 379m 44s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.TestRollingUpgrade |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3527/15/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3527 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell xml |
   | uname | Linux a1c99da05f39 4.15.0-161-generic #169-Ubuntu SMP Fri Oct 15 13:41:54 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 54096317ed298d57ba490bd1ddfc693c331546ba |
   | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions |

[jira] [Work logged] (HDFS-16411) RBF: RouterId is NULL when set dfs.federation.router.rpc.enable=false

2022-01-12 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16411?focusedWorklogId=707734&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-707734
 ]

ASF GitHub Bot logged work on HDFS-16411:
-

Author: ASF GitHub Bot
Created on: 12/Jan/22 17:54
Start Date: 12/Jan/22 17:54
Worklog Time Spent: 10m 
  Work Description: goiri commented on a change in pull request #3878:
URL: https://github.com/apache/hadoop/pull/3878#discussion_r783311962



##
File path: 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/router/TestRouter.java
##
@@ -238,12 +238,16 @@ public void testSwitchRouter() throws IOException {
   private void assertRouterHeartbeater(boolean expectedRouterHeartbeat,
   boolean expectedNNHeartbeat) throws IOException {
 final Router router = new Router();
-Configuration baseCfg = new RouterConfigBuilder(conf).rpc().build();

Review comment:
   We should have both tests: one with false and one without.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 707734)
Time Spent: 40m  (was: 0.5h)

> RBF: RouterId is NULL when set dfs.federation.router.rpc.enable=false
> -
>
> Key: HDFS-16411
> URL: https://issues.apache.org/jira/browse/HDFS-16411
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: YulongZ
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-16411.000.patch, HDFS-16411.001.patch, 
> HDFS-16411.002.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> When dfs.federation.router.rpc.enable=false, routerId is null, but 
> RouterHeartbeatService needs to call updateStateStore() with routerId.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Work logged] (HDFS-16043) Add markedDeleteBlockScrubberThread to delete blocks asynchronously

2022-01-12 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16043?focusedWorklogId=707711&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-707711
 ]

ASF GitHub Bot logged work on HDFS-16043:
-

Author: ASF GitHub Bot
Created on: 12/Jan/22 17:41
Start Date: 12/Jan/22 17:41
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3063:
URL: https://github.com/apache/hadoop/pull/3063#issuecomment-1011295026


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |  12m 23s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 15 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  32m 42s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 27s |  |  trunk passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  compile  |   1m 21s |  |  trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   1m  6s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 27s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  2s |  |  trunk passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 34s |  |  trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 13s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  22m 22s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 16s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 18s |  |  the patch passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javac  |   1m 18s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 12s |  |  the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   1m 12s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | -0 :warning: |  checkstyle  |   0m 56s | [/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3063/19/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt) |  hadoop-hdfs-project/hadoop-hdfs: The patch generated 5 new + 642 unchanged - 2 fixed = 647 total (was 644)  |
   | +1 :green_heart: |  mvnsite  |   1m 18s |  |  the patch passed  |
   | +1 :green_heart: |  xml  |   0m  1s |  |  The patch has no ill-formed XML file.  |
   | +1 :green_heart: |  javadoc  |   0m 52s |  |  the patch passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 21s |  |  the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 13s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  22m 40s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  | 232m 46s |  |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 46s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 344m  5s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3063/19/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3063 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell xml |
   | uname | Linux 25a72eabe263 4.15.0-156-generic #163-Ubuntu SMP Thu Aug 19 23:31:58 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 0d1f846778d58a915b96a05494e35e81d8ea966c |
   | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3063/19/testReport/ |
   | Max.

[jira] [Work logged] (HDFS-16043) Add markedDeleteBlockScrubberThread to delete blocks asynchronously

2022-01-12 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16043?focusedWorklogId=707640&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-707640
 ]

ASF GitHub Bot logged work on HDFS-16043:
-

Author: ASF GitHub Bot
Created on: 12/Jan/22 16:52
Start Date: 12/Jan/22 16:52
Worklog Time Spent: 10m 
  Work Description: Hexiaoqiao commented on pull request #3063:
URL: https://github.com/apache/hadoop/pull/3063#issuecomment-1011251305


   Will commit to trunk once Jenkins completes.




Issue Time Tracking
---

Worklog Id: (was: 707640)
Time Spent: 8.5h  (was: 8h 20m)

> Add markedDeleteBlockScrubberThread to delete blocks asynchronously
> ---
>
> Key: HDFS-16043
> URL: https://issues.apache.org/jira/browse/HDFS-16043
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, namanode
>Affects Versions: 3.4.0
>Reporter: Xiangyi Zhu
>Assignee: Xiangyi Zhu
>Priority: Major
>  Labels: pull-request-available
> Attachments: 20210527-after.svg, 20210527-before.svg
>
>  Time Spent: 8.5h
>  Remaining Estimate: 0h
>
> Deleting a large directory caused the NN to hold the lock for too long, 
> which caused our NameNode to be killed by ZKFC.
>  The flame graph shows that the time is mainly spent on the QuotaCount 
> calculation when deleting inodes and on removeBlocks(toRemovedBlocks), with 
> removeBlocks(toRemovedBlocks) taking the larger share.
> h3. Solution:
> 1. removeBlocks is processed asynchronously. A thread is started in the 
> BlockManager to process the deleted blocks and to limit how long the lock 
> is held.
>  2. The QuotaCount calculation is optimized, similar to the optimization in 
> HDFS-16000.
> h3. Comparison before and after optimization:
> Test: delete 10 million inodes and 10 million blocks.
>  *before:*
>  remove inode elapsed time: 7691 ms
>  remove block elapsed time: 11107 ms
>  *after:*
>  remove inode elapsed time: 4149 ms
>  remove block elapsed time: 0 ms
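
A minimal sketch of the asynchronous idea in item 1 of the solution, not the code from the pull request. The class name, the queue, the ReentrantLock standing in for the namesystem write lock, and the fixed batch size per lock hold are all assumptions made for illustration; the actual patch may bound the lock time in a different way.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch; none of these names come from the Hadoop code base. */
class MarkedDeleteScrubberSketch {
  // Lists of block IDs that the delete path has marked for removal.
  private final ConcurrentLinkedQueue<List<Long>> markedDeleteQueue =
      new ConcurrentLinkedQueue<>();
  // Stand-in for the namesystem write lock (assumption).
  private final ReentrantLock namesystemLock = new ReentrantLock();
  private final List<Long> removed = new ArrayList<>();
  private final int blocksPerLockHold;

  MarkedDeleteScrubberSketch(int blocksPerLockHold) {
    if (blocksPerLockHold <= 0) {
      throw new IllegalArgumentException("blocksPerLockHold must be positive");
    }
    this.blocksPerLockHold = blocksPerLockHold;
  }

  /** Delete path: only enqueue, so the caller does no per-block work. */
  void markForDeletion(List<Long> blockIds) {
    markedDeleteQueue.add(new ArrayList<>(blockIds));
  }

  /**
   * One pass of the background thread. Each lock hold removes at most
   * blocksPerLockHold blocks, then releases the lock so other operations
   * can run. Returns the number of lock acquisitions.
   */
  int scrubOnce() {
    int lockAcquisitions = 0;
    List<Long> blocks;
    while ((blocks = markedDeleteQueue.poll()) != null) {
      int i = 0;
      while (i < blocks.size()) {
        namesystemLock.lock();
        lockAcquisitions++;
        try {
          int end = Math.min(i + blocksPerLockHold, blocks.size());
          for (; i < end; i++) {
            removed.add(blocks.get(i)); // placeholder for the real removal
          }
        } finally {
          namesystemLock.unlock();
        }
      }
    }
    return lockAcquisitions;
  }

  int removedCount() {
    namesystemLock.lock();
    try {
      return removed.size();
    } finally {
      namesystemLock.unlock();
    }
  }
}
```

In a real service, scrubOnce() would run in a loop on a daemon thread; bounding the work per lock hold is what keeps a very large delete from monopolizing the lock.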




[jira] [Work logged] (HDFS-16262) Async refresh of cached locations in DFSInputStream

2022-01-12 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16262?focusedWorklogId=707603&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-707603
 ]

ASF GitHub Bot logged work on HDFS-16262:
-

Author: ASF GitHub Bot
Created on: 12/Jan/22 16:07
Start Date: 12/Jan/22 16:07
Worklog Time Spent: 10m 
  Work Description: bbeaudreault commented on pull request #3527:
URL: https://github.com/apache/hadoop/pull/3527#issuecomment-1011205367


   I just pushed a rebase of this branch on latest trunk, with checkstyle 
issues fixed. I'll keep an eye on the build and ping you once it looks good.




Issue Time Tracking
---

Worklog Id: (was: 707603)
Time Spent: 3h 50m  (was: 3h 40m)

> Async refresh of cached locations in DFSInputStream
> ---
>
> Key: HDFS-16262
> URL: https://issues.apache.org/jira/browse/HDFS-16262
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Bryan Beaudreault
>Assignee: Bryan Beaudreault
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> HDFS-15119 added the ability to invalidate cached block locations in 
> DFSInputStream. As written, the feature will affect all DFSInputStreams 
> regardless of whether they need it or not. The invalidation also only applies 
> on the next request, so the next request will pay the cost of calling 
> openInfo before reading the data.
> I'm working on a feature for HBase which enables efficient healing of 
> locality through Balancer-style low level block moves (HBASE-26250). I'd like 
> to utilize the idea started in HDFS-15119 in order to update DFSInputStreams 
> after blocks have been moved to local hosts.
> I was considering using the feature as is, but some of our clusters are quite 
> large and I'm concerned about the impact on the namenode:
>  * We have some clusters with over 350k StoreFiles, so that'd be 350k 
> DFSInputStreams. With such a large number and very active usage, having the 
> refresh be in-line makes it too hard to ensure we don't DDOS the NameNode.
>  * Currently we need to pay the price of openInfo the next time a 
> DFSInputStream is invoked. Moving that async would minimize the latency hit. 
> Also, some StoreFiles might be far less frequently accessed, so they may live 
> on for a long time before ever refreshing. We'd like to be able to know that 
> all DFSInputStreams are refreshed by a given time.
>  * We may have 350k files, but only a small percentage of them are ever 
> non-local at a given time. Refreshing only if necessary will save a lot of 
> work.
> In order to make this as painless to end users as possible, I'd like to:
>  * Update the implementation to utilize an async thread for managing 
> refreshes. This will give more control over rate limiting across all 
> DFSInputStreams in a DFSClient, and also ensure that all DFSInputStreams are 
> refreshed.
>  * Only refresh files which are lacking a local replica or have known 
> deadNodes to be cleaned up
>  
>  
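
A minimal sketch of the two proposals above (one background refresher shared by all streams, and refreshing only the streams that need it), not the implementation in the pull request. The RefreshableStream interface, its method names, and the per-tick cap are hypothetical stand-ins chosen for illustration.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical sketch; these are not DFSClient or DFSInputStream APIs. */
class LocationRefresherSketch {
  /** Stand-in for a DFSInputStream (assumption). */
  interface RefreshableStream {
    /** True if, for example, no replica is local or dead nodes are known. */
    boolean needsRefresh();
    /** Would fetch fresh block locations from the NameNode. */
    void refreshLocations();
  }

  private final List<RefreshableStream> streams = new CopyOnWriteArrayList<>();
  private final int maxRefreshesPerTick;

  LocationRefresherSketch(int maxRefreshesPerTick) {
    this.maxRefreshesPerTick = maxRefreshesPerTick;
  }

  void register(RefreshableStream stream) {
    streams.add(stream);
  }

  void unregister(RefreshableStream stream) {
    streams.remove(stream);
  }

  /**
   * One tick of the background thread. Skips streams that do not need a
   * refresh and stops after maxRefreshesPerTick refreshes, which caps the
   * request rate sent to the NameNode. Returns the refreshes performed.
   */
  int tick() {
    int done = 0;
    for (RefreshableStream stream : streams) {
      if (done >= maxRefreshesPerTick) {
        break;
      }
      if (stream.needsRefresh()) {
        stream.refreshLocations();
        done++;
      }
    }
    return done;
  }
}
```

A caller could run tick() at a fixed delay with a ScheduledExecutorService, so the read path never pays the refresh cost in-line.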




[jira] [Work logged] (HDFS-16262) Async refresh of cached locations in DFSInputStream

2022-01-12 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16262?focusedWorklogId=707594&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-707594
 ]

ASF GitHub Bot logged work on HDFS-16262:
-

Author: ASF GitHub Bot
Created on: 12/Jan/22 15:51
Start Date: 12/Jan/22 15:51
Worklog Time Spent: 10m 
  Work Description: bbeaudreault commented on pull request #3527:
URL: https://github.com/apache/hadoop/pull/3527#issuecomment-1011185700


   Sorry! This had been succeeding the last time I looked at it; I should have 
verified that it was still succeeding before pinging you. I'll take a look at 
the failures shortly.




Issue Time Tracking
---

Worklog Id: (was: 707594)
Time Spent: 3h 40m  (was: 3.5h)


[jira] [Work logged] (HDFS-16262) Async refresh of cached locations in DFSInputStream

2022-01-12 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16262?focusedWorklogId=707590&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-707590
 ]

ASF GitHub Bot logged work on HDFS-16262:
-

Author: ASF GitHub Bot
Created on: 12/Jan/22 15:48
Start Date: 12/Jan/22 15:48
Worklog Time Spent: 10m 
  Work Description: sodonnel commented on pull request #3527:
URL: https://github.com/apache/hadoop/pull/3527#issuecomment-1011182118


   The last CI run flagged some checkstyle issues. Could you check those 
please and fix them? A few other parts of that run failed too, but it may have 
been some wider issue. Fixing the checkstyle will trigger a new run and we can 
see how it looks then.
   
   The patch looks mostly good to me, but there is a lot of change in 
DFSInputStream, so I will need to take some more time to go over that part of 
the PR again.




Issue Time Tracking
---

Worklog Id: (was: 707590)
Time Spent: 3.5h  (was: 3h 20m)


[jira] [Resolved] (HDFS-16419) Make HDFS data transfer tools cross platform

2022-01-12 Thread Gautham Banasandra (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gautham Banasandra resolved HDFS-16419.
---
Resolution: Fixed

Merged https://github.com/apache/hadoop/pull/3873 to trunk.

> Make HDFS data transfer tools cross platform
> 
>
> Key: HDFS-16419
> URL: https://issues.apache.org/jira/browse/HDFS-16419
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, libhdfs++, tools
>Affects Versions: 3.4.0
>Reporter: Gautham Banasandra
>Assignee: Gautham Banasandra
>Priority: Major
>  Labels: libhdfscpp, pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The source files for *hdfs_copyToLocal* and *hdfs_moveToLocal* use getopt 
> for parsing the command line arguments. getopt is available only on Linux and 
> thus isn't cross platform. We need to replace getopt with 
> boost::program_options to make these tools cross platform.




[jira] [Work logged] (HDFS-16419) Make HDFS data transfer tools cross platform

2022-01-12 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16419?focusedWorklogId=707526&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-707526
 ]

ASF GitHub Bot logged work on HDFS-16419:
-

Author: ASF GitHub Bot
Created on: 12/Jan/22 14:27
Start Date: 12/Jan/22 14:27
Worklog Time Spent: 10m 
  Work Description: GauthamBanasandra merged pull request #3873:
URL: https://github.com/apache/hadoop/pull/3873


   




Issue Time Tracking
---

Worklog Id: (was: 707526)
Time Spent: 1h  (was: 50m)


[jira] [Work logged] (HDFS-16406) DataNode metric ReadsFromLocalClient does not count short-circuit reads

2022-01-12 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16406?focusedWorklogId=707512&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-707512
 ]

ASF GitHub Bot logged work on HDFS-16406:
-

Author: ASF GitHub Bot
Created on: 12/Jan/22 14:11
Start Date: 12/Jan/22 14:11
Worklog Time Spent: 10m 
  Work Description: secfree commented on pull request #3847:
URL: https://github.com/apache/hadoop/pull/3847#issuecomment-1011084197


   @ferhui thanks for your review. The failed test case is the one you recorded 
in HDFS-16169, and I have raised a PR for it: 
https://github.com/apache/hadoop/pull/3850




Issue Time Tracking
---

Worklog Id: (was: 707512)
Time Spent: 40m  (was: 0.5h)

> DataNode metric ReadsFromLocalClient does not count short-circuit reads
> ---
>
> Key: HDFS-16406
> URL: https://issues.apache.org/jira/browse/HDFS-16406
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: secfree
>Assignee: secfree
>Priority: Minor
>  Labels: metrics, pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The following test case failed. 
> {code}
>   @Test
>   public void testNodeLocalMetrics() throws Exception {
>     Assume.assumeTrue(null == DomainSocket.getLoadingFailureReason());
>     Configuration conf = new HdfsConfiguration();
>     conf.setBoolean(HdfsClientConfigKeys.Read.ShortCircuit.KEY, true);
>     TemporarySocketDirectory sockDir = new TemporarySocketDirectory();
>     DomainSocket.disableBindPathValidation();
>     conf.set(DFSConfigKeys.DFS_DOMAIN_SOCKET_PATH_KEY,
>         new File(sockDir.getDir(),
>             "testNodeLocalMetrics._PORT.sock").getAbsolutePath());
>     MiniDFSCluster cluster =
>         new MiniDFSCluster.Builder(conf).numDataNodes(1).build();
>     try {
>       cluster.waitActive();
>       FileSystem fs = cluster.getFileSystem();
>       Path testFile = new Path("/testNodeLocalMetrics.txt");
>       long file_len = 10;
>       DFSTestUtil.createFile(fs, testFile, file_len, (short)1, 1L);
>       DFSTestUtil.readFile(fs, testFile);
>       List<DataNode> datanodes = cluster.getDataNodes();
>       assertEquals(datanodes.size(), 1);
>       DataNode datanode = datanodes.get(0);
>       MetricsRecordBuilder rb = getMetrics(datanode.getMetrics().name());
>       // Write related metrics
>       assertCounter("WritesFromLocalClient", 1L, rb);
>       // Read related metrics
>       assertCounter("ReadsFromLocalClient", 1L, rb); // failed here
>     } finally {
>       if (cluster != null) {
>         cluster.shutdown();
>       }
>     }
>   }
> {code}




[jira] [Resolved] (HDFS-16019) HDFS: Inode CheckPoint

2022-01-12 Thread Xiangyi Zhu (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangyi Zhu resolved HDFS-16019.

Resolution: Later

> HDFS: Inode CheckPoint 
> ---
>
> Key: HDFS-16019
> URL: https://issues.apache.org/jira/browse/HDFS-16019
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namanode
>Affects Versions: 3.3.1
>Reporter: Xiangyi Zhu
>Assignee: Xiangyi Zhu
>Priority: Major
>
> *Background*
> The OIV IMAGE analysis tool has brought us many benefits, such as analysis 
> of file size distribution, cold and hot data, and abnormally growing 
> directories. In my opinion, however, it is too slow, especially on a large 
> IMAGE.
> Since Hadoop 2.3, the IMAGE format has changed. The OIV tool must load the 
> entire IMAGE into memory before it can output the inode information as 
> text. For a large IMAGE, this takes a long time, consumes more resources, 
> and requires a machine with a large amount of memory.
> HDFS does provide the dfs.namenode.legacy-oiv-image.dir parameter to 
> produce the old IMAGE format at CheckPoint. Parsing the old IMAGE does not 
> require many resources, but we still need to parse it again with the 
> hdfs oiv_legacy command to get the inode information as text, which is 
> relatively time-consuming.
> *Solution*
> We can have the standby node periodically checkpoint the inodes and 
> serialize them as text. The output can go to different FileSystems, 
> depending on configuration, such as the local file system or HDFS.
> The advantage of writing to HDFS is that we can analyze the inodes 
> directly with Spark or Hive. I think the block information for each inode 
> may not be of much use; the file size and the replica count are more useful 
> to us.
> In addition, the inodes do not need to be output in order. We can speed up 
> the inode CheckPoint by partitioning the serialized inodes across different 
> output files: one producer thread puts inodes into a queue, and multiple 
> consumer threads take from the queue and write to different partition 
> files. The output files can also be compressed to reduce disk I/O.
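
A minimal sketch of the producer and consumer layout described in the last paragraph of the proposal, under stated assumptions: inodes are already serialized as text lines, each partition is an in-memory string instead of a file, and the class and method names are hypothetical. It only shows the queue and partition structure, not a real checkpoint.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical sketch; not part of HDFS. */
class InodeDumpSketch {
  // Unique sentinel object telling a consumer to stop (compared by identity).
  private static final String POISON = new String("EOF");

  /** Returns one text partition per consumer thread; line order is not preserved. */
  static List<String> dump(List<String> inodeLines, int partitions) {
    if (partitions < 1) {
      throw new IllegalArgumentException("partitions must be at least 1");
    }
    BlockingQueue<String> queue = new LinkedBlockingQueue<>(1024);
    List<StringBuilder> outputs = new ArrayList<>();
    List<Thread> consumers = new ArrayList<>();
    for (int p = 0; p < partitions; p++) {
      StringBuilder out = new StringBuilder(); // stand-in for a partition file
      outputs.add(out);
      Thread consumer = new Thread(() -> {
        try {
          for (String line = queue.take(); line != POISON; line = queue.take()) {
            out.append(line).append('\n');
          }
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
      consumer.start();
      consumers.add(consumer);
    }
    try {
      // Producer: enqueue every serialized inode, then one sentinel per consumer.
      for (String line : inodeLines) {
        queue.put(line);
      }
      for (int p = 0; p < partitions; p++) {
        queue.put(POISON);
      }
      for (Thread consumer : consumers) {
        consumer.join();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while dumping inodes", e);
    }
    List<String> result = new ArrayList<>();
    for (StringBuilder out : outputs) {
      result.add(out.toString());
    }
    return result;
  }
}
```

Because each consumer owns exactly one partition, the writers need no locking among themselves; the bounded queue is the only shared structure.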




[jira] [Work logged] (HDFS-16043) Add markedDeleteBlockScrubberThread to delete blocks asynchronously

2022-01-12 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16043?focusedWorklogId=707452&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-707452
 ]

ASF GitHub Bot logged work on HDFS-16043:
-

Author: ASF GitHub Bot
Created on: 12/Jan/22 12:31
Start Date: 12/Jan/22 12:31
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3063:
URL: https://github.com/apache/hadoop/pull/3063#issuecomment-1010998765


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 54s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  1s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 15 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  36m 25s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 28s |  |  trunk passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  compile  |   1m 17s |  |  trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   1m  2s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 28s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  3s |  |  trunk passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 30s |  |  trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 22s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  25m 34s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 19s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 24s |  |  the patch passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javac  |   1m 24s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 13s |  |  the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   1m 13s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | -0 :warning: |  checkstyle  |   0m 56s | [/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3063/18/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt) |  hadoop-hdfs-project/hadoop-hdfs: The patch generated 7 new + 641 unchanged - 2 fixed = 648 total (was 643)  |
   | +1 :green_heart: |  mvnsite  |   1m 20s |  |  the patch passed  |
   | +1 :green_heart: |  xml  |   0m  2s |  |  The patch has no ill-formed XML file.  |
   | +1 :green_heart: |  javadoc  |   0m 53s |  |  the patch passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 23s |  |  the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 25s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  25m 12s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  | 328m 59s |  |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 38s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 437m 53s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3063/18/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3063 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell xml |
   | uname | Linux c4e1d249c0ad 4.15.0-162-generic #170-Ubuntu SMP Mon Oct 18 11:38:05 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 108e96cf1b503c34e211d104c223c074aca1b984 |
   | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3063/18/testReport/ |
   | Max.

[jira] [Work logged] (HDFS-16411) RBF: RouterId is NULL when set dfs.federation.router.rpc.enable=false

2022-01-12 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16411?focusedWorklogId=707446=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-707446
 ]

ASF GitHub Bot logged work on HDFS-16411:
-

Author: ASF GitHub Bot
Created on: 12/Jan/22 12:21
Start Date: 12/Jan/22 12:21
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3878:
URL: https://github.com/apache/hadoop/pull/3878#issuecomment-1010989809


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 56s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 2 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  35m 55s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 41s |  |  trunk passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  compile  |   0m 35s |  |  trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   0m 23s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 40s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 40s |  |  trunk passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 53s |  |  trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   1m 17s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  23m 32s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 33s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 36s |  |  the patch passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javac  |   0m 36s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 30s |  |  the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   0m 30s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | -0 :warning: |  checkstyle  |   0m 16s | [/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs-rbf.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3878/2/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs-rbf.txt) |  hadoop-hdfs-project/hadoop-hdfs-rbf: The patch generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)  |
   | +1 :green_heart: |  mvnsite  |   0m 33s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 31s |  |  the patch passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 48s |  |  the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   1m 20s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  23m 45s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | -1 :x: |  unit  |  38m 26s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3878/2/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt) |  hadoop-hdfs-rbf in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 49s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 134m 54s |  |  |
   
   
   | Reason | Tests |
   |-------:|:------|
   | Failed junit tests | hadoop.hdfs.rbfbalance.TestRouterDistCpProcedure |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3878/2/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3878 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux 477069bc1b8f 4.15.0-142-generic #146-Ubuntu SMP Tue Apr 13 01:11:19 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / dc03d2d0e4178e1a0e704545576fb52117ea7828 |
   | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions |

[jira] [Updated] (HDFS-16411) RBF: RouterId is NULL when set dfs.federation.router.rpc.enable=false

2022-01-12 Thread YulongZ (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

YulongZ updated HDFS-16411:
---
Attachment: HDFS-16411.002.patch

> RBF: RouterId is NULL when set dfs.federation.router.rpc.enable=false
> -
>
> Key: HDFS-16411
> URL: https://issues.apache.org/jira/browse/HDFS-16411
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: YulongZ
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-16411.000.patch, HDFS-16411.001.patch, 
> HDFS-16411.002.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When dfs.federation.router.rpc.enable=false, routerId is null, but 
> RouterHeartbeatService needs to call updateStateStore() with routerId.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Work logged] (HDFS-16400) Reconfig DataXceiver parameters for datanode

2022-01-12 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16400?focusedWorklogId=707387=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-707387
 ]

ASF GitHub Bot logged work on HDFS-16400:
-

Author: ASF GitHub Bot
Created on: 12/Jan/22 10:22
Start Date: 12/Jan/22 10:22
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3843:
URL: https://github.com/apache/hadoop/pull/3843#issuecomment-1010885883


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   1m  0s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 2 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  32m 58s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 28s |  |  trunk passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  compile  |   1m 19s |  |  trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   1m  0s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 31s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  4s |  |  trunk passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 32s |  |  trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 17s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  22m 43s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 16s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 20s |  |  the patch passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javac  |   1m 20s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 11s |  |  the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   1m 11s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | -0 :warning: |  checkstyle  |   0m 50s | [/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3843/2/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt) |  hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 80 unchanged - 2 fixed = 81 total (was 82)  |
   | +1 :green_heart: |  mvnsite  |   1m 22s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 53s |  |  the patch passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 23s |  |  the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 16s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  22m 18s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | -1 :x: |  unit  | 412m  1s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3843/2/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 46s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 512m 14s |  |  |
   
   
   | Reason | Tests |
   |-------:|:------|
   | Failed junit tests | hadoop.hdfs.server.namenode.ha.TestSeveralNameNodes |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3843/2/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3843 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux 38df6a28e1cc 4.15.0-65-generic #74-Ubuntu SMP Tue Sep 17 17:06:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 3e738afea78fa6738b0de6767a65521a213263fb |
   | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04

[jira] [Commented] (HDFS-16420) ec + balancer may cause missing block

2022-01-12 Thread liuhongtong (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-16420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17474335#comment-17474335
 ] 

liuhongtong commented on HDFS-16420:


We have found the cause of the problem, which also exists in the latest code. 
We'll fix this later.

> ec + balancer may cause missing block
> -
>
> Key: HDFS-16420
> URL: https://issues.apache.org/jira/browse/HDFS-16420
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: qinyuren
>Priority: Major
> Attachments: image-2022-01-10-17-31-35-910.png, 
> image-2022-01-10-17-32-56-981.png
>
>
> We have a problem similar to the one described in HDFS-16297. 
> In our cluster, we used {color:#de350b}ec(6+3) + balancer with version 
> 3.1.0{color}, and a {color:#de350b}missing block{color} occurred. 
> From fsck we got the info for the block (blk_-9223372036824119008): only 5 live 
> replicas and multiple redundant replicas. 
> {code:java}
> blk_-9223372036824119008_220037616 len=133370338 MISSING! Live_repl=5
> blk_-9223372036824119007:DatanodeInfoWithStorage,   
> blk_-9223372036824119002:DatanodeInfoWithStorage,    
> blk_-9223372036824119001:DatanodeInfoWithStorage,  
> blk_-9223372036824119000:DatanodeInfoWithStorage, 
> blk_-9223372036824119004:DatanodeInfoWithStorage,  
> blk_-9223372036824119004:DatanodeInfoWithStorage, 
> blk_-9223372036824119004:DatanodeInfoWithStorage, 
> blk_-9223372036824119004:DatanodeInfoWithStorage, 
> blk_-9223372036824119004:DatanodeInfoWithStorage, 
> blk_-9223372036824119004:DatanodeInfoWithStorage {code}
>    
> We searched the logs of all datanodes and found that the internal blocks of 
> blk_-9223372036824119008 were deleted at almost the same time.
>  
> {code:java}
> 08:15:58,550 INFO  impl.FsDatasetAsyncDiskService 
> (FsDatasetAsyncDiskService.java:run(333)) - Deleted 
> BP-1606066499--1606188026755 blk_-9223372036824119008_220037616 URI 
> file:/data15/hadoop/hdfs/data/current/BP-1606066499--1606188026755/current/finalized/subdir19/subdir9/blk_-9223372036824119008
> 08:16:21,214 INFO  impl.FsDatasetAsyncDiskService 
> (FsDatasetAsyncDiskService.java:run(333)) - Deleted 
> BP-1606066499--1606188026755 blk_-9223372036824119006_220037616 URI 
> file:/data4/hadoop/hdfs/data/current/BP-1606066499--1606188026755/current/finalized/subdir19/subdir9/blk_-9223372036824119006
> 08:16:55,737 INFO  impl.FsDatasetAsyncDiskService 
> (FsDatasetAsyncDiskService.java:run(333)) - Deleted 
> BP-1606066499--1606188026755 blk_-9223372036824119005_220037616 URI 
> file:/data2/hadoop/hdfs/data/current/BP-1606066499--1606188026755/current/finalized/subdir19/subdir9/blk_-9223372036824119005
> {code}
>  
> The number of deletions per internal block during 08:15-08:17 is as follows:
> ||internal block||index||delete num||
> |blk_-9223372036824119008|0|1|
> |blk_-9223372036824119006|2|1|
> |blk_-9223372036824119005|3|1|
> |blk_-9223372036824119004|4|50|
> |blk_-9223372036824119003|5|1|
> |blk_-9223372036824119000|8|1|
>  
> {color:#ff}During 08:15 to 08:17, we restarted 2 datanodes and triggered 
> a full block report immediately.{color}
>  
> There are 2 questions: 
> 1. Why are there so many replicas of this block?
> 2. Why was an internal block with only one copy deleted?
> The reasons for the first problem may be as follows: 
> 1. We set the full block report period of some datanodes to 168 hours.
> 2. We performed a namenode HA operation.
> 3. After the namenode HA operation, the state of the storage became 
> {color:#ff}stale{color}, and the state did not change until the next full 
> block report.
> 4. The balancer copied the replica without deleting the replica from the source 
> node, because the source node had stale storage, and the request was put 
> into {color:#ff}postponedMisreplicatedBlocks{color}.
> 5. The balancer continued to copy the replica, eventually resulting in multiple 
> copies of a replica.
> !image-2022-01-10-17-31-35-910.png|width=642,height=269!
> The set {color:#ff}rescannedMisreplicatedBlocks{color} has a large number of 
> blocks to remove.
> !image-2022-01-10-17-32-56-981.png|width=745,height=124!
> As for the second question, we checked the code of 
> {color:#de350b}processExtraRedundancyBlock{color}, but didn't find any 
> problem.
>  
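
As a cross-check of the fsck output and the deletion table in the description above, here is a minimal sketch that maps each reported internal block ID to its index in the ec(6+3) block group and lists which indices are live and which are missing. It assumes that an internal block ID equals the block group ID plus the block index, which is consistent with the IDs and indices in the table above. The class and method names are hypothetical and are not part of HDFS.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper, not an HDFS class.
class StripedBlockIndex {
    // ec(6+3): 6 data blocks + 3 parity blocks per block group.
    static final int TOTAL_INTERNAL_BLOCKS = 9;

    // Assumption: internal block ID = block group ID + index.
    static int indexOf(long groupId, long internalBlockId) {
        return (int) (internalBlockId - groupId);
    }

    // Distinct indices that still have at least one reported replica.
    static TreeSet<Integer> liveIndices(long groupId, long[] reportedIds) {
        TreeSet<Integer> live = new TreeSet<>();
        for (long id : reportedIds) {
            live.add(indexOf(groupId, id));
        }
        return live;
    }

    // Indices in [0, TOTAL_INTERNAL_BLOCKS) with no reported replica.
    static List<Integer> missingIndices(TreeSet<Integer> live) {
        List<Integer> missing = new ArrayList<>();
        for (int i = 0; i < TOTAL_INTERNAL_BLOCKS; i++) {
            if (!live.contains(i)) {
                missing.add(i);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        long groupId = -9223372036824119008L;
        // Internal block IDs as listed in the fsck output (…004 appears six times).
        long[] reported = {
            -9223372036824119007L, -9223372036824119002L,
            -9223372036824119001L, -9223372036824119000L,
            -9223372036824119004L, -9223372036824119004L,
            -9223372036824119004L, -9223372036824119004L,
            -9223372036824119004L, -9223372036824119004L
        };
        TreeSet<Integer> live = liveIndices(groupId, reported);
        System.out.println("live indices: " + live + " (count " + live.size() + ")");
        System.out.println("missing indices: " + missingIndices(live));
    }
}
```

Under that assumption, the ten reported replicas collapse to five distinct indices (1, 4, 6, 7, 8), which agrees with Live_repl=5, and the missing indices (0, 2, 3, 5) are the ones whose single copy appears in the deletion table.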



