[jira] [Updated] (HDFS-16808) HDFS metrics will hold the previous value if there is no new call

2022-10-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDFS-16808:
--
Labels: pull-request-available  (was: )

> HDFS metrics will hold the previous value if there is no new call
> -
>
> Key: HDFS-16808
> URL: https://issues.apache.org/jira/browse/HDFS-16808
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: leo sun
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2022-10-19-23-59-19-673.png
>
>
> According to the implementation of MutableStat.snapshot(), HDFS metrics will 
> keep holding the previous value if there are no new calls.
> As a result, even after the user switches the active and standby NameNodes, 
> the previous ANN (now standby) keeps emitting the old value, as the attached 
> picture shows.
> !image-2022-10-19-23-59-19-673.png!
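The behavior described above can be illustrated with a minimal sketch. This is not the actual Hadoop `MutableStat` implementation; it is a simplified stand-in showing how a snapshot that only recomputes when new samples have arrived will keep republishing a stale average once traffic stops:

```java
// Simplified sketch (NOT the real org.apache.hadoop.metrics2 class):
// snapshot() recomputes the average only when new samples arrived, so
// after traffic stops the last value keeps being published.
public class StaleStatSketch {
    private long numSamples = 0;  // samples since the last snapshot
    private double total = 0;
    private double lastAvg = 0;   // value published by the last snapshot

    public void add(double value) {
        numSamples++;
        total += value;
    }

    public double snapshot() {
        if (numSamples > 0) {
            lastAvg = total / numSamples;
            numSamples = 0;
            total = 0;
        }
        // When numSamples == 0 (no new calls), the old average is returned.
        return lastAvg;
    }

    public static void main(String[] args) {
        StaleStatSketch stat = new StaleStatSketch();
        stat.add(10);
        stat.add(30);
        System.out.println(stat.snapshot()); // 20.0
        // No new calls (e.g. the NameNode became standby): value is stale.
        System.out.println(stat.snapshot()); // 20.0 again
    }
}
```

Under this model, a NameNode that stops serving requests after failover would keep exporting its last computed value indefinitely, which matches the symptom in the attachment.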



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16808) HDFS metrics will hold the previous value if there is no new call

2022-10-19 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17620779#comment-17620779
 ] 

ASF GitHub Bot commented on HDFS-16808:
---

ted12138 opened a new pull request, #5049:
URL: https://github.com/apache/hadoop/pull/5049

   
   
   ### Description of PR
   
   
   ### How was this patch tested?
   
   
   ### For code changes:
   
   - [ ] Does the title of this PR start with the corresponding JIRA issue id 
(e.g. 'HADOOP-17799. Your PR title ...')?
   - [ ] Object storage: have the integration tests been executed and the 
endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, 
`NOTICE-binary` files?
   
   










[jira] [Resolved] (HDFS-16806) ec data balancer block blk_id The index error ,Data cannot be moved

2022-10-19 Thread ruiliang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ruiliang resolved HDFS-16806.
-
Hadoop Flags: Reviewed
  Resolution: Fixed

> ec data balancer block blk_id The index error ,Data cannot be moved
> ---
>
> Key: HDFS-16806
> URL: https://issues.apache.org/jira/browse/HDFS-16806
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.0
>Reporter: ruiliang
>Priority: Critical
> Attachments: image-2022-10-20-11-32-35-833.png
>
>
> The EC data balancer reports a block blk_id index error, and the data cannot 
> be moved.
> The source DataNode (10.12.15.149) is at 100% disk usage.
>  
> {code:java}
> echo 10.12.15.149>sorucehost
> balancer  -fs hdfs://xxcluster06  -threshold 10 -source -f sorucehost   
> 2>>~/balancer.log &  {code}
>  
> The datanode logs emit a large volume of output like the following:
> {code:java}
> datanode logs
> ...
> 2022-10-19 14:43:02,031 ERROR datanode.DataNode (DataXceiver.java:run(321)) - 
> fs-hiido-dn-12-15-149.xx.com:1019:DataXceiver error processing COPY_BLOCK 
> operation  src: /10.12.65.216:58214 dst: /10.12.15.149:1019
> org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException: Replica not 
> found for 
> BP-1822992414-10.12.65.48-1660893388633:blk_-9223372036799576592_4218617
>         at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.getReplica(BlockSender.java:492)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.(BlockSender.java:256)
>         at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.copyBlock(DataXceiver.java:1089)
>         at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opCopyBlock(Receiver.java:291)
>         at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:113)
>         at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:290)
>         at java.lang.Thread.run(Thread.java:748)
> ...    
>     
> hdfs fsck -fs hdfs://xxcluster06 -blockId blk_-9223372036799576592 
> Connecting to namenode via 
> http://fs-hiido-xxcluster06-yynn2.xx.com:50070/fsck?ugi=hdfs&blockId=blk_-9223372036799576592+&path=%2F
> FSCK started by hdfs (auth:KERBEROS_SSL) from /10.12.19.4 at Wed Oct 19 
> 14:47:15 CST 2022Block Id: blk_-9223372036799576592
> Block belongs to: 
> /hive_warehouse/warehouse_old_snapshots/yy_mbsdkevent_original/dt=20210505/post_202105052129_33.log.gz
> No. of Expected Replica: 5
> No. of live Replica: 5
> No. of excess Replica: 0
> No. of stale Replica: 5
> No. of decommissioned Replica: 0
> No. of decommissioning Replica: 0
> No. of corrupted Replica: 0
> Block replica on datanode/rack: fs-hiido-dn-12-66-4.xx.com/4F08-01-09 is 
> HEALTHY
> Block replica on datanode/rack: fs-hiido-dn-12-65-244.xx.com/4F08-01-08 is 
> HEALTHY
> Block replica on datanode/rack: fs-hiido-dn-12-15-149.xx.com/4F08-05-13 is 
> HEALTHY
> Block replica on datanode/rack: fs-hiido-dn-12-65-218.xx.com/4F08-12-04 is 
> HEALTHY
> Block replica on datanode/rack: fs-hiido-dn-12-17-35.xx.com/4F08-03-03 is 
> HEALTHY
> hdfs fsck -fs hdfs://xxcluster06 
> /hive_warehouse/warehouse_old_snapshots/yy_mbsdkevent_original/dt=20210505/post_202105052129_33.log.gz
>  -files -blocks -locations
> Connecting to namenode via 
> http://xx.com:50070/fsck?ugi=hdfs&files=1&blocks=1&locations=1&path=%2Fhive_warehouse%2Fwarehouse_old_snapshots%2Fyy_mbsdkevent_original%2Fdt%3D20210505%2Fpost_202105052129_33.log.gz
> FSCK started by hdfs (auth:KERBEROS_SSL) from /10.12.19.4 for path 
> /hive_warehouse/warehouse_old_snapshots/yy_mbsdkevent_original/dt=20210505/post_202105052129_33.log.gz
>  at Wed Oct 19 14:48:42 CST 2022
> /hive_warehouse/warehouse_old_snapshots/yy_mbsdkevent_original/dt=20210505/post_202105052129_33.log.gz
>  500582412 bytes, erasure-coded: policy=RS-3-2-1024k, 1 block(s):  OK
> 0. BP-1822992414-10.12.65.48-1660893388633:blk_-9223372036799576592_4218617 
> len=500582412 Live_repl=5  
> [blk_-9223372036799576592:DatanodeInfoWithStorage[10.12.17.35:1019,DS-3ccebf8d-5f05-45b5-ac7f-96d1cfb48608,DISK],
>  
> blk_-9223372036799576591:DatanodeInfoWithStorage[10.12.65.218:1019,DS-4f8e3114-7566-4cf1-ad5a-e454c8ea8805,DISK],
>  
> blk_-9223372036799576590:DatanodeInfoWithStorage[10.12.15.149:1019,DS-1dd55c27-8f47-46a6-935b-1d9024ca9188,DISK],
>  
> blk_-9223372036799576589:DatanodeInfoWithStorage[10.12.65.244:1019,DS-a9ffd747-c427-4aaa-8559-04cded7d9d5f,DISK],
>  
> blk_-9223372036799576588:DatanodeInfoWithStorage[10.12.66.4:1019,DS-d88f94db-6db1-4753-a652-780d7cd7f081,DISK]]
> Status: HEALTHY
>  Number of data-nodes:  62
>  Number of racks:               19
>  Total dirs:                    0
>  Total symlinks:                0Replicated Blocks:
>  Total size:    0 B
>  Total files:   0
>  Total blocks (validated):      0
>  Minimally replicated blocks:   0
>  Over-replica
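The fsck listing above shows how the internal blocks of a striped (erasure-coded) block group relate to the group ID: for group blk_-9223372036799576592, the five internal replicas are numbered ...592 through ...588. A minimal sketch of that relationship (illustrative only; the real logic lives in Hadoop's striped-block utilities, not in this helper):

```java
// Sketch: an EC internal block's ID is the block-group ID plus its index.
// Matches the fsck output above; helper name is illustrative, not Hadoop API.
public class EcBlockIdSketch {
    static long internalBlockId(long groupId, int index) {
        return groupId + index;
    }

    public static void main(String[] args) {
        long groupId = -9223372036799576592L;
        // RS-3-2 policy: 3 data + 2 parity = 5 internal blocks, indices 0..4.
        for (int i = 0; i < 5; i++) {
            System.out.println("blk_" + internalBlockId(groupId, i));
        }
    }
}
```

If the balancer mishandles this index when asking a DataNode to copy a block, the DataNode will look up a replica ID it does not store, which is consistent with the ReplicaNotFoundException in the log above.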

[jira] [Commented] (HDFS-3570) Balancer shouldn't rely on "DFS Space Used %" as that ignores non-DFS used space

2022-10-19 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-3570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17620751#comment-17620751
 ] 

ASF GitHub Bot commented on HDFS-3570:
--

hadoop-yetus commented on PR #5044:
URL: https://github.com/apache/hadoop/pull/5044#issuecomment-1284967160

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 56s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  markdownlint  |   0m  0s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  42m 15s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 44s |  |  trunk passed with JDK 
Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   1m 30s |  |  trunk passed with JDK 
Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   1m 26s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 42s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 26s |  |  trunk passed with JDK 
Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 39s |  |  trunk passed with JDK 
Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 40s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  23m  2s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 24s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 23s |  |  the patch passed with JDK 
Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   1m 23s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 19s |  |  the patch passed with JDK 
Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   1m 19s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 58s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 25s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 56s |  |  the patch passed with JDK 
Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 30s |  |  the patch passed with JDK 
Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 16s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  22m 28s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  | 243m 56s |  |  hadoop-hdfs in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   1m  7s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 357m 41s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5044/2/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5044 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint 
|
   | uname | Linux 35a3ff40da0a 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 
01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 43e802df586a2e1d8e8a429d64b9163b20d927f9 |
   | Default Java | Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5044/2/testReport/ |
   | Max. process+thread count | 3023 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5044/2/conso
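The subject of this thread concerns which utilization figure the Balancer should use. A minimal sketch of the distinction (illustrative, not Hadoop's actual code): "DFS Used / Capacity" ignores non-DFS used space, whereas "Capacity - Remaining" counts everything that is actually unavailable:

```java
// Sketch of the two utilization definitions discussed in HDFS-3570.
// Not Hadoop's actual implementation; numbers below are made up.
public class UtilizationSketch {
    // Historical metric: only DFS-used bytes count as "used".
    static double dfsUsedPercent(long dfsUsed, long capacity) {
        return 100.0 * dfsUsed / capacity;
    }

    // Alternative: anything that is not remaining is considered used,
    // so non-DFS consumers (logs, other files) are included.
    static double usedPercent(long capacity, long remaining) {
        return 100.0 * (capacity - remaining) / capacity;
    }

    public static void main(String[] args) {
        long capacity = 1000, dfsUsed = 300, nonDfsUsed = 500;
        long remaining = capacity - dfsUsed - nonDfsUsed; // 200
        System.out.println(dfsUsedPercent(dfsUsed, capacity)); // 30.0
        System.out.println(usedPercent(capacity, remaining));  // 80.0
    }
}
```

A node like the hypothetical one above looks only 30% full to a dfsUsed-based Balancer while its disk is actually 80% consumed, which is the problem the issue title describes.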

[jira] [Commented] (HDFS-16806) ec data balancer block blk_id The index error ,Data cannot be moved

2022-10-19 Thread ruiliang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17620704#comment-17620704
 ] 

ruiliang commented on HDFS-16806:
-

After applying HDFS-16333, I updated only hadoop-hdfs.jar on the balancer 
client service, and the problem was solved. The figure below compares before 
and after the update.

!image-2022-10-20-11-32-35-833.png!


[jira] [Updated] (HDFS-16806) ec data balancer block blk_id The index error ,Data cannot be moved

2022-10-19 Thread ruiliang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ruiliang updated HDFS-16806:

Attachment: image-2022-10-20-11-32-35-833.png


[jira] [Commented] (HDFS-16803) Improve some annotations in hdfs module

2022-10-19 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17620690#comment-17620690
 ] 

ASF GitHub Bot commented on HDFS-16803:
---

ZanderXu commented on PR #5031:
URL: https://github.com/apache/hadoop/pull/5031#issuecomment-1284845545

   Merged into trunk. Thanks @jianghuazhu for your contribution and thanks 
@ashutoshcipher @DaveTeng0 for your review.




> Improve some annotations in hdfs module
> ---
>
> Key: HDFS-16803
> URL: https://issues.apache.org/jira/browse/HDFS-16803
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation, namenode
>Affects Versions: 2.9.2, 3.3.4
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Major
>  Labels: pull-request-available
>
> In the hdfs module, some Javadoc comments are out of date. For example:
> {code:java}
>   FSDirRenameOp: 
>   /**
>* @see {@link #unprotectedRenameTo(FSDirectory, String, String, 
> INodesInPath,
>* INodesInPath, long, BlocksMapUpdateInfo, Options.Rename...)}
>*/
>   static RenameResult renameTo(FSDirectory fsd, FSPermissionChecker pc,
>   String src, String dst, BlocksMapUpdateInfo collectedBlocks,
>   boolean logRetryCache,Options.Rename... options)
>   throws IOException {
> {code}
> We should improve these comments so the documentation reads more clearly.
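For context on the kind of cleanup the issue proposes: wrapping `{@link}` inside `@see`, as in the quoted snippet, is redundant, since `@see` already takes a plain reference. One possible corrected form (illustrative; the actual change made in the PR may differ):

```java
/**
 * @see #unprotectedRenameTo(FSDirectory, String, String, INodesInPath,
 *      INodesInPath, long, BlocksMapUpdateInfo, Options.Rename...)
 */
```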






[jira] [Resolved] (HDFS-16803) Improve some annotations in hdfs module

2022-10-19 Thread ZanderXu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZanderXu resolved HDFS-16803.
-
Fix Version/s: 3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

> Improve some annotations in hdfs module
> ---
>
> Key: HDFS-16803
> URL: https://issues.apache.org/jira/browse/HDFS-16803
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation, namenode
>Affects Versions: 2.9.2, 3.3.4
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> In the hdfs module, some Javadoc comments are out of date. For example:
> {code:java}
>   FSDirRenameOp: 
>   /**
>* @see {@link #unprotectedRenameTo(FSDirectory, String, String, 
> INodesInPath,
>* INodesInPath, long, BlocksMapUpdateInfo, Options.Rename...)}
>*/
>   static RenameResult renameTo(FSDirectory fsd, FSPermissionChecker pc,
>   String src, String dst, BlocksMapUpdateInfo collectedBlocks,
>   boolean logRetryCache,Options.Rename... options)
>   throws IOException {
> {code}
> We should improve these comments so the documentation reads more clearly.






[jira] [Commented] (HDFS-16803) Improve some annotations in hdfs module

2022-10-19 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17620689#comment-17620689
 ] 

ASF GitHub Bot commented on HDFS-16803:
---

ZanderXu merged PR #5031:
URL: https://github.com/apache/hadoop/pull/5031










[jira] [Commented] (HDFS-16803) Improve some annotations in hdfs module

2022-10-19 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17620678#comment-17620678
 ] 

ASF GitHub Bot commented on HDFS-16803:
---

jianghuazhu commented on PR #5031:
URL: https://github.com/apache/hadoop/pull/5031#issuecomment-1284803556

   @ashutoshcipher, thank you for helping review this PR.
   @ZanderXu, could you help merge it into the trunk branch? Thanks.










[jira] [Commented] (HDFS-3570) Balancer shouldn't rely on "DFS Space Used %" as that ignores non-DFS used space

2022-10-19 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-3570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17620568#comment-17620568
 ] 

ASF GitHub Bot commented on HDFS-3570:
--

hadoop-yetus commented on PR #5044:
URL: https://github.com/apache/hadoop/pull/5044#issuecomment-1284516102

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 56s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  markdownlint  |   0m  0s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  42m  8s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 38s |  |  trunk passed with JDK 
Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   1m 31s |  |  trunk passed with JDK 
Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   1m 17s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 40s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 16s |  |  trunk passed with JDK 
Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 38s |  |  trunk passed with JDK 
Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 44s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  25m 59s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 25s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 28s |  |  the patch passed with JDK 
Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   1m 28s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 19s |  |  the patch passed with JDK 
Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   1m 19s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   1m  0s | 
[/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5044/1/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 21 unchanged - 
0 fixed = 22 total (was 21)  |
   | +1 :green_heart: |  mvnsite  |   1m 26s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 56s |  |  the patch passed with JDK 
Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 25s |  |  the patch passed with JDK 
Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 33s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  25m 44s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 353m 14s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5044/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 56s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 471m 51s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.server.namenode.ha.TestObserverNode |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5044/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5044 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint 
|
   | uname | Linux f6778e909231 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 
01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / e0e1a60554aa05ff878fc9685e6cb4b3ec01f618 |
   | Default Java | Private Build-1.8.0_342

[jira] [Commented] (HDFS-16771) JN should tersely print logs about NewerTxnIdException

2022-10-19 Thread Erik Krogen (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17620476#comment-17620476
 ] 

Erik Krogen commented on HDFS-16771:


Thanks for catching my mistake, [~ferhui]!

> JN should tersely print logs about NewerTxnIdException
> --
>
> Key: HDFS-16771
> URL: https://issues.apache.org/jira/browse/HDFS-16771
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> JournalNode should tersely print some logs about NewerTxnIdException.
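A minimal sketch of what "terse" printing means here: expected exceptions get a one-line log entry instead of a full stack trace. The class and method names below are illustrative stand-ins, not the actual JournalNode code:

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Set;

public class TerseLogging {

    // Illustrative stand-in for the real NewerTxnIdException type.
    public static class NewerTxnIdException extends Exception {
        public NewerTxnIdException(String msg) { super(msg); }
    }

    // Exception class names that should be logged tersely.
    private static final Set<String> TERSE = Set.of("NewerTxnIdException");

    public static String format(Throwable t) {
        String name = t.getClass().getSimpleName();
        if (TERSE.contains(name)) {
            // Expected condition: one line, no stack trace.
            return name + ": " + t.getMessage();
        }
        // Unexpected error: keep the full stack trace.
        StringWriter sw = new StringWriter();
        t.printStackTrace(new PrintWriter(sw));
        return sw.toString();
    }
}
```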



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-16808) HDFS metrics will hold the previous value if there is no new call

2022-10-19 Thread leo sun (Jira)
leo sun created HDFS-16808:
--

 Summary: HDFS metrics will hold the previous value if there is no 
new call
 Key: HDFS-16808
 URL: https://issues.apache.org/jira/browse/HDFS-16808
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs
Reporter: leo sun
 Attachments: image-2022-10-19-23-59-19-673.png

According to the implementation of MutableStat.snapshot(), HDFS metrics will 
keep reporting the previous value if there are no new calls.

As a result, even after the user switches active and standby, the previous 
ANN (now standby) will keep emitting the old value, as the picture shows:

!image-2022-10-19-23-59-19-673.png!
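A simplified sketch (not the actual Hadoop MutableStat) of how a snapshot-time metric can keep emitting its last published value when no new samples arrive between snapshots:

```java
// Simplified model of a MutableStat-like metric: add() records samples and
// snapshot() publishes the mean since the last snapshot. When no new samples
// arrived, the previously published mean is returned again instead of a
// neutral value -- so a node that stops serving calls keeps reporting its
// last value.
public class StatSketch {
    private double total;     // sum of samples since the last snapshot
    private long numSamples;  // number of samples since the last snapshot
    private double prevMean;  // value published by the previous snapshot

    public void add(double value) {
        total += value;
        numSamples++;
    }

    public double snapshot() {
        if (numSamples > 0) {
            prevMean = total / numSamples;
            total = 0;
            numSamples = 0;
        }
        return prevMean; // stale after a failover if no new calls arrive
    }
}
```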






[jira] [Updated] (HDFS-3570) Balancer shouldn't rely on "DFS Space Used %" as that ignores non-DFS used space

2022-10-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-3570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDFS-3570:
-
Labels: pull-request-available  (was: )

> Balancer shouldn't rely on "DFS Space Used %" as that ignores non-DFS used 
> space
> 
>
> Key: HDFS-3570
> URL: https://issues.apache.org/jira/browse/HDFS-3570
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.0.0-alpha
>Reporter: Harsh J
>Assignee: Ashutosh Gupta
>Priority: Minor
>  Labels: pull-request-available
> Attachments: HDFS-3570.003.patch, HDFS-3570.2.patch, 
> HDFS-3570.aash.1.patch
>
>
> Report from a user here: 
> https://groups.google.com/a/cloudera.org/d/msg/cdh-user/pIhNyDVxdVY/b7ENZmEvBjIJ,
>  post archived at http://pastebin.com/eVFkk0A0
> This user had a specific DN that had a large non-DFS usage among 
> dfs.data.dirs, and very little DFS usage (which is computed against total 
> possible capacity). 
> Balancer apparently only looks at DFS usage, and does not consider that 
> non-DFS usage may also be high on a DN/cluster. Hence, it thinks that a DN 
> reporting only 8% DFS usage has a lot of free space to write more blocks, 
> when that isn't true, as this user's case shows. It went on scheduling 
> writes to the DN to balance it out, but the DN simply couldn't accept any 
> more blocks given the state of its disks.
> I think it would be better if we _computed_ the actual utilization as 
> {{(capacity - actual remaining space)/(capacity)}}, as opposed to the 
> current {{(dfs used)/(capacity)}}. Thoughts?
> This isn't very critical, because it is rare to see DN space being used for 
> non-DN data, but it does expose a valid bug.
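A tiny worked example with hypothetical numbers, showing how the two utilization formulas diverge when non-DFS usage is large:

```java
public class UtilizationExample {
    public static void main(String[] args) {
        // Hypothetical DN: 10 TB capacity, 0.8 TB DFS-used, 7 TB non-DFS data.
        double capacity = 10.0, dfsUsed = 0.8, nonDfsUsed = 7.0;
        double remaining = capacity - dfsUsed - nonDfsUsed; // ~2.2 TB truly free

        // What Balancer looks at today: the DN appears almost empty.
        double byDfsUsed = dfsUsed / capacity * 100.0;                  // 8%

        // Remaining-based view: the DN is in fact quite full.
        double byRemaining = (capacity - remaining) / capacity * 100.0; // 78%

        System.out.printf("dfs-used%%=%.0f, remaining-based%%=%.0f%n",
            byDfsUsed, byRemaining);
    }
}
```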






[jira] [Commented] (HDFS-3570) Balancer shouldn't rely on "DFS Space Used %" as that ignores non-DFS used space

2022-10-19 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-3570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17620309#comment-17620309
 ] 

ASF GitHub Bot commented on HDFS-3570:
--

ashutoshcipher opened a new pull request, #5044:
URL: https://github.com/apache/hadoop/pull/5044

   ### Description of PR
   
   **Balancer shouldn't rely on "DFS Space Used %" as that ignores non-DFS used 
space**
   
   
   Report from a user here: 
https://groups.google.com/a/cloudera.org/d/msg/cdh-user/pIhNyDVxdVY/b7ENZmEvBjIJ
 (Not available now) , post archived at http://pastebin.com/eVFkk0A0
   
   This user had a specific DN that had a large non-DFS usage among 
dfs.data.dirs, and very little DFS usage (which is computed against total 
possible capacity).
   
   Balancer apparently only looks at DFS usage, and does not consider that 
non-DFS usage may also be high on a DN/cluster. Hence, it thinks that a DN 
reporting only 8% DFS usage has a lot of free space to write more blocks, 
when that isn't true, as this user's case shows. It went on scheduling writes 
to the DN to balance it out, but the DN simply couldn't accept any more 
blocks given the state of its disks.
   
   It would be better if we computed the actual utilization as (capacity - 
actual remaining space)/(capacity), as opposed to the current (dfs 
used)/(capacity). Thoughts?
   
   This isn't very critical, because it is rare to see DN space being used 
for non-DN data, but it does expose a valid bug.
   
   
   
   ### How was this patch tested?
   
   UT
   
   
   ### For code changes:
   
   - [X] Does the title or this PR starts with the corresponding JIRA issue id 
(e.g. 'HADOOP-17799. Your PR title ...')?
   - [ ] Object storage: have the integration tests been executed and the 
endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, 
`NOTICE-binary` files?
   
   




> Balancer shouldn't rely on "DFS Space Used %" as that ignores non-DFS used 
> space
> 
>
> Key: HDFS-3570
> URL: https://issues.apache.org/jira/browse/HDFS-3570
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.0.0-alpha
>Reporter: Harsh J
>Assignee: Ashutosh Gupta
>Priority: Minor
> Attachments: HDFS-3570.003.patch, HDFS-3570.2.patch, 
> HDFS-3570.aash.1.patch
>
>






[jira] [Commented] (HDFS-3570) Balancer shouldn't rely on "DFS Space Used %" as that ignores non-DFS used space

2022-10-19 Thread Ashutosh Gupta (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-3570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17620302#comment-17620302
 ] 

Ashutosh Gupta commented on HDFS-3570:
--

I have gone through the discussion. Taking it for fix.

> Balancer shouldn't rely on "DFS Space Used %" as that ignores non-DFS used 
> space
> 
>
> Key: HDFS-3570
> URL: https://issues.apache.org/jira/browse/HDFS-3570
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.0.0-alpha
>Reporter: Harsh J
>Assignee: Ashutosh Gupta
>Priority: Minor
> Attachments: HDFS-3570.003.patch, HDFS-3570.2.patch, 
> HDFS-3570.aash.1.patch
>
>






[jira] [Commented] (HDFS-16807) Improve legacy ClientProtocol#rename2() interface

2022-10-19 Thread JiangHua Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17620206#comment-17620206
 ] 

JiangHua Zhu commented on HDFS-16807:
-

Could you post some suggestions? [~weichiu] [~aajisaka] [~hexiaoqiao] 
[~steve_l] [~ayushtkn].
Any suggestions are welcome.


> Improve legacy ClientProtocol#rename2() interface
> -
>
> Key: HDFS-16807
> URL: https://issues.apache.org/jira/browse/HDFS-16807
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: dfsclient
>Affects Versions: 3.3.3
>Reporter: JiangHua Zhu
>Priority: Major
>
> In HDFS-2298, rename2() replaced rename(), which was a meaningful 
> improvement. However, some legacy usages are still preserved:
> 1. When using the shell to execute the mv command, rename() is still used.
> ./bin/hdfs dfs -mv [source] [target]
> {code:java}
> In MoveCommands#Rename:
> protected void processPath(PathData src, PathData target) throws 
> IOException {
>   ..
>   if (!target.fs.rename(src.path, target.path)) {
> // we have no way to know the actual error...
> throw new PathIOException(src.toString());
>   }
> }
> {code}
> 2. When NNThroughputBenchmark verifies the rename.
> In NNThroughputBenchmark#RenameFileStats:
> {code:java}
> long executeOp(int daemonId, int inputIdx, String ignore)
> throws IOException {
>   long start = Time.now();
>   clientProto.rename(fileNames[daemonId][inputIdx],
>   destNames[daemonId][inputIdx]);
>   long end = Time.now();
>   return end-start;
> }
> {code}
> I think the interface usage should be kept uniform since rename() is 
> deprecated. For NNThroughputBenchmark this is easy, but improving 
> MoveCommands is harder because it involves changes to the FileSystem API.






[jira] [Created] (HDFS-16807) Improve legacy ClientProtocol#rename2() interface

2022-10-19 Thread JiangHua Zhu (Jira)
JiangHua Zhu created HDFS-16807:
---

 Summary: Improve legacy ClientProtocol#rename2() interface
 Key: HDFS-16807
 URL: https://issues.apache.org/jira/browse/HDFS-16807
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: dfsclient
Affects Versions: 3.3.3
Reporter: JiangHua Zhu


In HDFS-2298, rename2() replaced rename(), which was a meaningful 
improvement. However, some legacy usages are still preserved:
1. When using the shell to execute the mv command, rename() is still used.
./bin/hdfs dfs -mv [source] [target]
{code:java}
In MoveCommands#Rename:
protected void processPath(PathData src, PathData target) throws 
IOException {
  ..
  if (!target.fs.rename(src.path, target.path)) {
// we have no way to know the actual error...
throw new PathIOException(src.toString());
  }
}
{code}

2. When NNThroughputBenchmark verifies the rename.
In NNThroughputBenchmark#RenameFileStats:
{code:java}
long executeOp(int daemonId, int inputIdx, String ignore)
throws IOException {
  long start = Time.now();
  clientProto.rename(fileNames[daemonId][inputIdx],
  destNames[daemonId][inputIdx]);
  long end = Time.now();
  return end-start;
}
{code}

I think the interface usage should be kept uniform since rename() is 
deprecated. For NNThroughputBenchmark this is easy, but improving 
MoveCommands is harder because it involves changes to the FileSystem API.
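A hedged sketch (illustrative names, not the real ClientProtocol) of the semantic gap behind the MoveCommands snippet above: the deprecated rename() collapses failures into a bare boolean, so the caller "has no way to know the actual error", while rename2() reports failures via a descriptive exception:

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class RenameSketch {
    // Toy in-memory namespace standing in for a filesystem.
    private final Map<String, String> files = new HashMap<>();

    public RenameSketch(String... paths) {
        for (String p : paths) files.put(p, "data");
    }

    // Legacy style: any failure collapses to "false"; the cause is lost.
    @Deprecated
    public boolean rename(String src, String dst) {
        if (!files.containsKey(src)) return false;
        files.put(dst, files.remove(src));
        return true;
    }

    // rename2 style: failure carries a descriptive exception.
    public void rename2(String src, String dst) throws IOException {
        if (!files.containsKey(src)) {
            throw new IOException("rename failed, source not found: " + src);
        }
        files.put(dst, files.remove(src));
    }
}
```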






[jira] [Commented] (HDFS-16806) ec data balancer block blk_id The index error ,Data cannot be moved

2022-10-19 Thread Takanobu Asanuma (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17620098#comment-17620098
 ] 

Takanobu Asanuma commented on HDFS-16806:
-

Thanks for reporting the issue, [~ruilaing].
 * You need to apply HDFS-16333 to the balancer client, and you don't need to 
apply it to NameNode. However, I'm not sure whether HDFS-16333 fixes this 
problem.
 * I think Blocker is too high a priority for now, so I changed it to 
Critical.
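For context on the reported index: a hedged sketch of the ID arithmetic for striped (EC) blocks. In the fsck output quoted in this issue, the internal block IDs blk_-9223372036799576592 through blk_-9223372036799576588 are the block group ID plus the block index (assumed layout, illustrative helper names):

```java
public class EcBlockIds {
    // Internal block id for index i within an EC block group
    // (matches the fsck listing: group blk_-9223372036799576592 has
    // internal blocks ...592, ...591, ..., ...588 for indices 0..4).
    public static long internalBlockId(long groupId, int index) {
        return groupId + index;
    }

    // Recover the index of an internal block from its id.
    public static int blockIndex(long groupId, long internalId) {
        return (int) (internalId - groupId);
    }
}
```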

> ec data balancer block blk_id The index error ,Data cannot be moved
> ---
>
> Key: HDFS-16806
> URL: https://issues.apache.org/jira/browse/HDFS-16806
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.0
>Reporter: ruiliang
>Priority: Critical
>
> ec data balancer block blk_id The index error ,Data cannot be moved
> dn->10.12.15.149 use disk 100%
>  
> {code:java}
> echo 10.12.15.149>sorucehost
> balancer  -fs hdfs://xxcluster06  -threshold 10 -source -f sorucehost   
> 2>>~/balancer.log &  {code}
>  
> The datanode logs contain a lot of this output:
> {code:java}
> datanode logs
> ...
> 2022-10-19 14:43:02,031 ERROR datanode.DataNode (DataXceiver.java:run(321)) - 
> fs-hiido-dn-12-15-149.xx.com:1019:DataXceiver error processing COPY_BLOCK 
> operation  src: /10.12.65.216:58214 dst: /10.12.15.149:1019
> org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException: Replica not 
> found for 
> BP-1822992414-10.12.65.48-1660893388633:blk_-9223372036799576592_4218617
>         at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.getReplica(BlockSender.java:492)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:256)
>         at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.copyBlock(DataXceiver.java:1089)
>         at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opCopyBlock(Receiver.java:291)
>         at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:113)
>         at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:290)
>         at java.lang.Thread.run(Thread.java:748)
> ...    
>     
> hdfs fsck -fs hdfs://xxcluster06 -blockId blk_-9223372036799576592 
> Connecting to namenode via 
> http://fs-hiido-xxcluster06-yynn2.xx.com:50070/fsck?ugi=hdfs&blockId=blk_-9223372036799576592+&path=%2F
> FSCK started by hdfs (auth:KERBEROS_SSL) from /10.12.19.4 at Wed Oct 19 
> 14:47:15 CST 2022Block Id: blk_-9223372036799576592
> Block belongs to: 
> /hive_warehouse/warehouse_old_snapshots/yy_mbsdkevent_original/dt=20210505/post_202105052129_33.log.gz
> No. of Expected Replica: 5
> No. of live Replica: 5
> No. of excess Replica: 0
> No. of stale Replica: 5
> No. of decommissioned Replica: 0
> No. of decommissioning Replica: 0
> No. of corrupted Replica: 0
> Block replica on datanode/rack: fs-hiido-dn-12-66-4.xx.com/4F08-01-09 is 
> HEALTHY
> Block replica on datanode/rack: fs-hiido-dn-12-65-244.xx.com/4F08-01-08 is 
> HEALTHY
> Block replica on datanode/rack: fs-hiido-dn-12-15-149.xx.com/4F08-05-13 is 
> HEALTHY
> Block replica on datanode/rack: fs-hiido-dn-12-65-218.xx.com/4F08-12-04 is 
> HEALTHY
> Block replica on datanode/rack: fs-hiido-dn-12-17-35.xx.com/4F08-03-03 is 
> HEALTHY
> hdfs fsck -fs hdfs://xxcluster06 
> /hive_warehouse/warehouse_old_snapshots/yy_mbsdkevent_original/dt=20210505/post_202105052129_33.log.gz
>  -files -blocks -locations
> Connecting to namenode via 
> http://xx.com:50070/fsck?ugi=hdfs&files=1&blocks=1&locations=1&path=%2Fhive_warehouse%2Fwarehouse_old_snapshots%2Fyy_mbsdkevent_original%2Fdt%3D20210505%2Fpost_202105052129_33.log.gz
> FSCK started by hdfs (auth:KERBEROS_SSL) from /10.12.19.4 for path 
> /hive_warehouse/warehouse_old_snapshots/yy_mbsdkevent_original/dt=20210505/post_202105052129_33.log.gz
>  at Wed Oct 19 14:48:42 CST 2022
> /hive_warehouse/warehouse_old_snapshots/yy_mbsdkevent_original/dt=20210505/post_202105052129_33.log.gz
>  500582412 bytes, erasure-coded: policy=RS-3-2-1024k, 1 block(s):  OK
> 0. BP-1822992414-10.12.65.48-1660893388633:blk_-9223372036799576592_4218617 
> len=500582412 Live_repl=5  
> [blk_-9223372036799576592:DatanodeInfoWithStorage[10.12.17.35:1019,DS-3ccebf8d-5f05-45b5-ac7f-96d1cfb48608,DISK],
>  
> blk_-9223372036799576591:DatanodeInfoWithStorage[10.12.65.218:1019,DS-4f8e3114-7566-4cf1-ad5a-e454c8ea8805,DISK],
>  
> blk_-9223372036799576590:DatanodeInfoWithStorage[10.12.15.149:1019,DS-1dd55c27-8f47-46a6-935b-1d9024ca9188,DISK],
>  
> blk_-9223372036799576589:DatanodeInfoWithStorage[10.12.65.244:1019,DS-a9ffd747-c427-4aaa-8559-04cded7d9d5f,DISK],
>  
> blk_-9223372036799576588:DatanodeInfoWithStorage[10.12.66.4:1019,DS-d88f94db-6db1-4753-a652-780d7cd7f081,DISK]]
> Status: HEALTHY
>  Number of data-nodes:

[jira] [Updated] (HDFS-16806) ec data balancer block blk_id The index error ,Data cannot be moved

2022-10-19 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HDFS-16806:

Priority: Critical  (was: Blocker)

> ec data balancer block blk_id The index error ,Data cannot be moved
> ---
>
> Key: HDFS-16806
> URL: https://issues.apache.org/jira/browse/HDFS-16806
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.0
>Reporter: ruiliang
>Priority: Critical
>

[jira] [Commented] (HDFS-16806) ec data balancer block blk_id The index error ,Data cannot be moved

2022-10-19 Thread ruiliang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17620065#comment-17620065
 ] 

ruiliang commented on HDFS-16806:
-

https://issues.apache.org/jira/browse/HDFS-16333

Is that the same issue?

Do I only need to apply it to the balancer client, or does it also need to 
be applied on the NameNode server?

> ec data balancer block blk_id The index error ,Data cannot be moved
> ---
>
> Key: HDFS-16806
> URL: https://issues.apache.org/jira/browse/HDFS-16806
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.0
>Reporter: ruiliang
>Priority: Blocker
>

[jira] [Updated] (HDFS-16806) ec data balancer block blk_id The index error ,Data cannot be moved

2022-10-19 Thread ruiliang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ruiliang updated HDFS-16806:

Description: 
ec data balancer block blk_id The index error ,Data cannot be moved

dn->10.12.15.149 use disk 100%

 
{code:java}
echo 10.12.15.149>sorucehost
balancer  -fs hdfs://xxcluster06  -threshold 10 -source -f sorucehost   
2>>~/balancer.log &  {code}
 

The datanode logs contain a lot of this output:
{code:java}
datanode logs
...
2022-10-19 14:43:02,031 ERROR datanode.DataNode (DataXceiver.java:run(321)) - 
fs-hiido-dn-12-15-149.xx.com:1019:DataXceiver error processing COPY_BLOCK 
operation  src: /10.12.65.216:58214 dst: /10.12.15.149:1019
org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException: Replica not 
found for 
BP-1822992414-10.12.65.48-1660893388633:blk_-9223372036799576592_4218617
        at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.getReplica(BlockSender.java:492)
        at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:256)
        at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.copyBlock(DataXceiver.java:1089)
        at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opCopyBlock(Receiver.java:291)
        at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:113)
        at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:290)
        at java.lang.Thread.run(Thread.java:748)
...    
    
hdfs fsck -fs hdfs://xxcluster06 -blockId blk_-9223372036799576592 
Connecting to namenode via 
http://fs-hiido-xxcluster06-yynn2.xx.com:50070/fsck?ugi=hdfs&blockId=blk_-9223372036799576592+&path=%2F
FSCK started by hdfs (auth:KERBEROS_SSL) from /10.12.19.4 at Wed Oct 19 
14:47:15 CST 2022
Block Id: blk_-9223372036799576592
Block belongs to: 
/hive_warehouse/warehouse_old_snapshots/yy_mbsdkevent_original/dt=20210505/post_202105052129_33.log.gz
No. of Expected Replica: 5
No. of live Replica: 5
No. of excess Replica: 0
No. of stale Replica: 5
No. of decommissioned Replica: 0
No. of decommissioning Replica: 0
No. of corrupted Replica: 0
Block replica on datanode/rack: fs-hiido-dn-12-66-4.xx.com/4F08-01-09 is HEALTHY
Block replica on datanode/rack: fs-hiido-dn-12-65-244.xx.com/4F08-01-08 is 
HEALTHY
Block replica on datanode/rack: fs-hiido-dn-12-15-149.xx.com/4F08-05-13 is 
HEALTHY
Block replica on datanode/rack: fs-hiido-dn-12-65-218.xx.com/4F08-12-04 is 
HEALTHY
Block replica on datanode/rack: fs-hiido-dn-12-17-35.xx.com/4F08-03-03 is 
HEALTHY



hdfs fsck -fs hdfs://xxcluster06 
/hive_warehouse/warehouse_old_snapshots/yy_mbsdkevent_original/dt=20210505/post_202105052129_33.log.gz
 -files -blocks -locations
Connecting to namenode via 
http://xx.com:50070/fsck?ugi=hdfs&files=1&blocks=1&locations=1&path=%2Fhive_warehouse%2Fwarehouse_old_snapshots%2Fyy_mbsdkevent_original%2Fdt%3D20210505%2Fpost_202105052129_33.log.gz
FSCK started by hdfs (auth:KERBEROS_SSL) from /10.12.19.4 for path 
/hive_warehouse/warehouse_old_snapshots/yy_mbsdkevent_original/dt=20210505/post_202105052129_33.log.gz
 at Wed Oct 19 14:48:42 CST 2022
/hive_warehouse/warehouse_old_snapshots/yy_mbsdkevent_original/dt=20210505/post_202105052129_33.log.gz
 500582412 bytes, erasure-coded: policy=RS-3-2-1024k, 1 block(s):  OK
0. BP-1822992414-10.12.65.48-1660893388633:blk_-9223372036799576592_4218617 
len=500582412 Live_repl=5  
[blk_-9223372036799576592:DatanodeInfoWithStorage[10.12.17.35:1019,DS-3ccebf8d-5f05-45b5-ac7f-96d1cfb48608,DISK],
 
blk_-9223372036799576591:DatanodeInfoWithStorage[10.12.65.218:1019,DS-4f8e3114-7566-4cf1-ad5a-e454c8ea8805,DISK],
 
blk_-9223372036799576590:DatanodeInfoWithStorage[10.12.15.149:1019,DS-1dd55c27-8f47-46a6-935b-1d9024ca9188,DISK],
 
blk_-9223372036799576589:DatanodeInfoWithStorage[10.12.65.244:1019,DS-a9ffd747-c427-4aaa-8559-04cded7d9d5f,DISK],
 
blk_-9223372036799576588:DatanodeInfoWithStorage[10.12.66.4:1019,DS-d88f94db-6db1-4753-a652-780d7cd7f081,DISK]]
Status: HEALTHY
 Number of data-nodes:  62
 Number of racks:               19
 Total dirs:                    0
 Total symlinks:                0
Replicated Blocks:
 Total size:    0 B
 Total files:   0
 Total blocks (validated):      0
 Minimally replicated blocks:   0
 Over-replicated blocks:        0
 Under-replicated blocks:       0
 Mis-replicated blocks:         0
 Default replication factor:    3
 Average block replication:     0.0
 Missing blocks:                0
 Corrupt blocks:                0
 Missing replicas:              0
Erasure Coded Block Groups:
 Total size:    500582412 B
 Total files:   1
 Total block groups (validated):        1 (avg. block group size 500582412 B)
 Minimally erasure-coded block groups:  1 (100.0 %)
 Over-erasure-coded block groups:       0 (0.0 %)
 Under-erasure-coded block groups:      0 (0.0 %)
 Unsatisfactory placement block groups: 0 (0.0 %)
 Average block group size:      5.0
 Missing block groups:          0
 Corrupt block gr

[jira] [Created] (HDFS-16806) EC data balancer: block blk_id index error, data cannot be moved

2022-10-19 Thread ruiliang (Jira)
ruiliang created HDFS-16806:
---

 Summary: EC data balancer: block blk_id index error, data cannot be moved
 Key: HDFS-16806
 URL: https://issues.apache.org/jira/browse/HDFS-16806
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs
Affects Versions: 3.1.0
Reporter: ruiliang


EC data balancer: the block blk_id index is wrong, so the data cannot be moved.

DataNode 10.12.15.149 is at 100% disk usage.
{code:java}
echo 10.12.15.149>sorucehost
balancer  -fs hdfs://xxcluster06  -threshold 10 -source -f sorucehost   
2>>~/balancer.log & 
 {code}
{code:java}
datanode logs
...
2022-10-19 14:43:02,031 ERROR datanode.DataNode (DataXceiver.java:run(321)) - 
fs-hiido-dn-12-15-149.xx.com:1019:DataXceiver error processing COPY_BLOCK 
operation  src: /10.12.65.216:58214 dst: /10.12.15.149:1019
org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException: Replica not 
found for 
BP-1822992414-10.12.65.48-1660893388633:blk_-9223372036799576592_4218617
        at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.getReplica(BlockSender.java:492)
        at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:256)
        at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.copyBlock(DataXceiver.java:1089)
        at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opCopyBlock(Receiver.java:291)
        at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:113)
        at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:290)
        at java.lang.Thread.run(Thread.java:748)
...        
hdfs fsck -fs hdfs://xxcluster06 -blockId blk_-9223372036799576592 
Connecting to namenode via 
http://fs-hiido-xxcluster06-yynn2.xx.com:50070/fsck?ugi=hdfs&blockId=blk_-9223372036799576592+&path=%2F
FSCK started by hdfs (auth:KERBEROS_SSL) from /10.12.19.4 at Wed Oct 19 
14:47:15 CST 2022
Block Id: blk_-9223372036799576592
Block belongs to: 
/hive_warehouse/warehouse_old_snapshots/yy_mbsdkevent_original/dt=20210505/post_202105052129_33.log.gz
No. of Expected Replica: 5
No. of live Replica: 5
No. of excess Replica: 0
No. of stale Replica: 5
No. of decommissioned Replica: 0
No. of decommissioning Replica: 0
No. of corrupted Replica: 0
Block replica on datanode/rack: fs-hiido-dn-12-66-4.xx.com/4F08-01-09 is HEALTHY
Block replica on datanode/rack: fs-hiido-dn-12-65-244.xx.com/4F08-01-08 is 
HEALTHY
Block replica on datanode/rack: fs-hiido-dn-12-15-149.xx.com/4F08-05-13 is 
HEALTHY
Block replica on datanode/rack: fs-hiido-dn-12-65-218.xx.com/4F08-12-04 is 
HEALTHY
Block replica on datanode/rack: fs-hiido-dn-12-17-35.xx.com/4F08-03-03 is 
HEALTHY

hdfs fsck -fs hdfs://xxcluster06 
/hive_warehouse/warehouse_old_snapshots/yy_mbsdkevent_original/dt=20210505/post_202105052129_33.log.gz
 -files -blocks -locations
Connecting to namenode via 
http://xx.com:50070/fsck?ugi=hdfs&files=1&blocks=1&locations=1&path=%2Fhive_warehouse%2Fwarehouse_old_snapshots%2Fyy_mbsdkevent_original%2Fdt%3D20210505%2Fpost_202105052129_33.log.gz
FSCK started by hdfs (auth:KERBEROS_SSL) from /10.12.19.4 for path 
/hive_warehouse/warehouse_old_snapshots/yy_mbsdkevent_original/dt=20210505/post_202105052129_33.log.gz
 at Wed Oct 19 14:48:42 CST 2022
/hive_warehouse/warehouse_old_snapshots/yy_mbsdkevent_original/dt=20210505/post_202105052129_33.log.gz
 500582412 bytes, erasure-coded: policy=RS-3-2-1024k, 1 block(s):  OK
0. BP-1822992414-10.12.65.48-1660893388633:blk_-9223372036799576592_4218617 
len=500582412 Live_repl=5  
[blk_-9223372036799576592:DatanodeInfoWithStorage[10.12.17.35:1019,DS-3ccebf8d-5f05-45b5-ac7f-96d1cfb48608,DISK],
 
blk_-9223372036799576591:DatanodeInfoWithStorage[10.12.65.218:1019,DS-4f8e3114-7566-4cf1-ad5a-e454c8ea8805,DISK],
 
blk_-9223372036799576590:DatanodeInfoWithStorage[10.12.15.149:1019,DS-1dd55c27-8f47-46a6-935b-1d9024ca9188,DISK],
 
blk_-9223372036799576589:DatanodeInfoWithStorage[10.12.65.244:1019,DS-a9ffd747-c427-4aaa-8559-04cded7d9d5f,DISK],
 
blk_-9223372036799576588:DatanodeInfoWithStorage[10.12.66.4:1019,DS-d88f94db-6db1-4753-a652-780d7cd7f081,DISK]]
Status: HEALTHY
 Number of data-nodes:  62
 Number of racks:               19
 Total dirs:                    0
 Total symlinks:                0
Replicated Blocks:
 Total size:    0 B
 Total files:   0
 Total blocks (validated):      0
 Minimally replicated blocks:   0
 Over-replicated blocks:        0
 Under-replicated blocks:       0
 Mis-replicated blocks:         0
 Default replication factor:    3
 Average block replication:     0.0
 Missing blocks:                0
 Corrupt blocks:                0
 Missing replicas:              0
Erasure Coded Block Groups:
 Total size:    500582412 B
 Total files:   1
 Total block groups (validated):        1 (avg. block group size 500582412 B)
 Minimally erasure-coded block groups:  1 (100.0 %)
 Over-erasure-coded block groups:       0 (0.0 %)
 Under-erasure-coded block groups: