[
https://issues.apache.org/jira/browse/HDFS-16739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18083388#comment-18083388
]
ASF GitHub Bot commented on HDFS-16739:
---------------------------------------
nauyzz opened a new pull request, #8515:
URL: https://github.com/apache/hadoop/pull/8515
…Policy
### Description of PR
As mentioned in
[HDFS-16739](https://issues.apache.org/jira/browse/HDFS-16739), in order to
satisfy the storage policy, the length of `chosenTargets` may exceed the
actual number of blocks that need to be reconstructed. We truncate the
`chosenTargets` array so that the reconstruction task can run properly.
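The idea can be illustrated with a minimal, self-contained sketch. It is not the code in this patch: the class `ChosenTargetsTruncation` and both helper methods are hypothetical names, targets are modeled as plain strings rather than HDFS storage objects, and keeping the leading entries is an assumption about which targets survive truncation. The numbers are taken from the issue (9 chosen targets, 1 block to reconstruct, 3 parity blocks for RS-6-3-1024k).

```java
import java.util.Arrays;

// Hypothetical illustration only; not the actual change in this pull request.
final class ChosenTargetsTruncation {

  /**
   * Returns at most {@code required} targets, keeping the leading entries.
   * {@code required} plays the role of additionalReplRequired in the issue.
   * Keeping the first entries is an assumption made for this sketch.
   */
  static String[] truncate(String[] chosenTargets, int required) {
    if (required < 0) {
      throw new IllegalArgumentException("required must be >= 0");
    }
    if (chosenTargets.length <= required) {
      return chosenTargets; // nothing to trim
    }
    return Arrays.copyOf(chosenTargets, required);
  }

  /** Mirrors the DataNode-side condition quoted in the issue. */
  static boolean passesWriterCheck(int targetCount, int parityBlkNum) {
    return targetCount <= parityBlkNum;
  }

  public static void main(String[] args) {
    // The nine targets from the BlockECReconstructionInfo log in the issue.
    String[] chosen = {
      "10.x.x.37:50010", "10.x.x.21:50010", "10.x.x.32:50010",
      "10.x.x.27:50010", "10.x.x.28:50010", "10.x.x.23:50010",
      "10.x.x.23:50010", "10.x.x.101:50010", "10.x.x.32:50010"
    };
    int required = 1;     // only one inner block (index 3) is missing
    int parityBlkNum = 3; // RS-6-3-1024k has three parity blocks

    String[] trimmed = truncate(chosen, required);
    System.out.println("before: " + chosen.length + " targets, check passes = "
        + passesWriterCheck(chosen.length, parityBlkNum));
    System.out.println("after: " + trimmed.length + " targets, check passes = "
        + passesWriterCheck(trimmed.length, parityBlkNum));
  }
}
```

With the untrimmed array the DataNode-side condition is 9 <= 3 and fails. After trimming to the number of blocks that actually need reconstruction it is 1 <= 3 and holds.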
### How was this patch tested?
It has been running in production for over a year.
### For code changes:
- [ ] Does the title of this PR start with the corresponding JIRA issue id
(e.g. 'HADOOP-17799. Your PR title ...')?
- [ ] Object storage: have the integration tests been executed and the
endpoint declared according to the connector-specific documentation?
- [ ] If adding new dependencies to the code, are these dependencies
licensed in a way that is compatible for inclusion under [ASF
2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`,
`NOTICE-binary` files?
### AI Tooling
If an AI tool was used:
- [ ] The PR includes the phrase "Contains content generated by <tool>"
where <tool> is the name of the AI tool used.
- [ ] My use of AI contributions follows the ASF legal policy
https://www.apache.org/legal/generative-tooling.html
> EC: Reconstruction failed when file has specified StoragePolicy
> ---------------------------------------------------------------
>
> Key: HDFS-16739
> URL: https://issues.apache.org/jira/browse/HDFS-16739
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 3.1.3
> Reporter: MingHui Luo
> Priority: Major
> Fix For: 3.1.3
>
>
> We found that BlockReconstructionWork uses the same chooseTarget
> function as Redundancy Block, so the number of targets returned can exceed
> the real additionalReplRequired because of the need to satisfy the storage
> policy. This causes all kinds of exceptions when the DN performs
> ECReconstructionWork.
> One of the exceptions on the DN is as follows:
> {code:java}
> 2022-08-24 03:01:39,534 WARN [Command processor]
> org.apache.hadoop.hdfs.server.datanode.DataNode: Failed to reconstruct
> striped block blk_-9223372032283192848_35319673088
> java.lang.IllegalArgumentException: Too much missed striped blocks.
> at
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:141)
> at
> org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedWriter.<init>(StripedWriter.java:87)
> at
> org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.<init>(StripedBlockReconstructor.java:45)
> at
> org.apache.hadoop.hdfs.server.datanode.erasurecode.ErasureCodingWorker.processErasureCodingTasks(ErasureCodingWorker.java:134)
> at
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:797)
> at
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:680)
> at
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.processCommand(BPServiceActor.java:1306)
> at
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.lambda$enqueue$2(BPServiceActor.java:1344)
> at
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.processQueue(BPServiceActor.java:1280)
> at
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.run(BPServiceActor.java:1267)
> {code}
> The file's EC policy is RS-6-3-1024k. The inner block info is shown below:
> blk_-9223372032283192845 (index 3) needs to be reconstructed, and all
> storages are DISK, but the file's storage policy is ALL_SSD.
> {code:java}
> [blk_-9223372032283192848:DatanodeInfoWithStorage[10.x.x.33:50010,DS-e1435341-f43c-42ef-806f-90fsddfsfdcd,DISK],
>
> blk_-9223372032283192847:DatanodeInfoWithStorage[10.x.x.35:50010,DS-a6dsd16a-676a-4fed-8ffe-fsdfscw23445,DISK],
>
> blk_-9223372032283192846:DatanodeInfoWithStorage[10.x.x.34:50010,DS-40cdc124-e2e0-40f6-aa47-4d2bdsf3e8e5,DISK],
>
> blk_-9223372032283192844:DatanodeInfoWithStorage[10.x.x.21:50010,DS-ef9dee4f-dfb2-495c-872a-974dfscds58e,DISK],
>
> blk_-9223372032283192843:DatanodeInfoWithStorage[10.x.x.40:50010,DS-6dsedfa7-8291-46bb-964d-dfsf34567655,DISK],
>
> blk_-9223372032283192842:DatanodeInfoWithStorage[10.x.x.36:50010,DS-2dddc387-c38b-427d-9925-15a664d3472b,DISK],
>
> blk_-9223372032283192841:DatanodeInfoWithStorage[10.x.x.151:50010,DS-fds91a7-89ad-4899-bc44-675dfs32f58e,DISK],
>
> blk_-9223372032283192840:DatanodeInfoWithStorage[10.x.x.27:50010,DS-77dfs4c1-c23c-4b26-baa3-aadsfdff4118,DISK]]
> {code}
> Here is the BlockECReconstructionInfo. Because none of the inner blocks
> satisfies the storage policy (ALL_SSD), the target length is 9 rather than 1.
> {code:java}
> 2022-08-24 03:01:39,534 INFO [Command processor]
> org.apache.hadoop.hdfs.server.datanode.DataNode: processErasureCodingTasks
> BlockECReconstructionInfo(
> Recovering
> BP-390041874-10.x.x.x-1550651014658:blk_-9223372032283192848_35319673088
> From: [10.x.x.33:50010, 10.x.x.35:50010, 10.x.x.34:50010, 10.x.x.21:50010,
> 10.x.x.40:50010, 10.x.x.36:50010, 10.x.x.151:50010, 10.x.x.27:50010] To:
> [[10.x.x.37:50010, 10.x.x.21:50010, 10.x.x.32:50010, 10.x.x.27:50010,
> 10.x.x.28:50010, 10.x.x.23:50010, 10.x.x.23:50010, 10.x.x.101:50010,
> 10.x.x.32:50010])
> Block Indices: [0, 1, 2, 4, 5, 6, 7, 8] {code}
> When the StripedWriter is initialized in the DN's StripedBlockReconstructor,
> it checks that targetIndices.length <= parityBlkNum (here 9 <= 3, which
> fails), so this striped block will never be reconstructed successfully.
> {code:java}
> targetIndices = new short[targets.length];
> Preconditions.checkArgument(targetIndices.length <= parityBlkNum,
> "Too much missed striped blocks."); {code}
>