[ https://issues.apache.org/jira/browse/HDFS-16182?focusedWorklogId=640600&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-640600 ]
ASF GitHub Bot logged work on HDFS-16182:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 23/Aug/21 07:30
            Start Date: 23/Aug/21 07:30
    Worklog Time Spent: 10m
      Work Description: Neilxzn opened a new pull request #3320:
URL: https://github.com/apache/hadoop/pull/3320

### Description of PR
https://issues.apache.org/jira/browse/HDFS-16182

### How was this patch tested?
Added TestBlockStoragePolicy.testAddDatanode2ExistingPipelineInSsd.

### For code changes:
- [ ] Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
- [ ] Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
- [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, `NOTICE-binary` files?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

            Worklog Id:     (was: 640600)
    Remaining Estimate: 0h
            Time Spent: 10m

> numOfReplicas is given the wrong value in BlockPlacementPolicyDefault$chooseTarget can cause DataStreamer to fail with Heterogeneous Storage
> -----------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-16182
>                 URL: https://issues.apache.org/jira/browse/HDFS-16182
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namanode
>    Affects Versions: 3.4.0
>            Reporter: Max Xie
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> In our HDFS cluster, we use heterogeneous storage to keep data on SSD for
> better performance. Sometimes, while the HDFS client is transferring data
> through the write pipeline, it throws an IOException and exits. The
> exception log is below:
> ```
> java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes:
> current=[DatanodeInfoWithStorage[dn01_ip:5004,DS-ef7882e0-427d-4c1e-b9ba-a929fac44fb4,DISK],
> DatanodeInfoWithStorage[dn02_ip:5004,DS-3871282a-ad45-4332-866a-f000f9361ecb,DISK],
> DatanodeInfoWithStorage[dn03_ip:5004,DS-a388c067-76a4-4014-a16c-ccc49c8da77b,SSD],
> DatanodeInfoWithStorage[dn04_ip:5004,DS-b81da262-0dd9-4567-a498-c516fab84fe0,SSD],
> DatanodeInfoWithStorage[dn05_ip:5004,DS-34e3af2e-da80-46ac-938c-6a3218a646b9,SSD]],
> original=[DatanodeInfoWithStorage[dn01_ip:5004,DS-ef7882e0-427d-4c1e-b9ba-a929fac44fb4,DISK],
> DatanodeInfoWithStorage[dn02_ip:5004,DS-3871282a-ad45-4332-866a-f000f9361ecb,DISK]]).
> The current failed datanode replacement policy is DEFAULT, and a client may configure this via
> 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
> ```
> After looking into it, I found that when an existing pipeline needs to
> replace a bad datanode in order to keep transferring data, the client
> requests one additional datanode from the NameNode and checks that the
> number of datanodes is the original number + 1.
> ```
> ## DataStreamer$findNewDatanode
> if (nodes.length != original.length + 1) {
>   throw new IOException(
>       "Failed to replace a bad datanode on the existing pipeline "
>       + "due to no more good datanodes being available to try. "
>       + "(Nodes: current=" + Arrays.asList(nodes)
>       + ", original=" + Arrays.asList(original) + "). "
>       + "The current failed datanode replacement policy is "
>       + dfsClient.dtpReplaceDatanodeOnFailure
>       + ", and a client may configure this via '"
>       + BlockWrite.ReplaceDatanodeOnFailure.POLICY_KEY
>       + "' in its configuration.");
> }
> ```
> The root cause is that Namenode$getAdditionalDatanode returns multiple
> datanodes, not one, to DataStreamer.addDatanode2ExistingPipeline.
>
> Maybe we can fix it in BlockPlacementPolicyDefault$chooseTarget. I think
> numOfReplicas should not be assigned from requiredStorageTypes.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
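
The client-side length check quoted in the issue description can be illustrated with a small standalone sketch. This is not the Hadoop code: the class name `PipelineCheckSketch` is hypothetical, datanodes are modelled as plain strings, and the error message is shortened. It only shows why a reply that grows the pipeline by more than one node is rejected.

```java
import java.io.IOException;
import java.util.Arrays;

/**
 * Illustrative only: a standalone model of the length check quoted in the
 * issue. Datanodes are plain strings here; this is NOT the Hadoop
 * DataStreamer class, and the class name is hypothetical.
 */
final class PipelineCheckSketch {

  /**
   * Returns the index in {@code nodes} of the one datanode that is not
   * present in {@code original}. Throws if the pipeline did not grow by
   * exactly one node.
   */
  static int findNewDatanode(String[] nodes, String[] original)
      throws IOException {
    if (nodes.length != original.length + 1) {
      throw new IOException("Failed to replace a bad datanode: expected "
          + (original.length + 1) + " nodes but got " + nodes.length
          + " (current=" + Arrays.asList(nodes)
          + ", original=" + Arrays.asList(original) + ")");
    }
    // Exactly one extra node: locate the entry that is not in the original.
    for (int i = 0; i < nodes.length; i++) {
      if (!Arrays.asList(original).contains(nodes[i])) {
        return i;
      }
    }
    throw new IOException("New datanode not found in " + Arrays.asList(nodes));
  }
}
```

With the pipeline from the exception log (two original datanodes, five current ones), the check fails because 5 is not 2 + 1. A reply containing the two original datanodes plus a single new one passes, and the method returns the position of the new node.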