[ https://issues.apache.org/jira/browse/HDFS-13915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jiandan Yang updated HDFS-13915:
---------------------------------
    Attachment: HDFS-13915.003.patch

> replace datanode failed because of NameNodeRpcServer#getAdditionalDatanode returning excessive datanodeInfo
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-13915
>                 URL: https://issues.apache.org/jira/browse/HDFS-13915
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>         Environment: 
>            Reporter: Jiandan Yang
>            Assignee: Jiandan Yang
>            Priority: Major
>         Attachments: HDFS-13915.001.patch, HDFS-13915.002.patch, HDFS-13915.003.patch
>
>
> Consider the following situation:
> 1. A file is created with the ALLSSD storage policy.
> 2. The chosen storage types are [SSD, SSD, DISK] because of a shortage of SSD space.
> 3. While recovering the write pipeline to replace a bad datanode, the client calls NameNodeRpcServer#getAdditionalDatanode.
> 4. BlockPlacementPolicyDefault#chooseTarget calls StoragePolicy#chooseStorageTypes((short) 3, [SSD, DISK], none, false), but chooseStorageTypes returns [SSD, SSD] instead of the single storage type actually needed:
> {code:java}
> @Test
> public void testAllSSDFallbackAndNonNewBlock() {
>   final BlockStoragePolicy allSSD = POLICY_SUITE.getPolicy(ALLSSD);
>   List<StorageType> storageTypes = allSSD.chooseStorageTypes((short) 3,
>       Arrays.asList(StorageType.DISK, StorageType.SSD),
>       EnumSet.noneOf(StorageType.class), false);
>   assertEquals(2, storageTypes.size());
>   assertEquals(StorageType.SSD, storageTypes.get(0));
>   assertEquals(StorageType.SSD, storageTypes.get(1));
> }
> {code}
> 5. numOfReplicas = requiredStorageTypes.size() sets numOfReplicas to 2, so two additional datanodes are chosen instead of one.
> 6. BlockPlacementPolicyDefault#chooseTarget therefore returns four datanodes to the client.
> 7. DataStreamer#findNewDatanode detects that nodes.length != original.length + 1 and throws an IOException, so the write finally fails.
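The arithmetic in steps 4-6 can be sketched with a toy model. This is an illustrative sketch only, not the real BlockStoragePolicy or DataStreamer code; the class ChooseStorageTypesSketch and its simplified chooseStorageTypes are invented for the demonstration:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ChooseStorageTypesSketch {
    enum StorageType { SSD, DISK }

    // Simplified model of the ALLSSD counting behavior described in step 4:
    // the policy wants `replication` SSDs and subtracts only the SSDs already
    // in the pipeline, ignoring that just one additional replica was requested.
    static List<StorageType> chooseStorageTypes(int replication, List<StorageType> chosen) {
        long existingSsd = chosen.stream().filter(t -> t == StorageType.SSD).count();
        List<StorageType> required = new ArrayList<>();
        for (long i = existingSsd; i < replication; i++) {
            required.add(StorageType.SSD);
        }
        return required;
    }

    public static void main(String[] args) {
        // Pipeline after losing one node: [SSD, DISK]; one replacement is needed.
        List<StorageType> remaining = Arrays.asList(StorageType.SSD, StorageType.DISK);
        List<StorageType> required = chooseStorageTypes(3, remaining);
        System.out.println("additional storage types: " + required); // [SSD, SSD]

        // numOfReplicas = requiredStorageTypes.size() => 2 extra datanodes,
        // so the NameNode hands back 2 + 2 = 4 nodes in total, while the
        // client-side length check expects original.length + 1 = 3.
        int nodes = remaining.size() + required.size();
        System.out.println("pipeline check passes: " + (nodes == remaining.size() + 1)); // false
    }
}
```

With the over-count, the client receives four nodes where it expected three, which is exactly the mismatch the exception below complains about.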
> The check in DataStreamer#findNewDatanode:
> {code:java}
> private int findNewDatanode(final DatanodeInfo[] original)
>     throws IOException {
>   if (nodes.length != original.length + 1) {
>     throw new IOException(
>         "Failed to replace a bad datanode on the existing pipeline "
>             + "due to no more good datanodes being available to try. "
>             + "(Nodes: current=" + Arrays.asList(nodes)
>             + ", original=" + Arrays.asList(original) + "). "
>             + "The current failed datanode replacement policy is "
>             + dfsClient.dtpReplaceDatanodeOnFailure
>             + ", and a client may configure this via '"
>             + BlockWrite.ReplaceDatanodeOnFailure.POLICY_KEY
>             + "' in its configuration.");
>   }
>   for (int i = 0; i < nodes.length; i++) {
>     int j = 0;
>     for (; j < original.length && !nodes[i].equals(original[j]); j++);
>     if (j == original.length) {
>       return i;
>     }
>   }
>   throw new IOException("Failed: new datanode not found: nodes="
>       + Arrays.asList(nodes) + ", original=" + Arrays.asList(original));
> }
> {code}
> The client warn log is:
> {code:java}
> WARN [DataStreamer for file /home/yarn/opensearch/in/data/120141286/0_65535/table/ucs_process/MANIFEST-093545 block BP-1742758844-11.138.8.184-1483707043031:blk_7086344902_6012765313] org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception
> java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try.
> (Nodes: current=[DatanodeInfoWithStorage[11.138.5.4:50010,DS-04826cfc-1885-4213-a58b-8606845c5c42,SSD],
> DatanodeInfoWithStorage[11.138.5.9:50010,DS-f6d8eb8b-2550-474b-a692-c991d7a6f6b3,SSD],
> DatanodeInfoWithStorage[11.138.5.153:50010,DS-f5d77ca0-6fe3-4523-8ca8-5af975f845b6,SSD],
> DatanodeInfoWithStorage[11.138.9.156:50010,DS-0d15ea12-1bad-4444-84f7-1a4917a1e194,DISK]],
> original=[DatanodeInfoWithStorage[11.138.5.4:50010,DS-04826cfc-1885-4213-a58b-8606845c5c42,SSD],
> DatanodeInfoWithStorage[11.138.9.156:50010,DS-0d15ea12-1bad-4444-84f7-1a4917a1e194,DISK]]).
> The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
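As the exception text notes, the replacement behavior is a client-side setting. A sketch of the relevant hdfs-site.xml entries on the client (values shown are illustrative, not a recommendation; the actual fix for this issue belongs in the NameNode placement logic):

```xml
<!-- Client-side hdfs-site.xml -->
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
  <!-- One of NEVER, DEFAULT, ALWAYS -->
  <value>DEFAULT</value>
</property>
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.best-effort</name>
  <!-- true: keep writing with the remaining pipeline when replacement
       keeps failing, instead of aborting the write (trades durability
       for availability) -->
  <value>true</value>
</property>
```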