[ https://issues.apache.org/jira/browse/HDFS-8946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14713409#comment-14713409 ]
Yi Liu commented on HDFS-8946: ------------------------------ # Add {{chooseStorage4Block}} to choose a good storage of given type from the datanode. It will checks the state of storage, and whether there is enough space to place the block on the datanode. These conditions are the same as original. Since the datanode only cares about the storage type when placing a block, this method will return the first storage of given type we see. Then {{isGoodTarget}} can be removed, since we check all the conditions in {{chooseStorage4Block}}. # No need to shuffle the storages of datanode and no need to iterate all the storages of datanode in {{chooseLocalStorage}} and {{chooseRandom}}, since we can choose the storage of given type using {{chooseStorage4Block}} once. # {{addToExcludedNodes}} is redundant after finding a good storage, since we already add the data node to excludedNodes at the begging of checking the node. # {{numOfAvailableNodes \-= newExcludedNodes;}} in {{chooseRandom}} is redundant, since we already do {{numOfAvailableNodes--}} at the begging of checking the node. # In {{DatanodeDescriptor#chooseStorage4Block}}, {{t == null}} is unnecessary, since it never be null and it's added in HDFS-8863. > Improve choosing datanode storage for block placement > ----------------------------------------------------- > > Key: HDFS-8946 > URL: https://issues.apache.org/jira/browse/HDFS-8946 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode > Reporter: Yi Liu > Assignee: Yi Liu > Attachments: HDFS-8946.001.patch > > > This JIRA is to: > *1.* Improve chooseing datanode storage for block placement: > In {{BlockPlacementPolicyDefault}} ({{chooseLocalStorage}}, > {{chooseRandom}}), we have following logic to choose datanode storage to > place block. > For given storage type, we iterate storages of the datanode. But for > datanode, it only cares about the storage type. In the loop, we check > according to Storage type and return the first storage if the storages of the > type on the datanode fit in requirement. So we can remove the iteration of > storages, and just need to do once to find a good storage of given type, it's > efficient if the storages of the type on the datanode don't fit in > requirement since we don't need to loop all storages and do the same check. > Besides, no need to shuffle the storages, since we only need to check > according to the storage type on the datanode once. > This also improves the logic and make it more clear. > {code} > if (excludedNodes.add(localMachine) // was not in the excluded list > && isGoodDatanode(localDatanode, maxNodesPerRack, false, > results, avoidStaleNodes)) { > for (Iterator<Map.Entry<StorageType, Integer>> iter = storageTypes > .entrySet().iterator(); iter.hasNext(); ) { > Map.Entry<StorageType, Integer> entry = iter.next(); > for (DatanodeStorageInfo localStorage : DFSUtil.shuffle( > localDatanode.getStorageInfos())) { > StorageType type = entry.getKey(); > if (addIfIsGoodTarget(localStorage, excludedNodes, blocksize, > results, type) >= 0) { > int num = entry.getValue(); > ... > {code} > (current logic above) > *2.* Improve the logic and remove some duplicated code > for example, In {{chooseLocalStorage}}, {{chooseRandom}}, we add the node to > excludeNodes before the {{for}}, and we do it again if we find it's a good > target. {{numOfAvailableNodes -= newExcludedNodes}} is duplicated too. -- This message was sent by Atlassian JIRA (v6.3.4#6332)