[jira] [Commented] (HDFS-8946) Improve choosing datanode storage for block placement
[ https://issues.apache.org/jira/browse/HDFS-8946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14730727#comment-14730727 ] Surendra Singh Lilhore commented on HDFS-8946:
--

*One suggestion:* In this JIRA some logs were removed.

1.
{code}
if (requiredSize > remaining - scheduledSize) {
  logNodeIsNotChosen(storage, "the node does not have enough "
      + storage.getStorageType() + " space"
      + " (required=" + requiredSize + ", scheduled=" + scheduledSize
      + ", remaining=" + remaining + ")");
  return false;
}
{code}

2.
{code}
logNodeIsNotChosen(storage, "storage is read-only");
{code}

3.
{code}
logNodeIsNotChosen(storage, "storage has failed");
{code}

I think these logs are very important for debugging issues: from the debug log we can learn why a DN was not selected for block replication. Can we add them back? At least the first log should be there. In HDFS-9023 I want to make some log improvements for the case when the NN is not able to identify DNs for replication.

> Improve choosing datanode storage for block placement
> -----------------------------------------------------
>
>                 Key: HDFS-8946
>                 URL: https://issues.apache.org/jira/browse/HDFS-8946
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: Yi Liu
>            Assignee: Yi Liu
>             Fix For: 2.8.0
>         Attachments: HDFS-8946.001.patch, HDFS-8946.002.patch, HDFS-8946.003.patch
>
>
> This JIRA is to improve choosing datanode storage for block placement:
> In {{BlockPlacementPolicyDefault}} ({{chooseLocalStorage}}, {{chooseRandom}}), we have the following logic to choose a datanode storage to place a block.
> For a given storage type, we iterate over the storages of the datanode. But for the datanode, only the storage type matters. In the loop, we check according to the storage type and return the first storage if the storages of that type on the datanode fit the requirement.
> So we can remove the iteration over storages and only need a single check to find a good storage of the given type. This is more efficient when the storages of that type on the datanode do not fit the requirement, since we don't need to loop over all storages and repeat the same check. Besides, there is no need to shuffle the storages, since we only need one check per storage type on the datanode.
> This also simplifies the logic and makes it clearer.
> {code}
> if (excludedNodes.add(localMachine) // was not in the excluded list
>     && isGoodDatanode(localDatanode, maxNodesPerRack, false,
>         results, avoidStaleNodes)) {
>   for (Iterator<Map.Entry<StorageType, Integer>> iter = storageTypes
>       .entrySet().iterator(); iter.hasNext(); ) {
>     Map.Entry<StorageType, Integer> entry = iter.next();
>     for (DatanodeStorageInfo localStorage : DFSUtil.shuffle(
>         localDatanode.getStorageInfos())) {
>       StorageType type = entry.getKey();
>       if (addIfIsGoodTarget(localStorage, excludedNodes, blocksize,
>           results, type) >= 0) {
>         int num = entry.getValue();
>         ...
> {code}
> (current logic above)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
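The two points above — one aggregate check per (node, storage type) instead of iterating all storages, and a debug log explaining why a node was rejected — can be sketched in isolation roughly as follows. This is a minimal, self-contained sketch, not the real HDFS-8946 code: `StorageChoiceSketch`, `DatanodeSketch`, and `isGoodTarget` are hypothetical stand-ins for the NameNode's `DatanodeDescriptor`/`BlockPlacementPolicyDefault` machinery.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical sketch of the idea in this JIRA: look up the aggregate
// remaining space for the requested storage type once per datanode,
// and keep a rejection log like the one Surendra asks to restore.
public class StorageChoiceSketch {

    enum StorageType { DISK, SSD, ARCHIVE }

    static final class DatanodeSketch {
        final String name;
        // remaining bytes per storage type, aggregated across all volumes,
        // so no per-storage loop (and no shuffle) is needed at choose time
        final Map<StorageType, Long> remainingByType =
            new EnumMap<>(StorageType.class);
        long scheduledSize; // bytes already scheduled onto this node

        DatanodeSketch(String name) { this.name = name; }
    }

    /** One check per (node, type) instead of a loop over every storage. */
    static boolean isGoodTarget(DatanodeSketch dn, StorageType type,
                                long requiredSize) {
        long remaining = dn.remainingByType.getOrDefault(type, 0L);
        if (requiredSize > remaining - dn.scheduledSize) {
            // the debug line this thread argues should be kept
            System.out.println("Node " + dn.name + " not chosen: not enough "
                + type + " space (required=" + requiredSize
                + ", scheduled=" + dn.scheduledSize
                + ", remaining=" + remaining + ")");
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        DatanodeSketch dn = new DatanodeSketch("dn1");
        dn.remainingByType.put(StorageType.DISK, 100L);
        dn.scheduledSize = 30L;
        System.out.println(isGoodTarget(dn, StorageType.DISK, 50L)); // 50 <= 100-30
        System.out.println(isGoodTarget(dn, StorageType.DISK, 80L)); // 80 > 100-30
    }
}
```

The point of the aggregate map is that a node with no space of the requested type is rejected in O(1), whereas the old loop checked every storage on the node and repeated the same result.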
[jira] [Commented] (HDFS-8946) Improve choosing datanode storage for block placement
[ https://issues.apache.org/jira/browse/HDFS-8946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14724799#comment-14724799 ] Hudson commented on HDFS-8946:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2255 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2255/])
HDFS-8946. Improve choosing datanode storage for block placement. (yliu) (yliu: rev 8fa41d9dd4b923bf4141f019414a1a8b079124c6)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicy.java
[jira] [Commented] (HDFS-8946) Improve choosing datanode storage for block placement
[ https://issues.apache.org/jira/browse/HDFS-8946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14724785#comment-14724785 ] Hudson commented on HDFS-8946:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #316 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/316/])
HDFS-8946. Improve choosing datanode storage for block placement. (yliu) (yliu: rev 8fa41d9dd4b923bf4141f019414a1a8b079124c6)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicy.java
[jira] [Commented] (HDFS-8946) Improve choosing datanode storage for block placement
[ https://issues.apache.org/jira/browse/HDFS-8946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14724772#comment-14724772 ] Hudson commented on HDFS-8946:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2274 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2274/])
HDFS-8946. Improve choosing datanode storage for block placement. (yliu) (yliu: rev 8fa41d9dd4b923bf4141f019414a1a8b079124c6)
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicy.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
[jira] [Commented] (HDFS-8946) Improve choosing datanode storage for block placement
[ https://issues.apache.org/jira/browse/HDFS-8946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14724746#comment-14724746 ] Hudson commented on HDFS-8946:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #332 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/332/])
HDFS-8946. Improve choosing datanode storage for block placement. (yliu) (yliu: rev 8fa41d9dd4b923bf4141f019414a1a8b079124c6)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicy.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java
[jira] [Commented] (HDFS-8946) Improve choosing datanode storage for block placement
[ https://issues.apache.org/jira/browse/HDFS-8946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14724742#comment-14724742 ] Hudson commented on HDFS-8946:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #325 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/325/])
HDFS-8946. Improve choosing datanode storage for block placement. (yliu) (yliu: rev 8fa41d9dd4b923bf4141f019414a1a8b079124c6)
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicy.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java
[jira] [Commented] (HDFS-8946) Improve choosing datanode storage for block placement
[ https://issues.apache.org/jira/browse/HDFS-8946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14724602#comment-14724602 ] Hudson commented on HDFS-8946:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #1058 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1058/])
HDFS-8946. Improve choosing datanode storage for block placement. (yliu) (yliu: rev 8fa41d9dd4b923bf4141f019414a1a8b079124c6)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicy.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java
[jira] [Commented] (HDFS-8946) Improve choosing datanode storage for block placement
[ https://issues.apache.org/jira/browse/HDFS-8946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14724528#comment-14724528 ] Hudson commented on HDFS-8946:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8375 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8375/])
HDFS-8946. Improve choosing datanode storage for block placement. (yliu) (yliu: rev 8fa41d9dd4b923bf4141f019414a1a8b079124c6)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicy.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java
[jira] [Commented] (HDFS-8946) Improve choosing datanode storage for block placement
[ https://issues.apache.org/jira/browse/HDFS-8946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14724483#comment-14724483 ] Yi Liu commented on HDFS-8946:
--

Thanks a lot for the review, [~kihwal]! Will commit shortly.
[jira] [Commented] (HDFS-8946) Improve choosing datanode storage for block placement
[ https://issues.apache.org/jira/browse/HDFS-8946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14723589#comment-14723589 ] Kihwal Lee commented on HDFS-8946:
--

+1 the patch looks good.
[jira] [Commented] (HDFS-8946) Improve choosing datanode storage for block placement
[ https://issues.apache.org/jira/browse/HDFS-8946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720899#comment-14720899 ] Yi Liu commented on HDFS-8946:
--

The test failures are not related; they run successfully in my local env.
[jira] [Commented] (HDFS-8946) Improve choosing datanode storage for block placement
[ https://issues.apache.org/jira/browse/HDFS-8946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14719294#comment-14719294 ] Hadoop QA commented on HDFS-8946:
--

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 18m 5s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac | 8m 6s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 10m 20s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 1m 21s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 31s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 2m 36s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native | 3m 18s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 185m 32s | Tests failed in hadoop-hdfs. |
| | | 231m 51s | |

|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.server.blockmanagement.TestBlockManager |
| | hadoop.hdfs.TestRollingUpgrade |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12752975/HDFS-8946.003.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / e166c03 |
| hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12194/artifact/patchprocess/testrun_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/12194/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/12194/console |

This message was automatically generated.
[jira] [Commented] (HDFS-8946) Improve choosing datanode storage for block placement
[ https://issues.apache.org/jira/browse/HDFS-8946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14718234#comment-14718234 ] Yi Liu commented on HDFS-8946:

Will update the patch later.
[jira] [Commented] (HDFS-8946) Improve choosing datanode storage for block placement
[ https://issues.apache.org/jira/browse/HDFS-8946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14718232#comment-14718232 ] Yi Liu commented on HDFS-8946:

Thanks Masatake for the comments.
{quote}
LocatedBlock returned by ClientProtocol#addBlock, ClientProtocol#getAdditionalDatanode and ClientProtocol#updateBlockForPipeline contains storageIDs given by BlockPlacementPolicy#chooseTarget but the user of these APIs (which is only DataStreamer) does not uses storageIDs. DataStreamer just send storage type to DataNode and the DataNode decides which volume to use on its own by using VolumeChoosingPolicy.
{quote}
Yes, {{storageIDs}} is not used.
{quote}
Any reason to change the logic of remaining size checking?
{quote}
Nice find. It was my mistake to forget to add {{blockSize *}} while copying this logic from the original {{isGoodTarget}}.
[jira] [Commented] (HDFS-8946) Improve choosing datanode storage for block placement
[ https://issues.apache.org/jira/browse/HDFS-8946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14718226#comment-14718226 ] Masatake Iwasaki commented on HDFS-8946:

DatanodeDescriptor.java
{code}
    final long requiredSize =
        blockSize * HdfsServerConstants.MIN_BLOCKS_FOR_WRITE;
    final long scheduledSize = getBlocksScheduled(t);
    long remaining = 0;
    DatanodeStorageInfo storage = null;
    for (DatanodeStorageInfo s : getStorageInfos()) {
      if (s.getState() == State.NORMAL &&
          s.getStorageType() == t) {
        if (storage == null) {
          storage = s;
        }
        long r = s.getRemaining();
        if (r >= requiredSize) {
          remaining += r;
        }
      }
    }
    if (requiredSize > remaining - scheduledSize) {
      return null;
{code}
{{scheduledSize}} is a number of blocks but is used here as if it were bytes. Any reason to change the logic of the remaining size check?
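The unit mismatch above can be sketched as a minimal standalone illustration (plain longs instead of the real {{DatanodeDescriptor}} fields; the method shape and the {{MIN_BLOCKS_FOR_WRITE}} stand-in value are assumptions, not the actual patch):

```java
public class SpaceCheck {
    // Assumed stand-in for HdfsServerConstants.MIN_BLOCKS_FOR_WRITE.
    static final int MIN_BLOCKS_FOR_WRITE = 1;

    /**
     * blocksScheduled is a block *count*; multiplying it by blockSize
     * (the missing "blockSize *" Yi Liu mentions in his reply) converts
     * it to bytes before comparing against the remaining bytes.
     */
    static boolean hasEnoughSpace(long blockSize, int blocksScheduled, long remaining) {
        long requiredSize = blockSize * MIN_BLOCKS_FOR_WRITE;
        long scheduledSize = blockSize * blocksScheduled;  // count -> bytes
        return requiredSize <= remaining - scheduledSize;
    }
}
```

With the multiplication dropped, a node with many small scheduled blocks would look far less full than it actually is, so the check would pass when it should not.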
[jira] [Commented] (HDFS-8946) Improve choosing datanode storage for block placement
[ https://issues.apache.org/jira/browse/HDFS-8946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14718222#comment-14718222 ] Masatake Iwasaki commented on HDFS-8946:

Thanks for working on this, [~Hitliuyi]. I read the code of BlockPlacementPolicyDefault for HDFS-8945 recently, so let me comment here while my memory is fresh :-)

bq. Besides, no need to shuffle the storages, since we only need to check according to the storage type on the datanode once.

Here is my understanding of this. Please correct me if I'm wrong: {{LocatedBlock}} returned by {{ClientProtocol#addBlock}}, {{ClientProtocol#getAdditionalDatanode}} and {{ClientProtocol#updateBlockForPipeline}} contains the storageIDs given by {{BlockPlacementPolicy#chooseTarget}}, but the user of these APIs (which is only DataStreamer) does not use the storageIDs. DataStreamer just sends the storage type to the DataNode, and the DataNode decides which volume to use on its own by using {{VolumeChoosingPolicy}}.
[jira] [Commented] (HDFS-8946) Improve choosing datanode storage for block placement
[ https://issues.apache.org/jira/browse/HDFS-8946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717005#comment-14717005 ] Hadoop QA commented on HDFS-8946:

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch | 17m 24s | Pre-patch trunk has 4 extant Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac | 7m 40s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 52s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 1m 19s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 27s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 2m 26s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native | 3m 7s | Pre-build of native portion |
| {color:green}+1{color} | hdfs tests | 161m 10s | Tests passed in hadoop-hdfs. |
| | | 205m 25s | |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12752741/HDFS-8946.002.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 0bf2854 |
| Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/12162/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html |
| hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12162/artifact/patchprocess/testrun_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/12162/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf900.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/12162/console |

This message was automatically generated.
[jira] [Commented] (HDFS-8946) Improve choosing datanode storage for block placement
[ https://issues.apache.org/jira/browse/HDFS-8946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716632#comment-14716632 ] Yi Liu commented on HDFS-8946:

Updated the patch to fix the test failure. The reason is that we can't remove {{addToExcludedNodes}}, since it's overridden by subclasses.
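The point about {{addToExcludedNodes}} can be illustrated with a small sketch (hypothetical class and method names, loosely modeled on a node-group placement policy widening the exclusion; not the actual HDFS classes):

```java
import java.util.Set;

// A base-class hook that looks trivially removable...
class BasePolicy {
    protected int addToExcludedNodes(String node, Set<String> excluded) {
        return excluded.add(node) ? 1 : 0;
    }
}

// ...but a subclass overrides it to exclude more than the single node,
// so deleting the hook from the base class would silently drop the
// subclass behavior. Names here are illustrative only.
class NodeGroupPolicy extends BasePolicy {
    @Override
    protected int addToExcludedNodes(String node, Set<String> excluded) {
        int added = super.addToExcludedNodes(node, excluded);
        if (excluded.add(node + "-peer")) {  // stand-in for "same node group" peers
            added++;
        }
        return added;
    }
}
```

This is why the fix was to keep the method rather than inline its base implementation.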
[jira] [Commented] (HDFS-8946) Improve choosing datanode storage for block placement
[ https://issues.apache.org/jira/browse/HDFS-8946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715156#comment-14715156 ] Hadoop QA commented on HDFS-8946:

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch | 17m 17s | Pre-patch trunk has 4 extant Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac | 7m 56s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 10m 11s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 1m 21s | The applied patch generated 1 new checkstyle issues (total was 85, now 83). |
| {color:red}-1{color} | whitespace | 0m 0s | The patch has 3 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install | 1m 27s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 35s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 2m 33s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native | 3m 11s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 188m 13s | Tests failed in hadoop-hdfs. |
| | | 233m 9s | |

|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.TestReplaceDatanodeOnFailure |
| | hadoop.hdfs.server.blockmanagement.TestHeartbeatHandling |
| | hadoop.hdfs.server.namenode.TestFSNamesystem |
| | hadoop.hdfs.TestRollingUpgrade |
| | hadoop.hdfs.server.blockmanagement.TestReplicationPolicyWithNodeGroup |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12752472/HDFS-8946.001.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / a4d9acc |
| Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/12132/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/12132/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt |
| whitespace | https://builds.apache.org/job/PreCommit-HDFS-Build/12132/artifact/patchprocess/whitespace.txt |
| hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12132/artifact/patchprocess/testrun_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/12132/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/12132/console |

This message was automatically generated.
[jira] [Commented] (HDFS-8946) Improve choosing datanode storage for block placement
[ https://issues.apache.org/jira/browse/HDFS-8946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14713409#comment-14713409 ] Yi Liu commented on HDFS-8946:

# Add {{chooseStorage4Block}} to choose a good storage of the given type from the datanode. It checks the state of the storage and whether there is enough space to place the block on the datanode. These conditions are the same as the original ones. Since the datanode only cares about the storage type when placing a block, this method returns the first storage of the given type it sees. Then {{isGoodTarget}} can be removed, since we check all the conditions in {{chooseStorage4Block}}.
# No need to shuffle the storages of the datanode, and no need to iterate over all the storages of the datanode in {{chooseLocalStorage}} and {{chooseRandom}}, since we can choose the storage of the given type using {{chooseStorage4Block}} once.
# {{addToExcludedNodes}} is redundant after finding a good storage, since we already add the datanode to excludedNodes at the beginning of checking the node.
# {{numOfAvailableNodes -= newExcludedNodes;}} in {{chooseRandom}} is redundant, since we already do {{numOfAvailableNodes--}} at the beginning of checking the node.
# In {{DatanodeDescriptor#chooseStorage4Block}}, the {{t == null}} check is unnecessary, since it can never be null; it was added in HDFS-8863.

> Improve choosing datanode storage for block placement
> -----------------------------------------------------
>
>                 Key: HDFS-8946
>                 URL: https://issues.apache.org/jira/browse/HDFS-8946
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: Yi Liu
>            Assignee: Yi Liu
>         Attachments: HDFS-8946.001.patch
>
>
> This JIRA is to:
> *1.* Improve choosing datanode storage for block placement:
> In {{BlockPlacementPolicyDefault}} ({{chooseLocalStorage}}, {{chooseRandom}}), we have the following logic to choose a datanode storage to place a block.
> For a given storage type, we iterate over the storages of the datanode. But the datanode only cares about the storage type. In the loop, we check according to storage type and return the first storage if the storages of that type on the datanode fit the requirement. So we can remove the iteration over storages and just do one check to find a good storage of the given type; this is efficient when the storages of that type on the datanode don't fit the requirement, since we don't need to loop over all storages and repeat the same check.
> Besides, there is no need to shuffle the storages, since we only need to check according to the storage type on the datanode once.
> This also improves the logic and makes it clearer.
> {code}
>     if (excludedNodes.add(localMachine) // was not in the excluded list
>         && isGoodDatanode(localDatanode, maxNodesPerRack, false,
>             results, avoidStaleNodes)) {
>       for (Iterator<Map.Entry<StorageType, Integer>> iter = storageTypes
>           .entrySet().iterator(); iter.hasNext(); ) {
>         Map.Entry<StorageType, Integer> entry = iter.next();
>         for (DatanodeStorageInfo localStorage : DFSUtil.shuffle(
>             localDatanode.getStorageInfos())) {
>           StorageType type = entry.getKey();
>           if (addIfIsGoodTarget(localStorage, excludedNodes, blocksize,
>               results, type) >= 0) {
>             int num = entry.getValue();
>             ...
> {code}
> (current logic above)
> *2.* Improve the logic and remove some duplicated code. For example, in {{chooseLocalStorage}} and {{chooseRandom}}, we add the node to excludedNodes before the {{for}} loop, and we do it again if we find it's a good target. {{numOfAvailableNodes -= newExcludedNodes}} is duplicated too.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
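The single-pass selection described above can be sketched as a self-contained toy (simplified stand-in types, not the actual {{DatanodeDescriptor#chooseStorage4Block}}; the {{MIN_BLOCKS_FOR_WRITE}} value is an assumption, and the scheduled-size accounting includes the {{blockSize *}} fix discussed in the thread):

```java
import java.util.List;

// Toy model: one pass over the node's storages returns the first NORMAL
// storage of the requested type while summing the remaining space of the
// usable storages, then applies the space check once. No shuffling and no
// repeated per-storage checks, matching the improvement described above.
class StoragePicker {
    enum StorageType { DISK, SSD, ARCHIVE }
    enum State { NORMAL, READ_ONLY, FAILED }

    static class Storage {
        final StorageType type; final State state; final long remaining;
        Storage(StorageType type, State state, long remaining) {
            this.type = type; this.state = state; this.remaining = remaining;
        }
    }

    static final int MIN_BLOCKS_FOR_WRITE = 1;  // assumed stand-in constant

    /** Returns a storage of type t with enough space on the node, or null. */
    static Storage chooseStorage4Block(List<Storage> storages, StorageType t,
                                       long blockSize, int blocksScheduled) {
        long requiredSize = blockSize * MIN_BLOCKS_FOR_WRITE;
        long scheduledSize = blockSize * blocksScheduled;  // count -> bytes
        long remaining = 0;
        Storage chosen = null;
        for (Storage s : storages) {
            if (s.state == State.NORMAL && s.type == t) {
                if (chosen == null) {
                    chosen = s;  // first match wins: the DN picks the volume itself
                }
                if (s.remaining >= requiredSize) {
                    remaining += s.remaining;
                }
            }
        }
        return requiredSize > remaining - scheduledSize ? null : chosen;
    }
}
```

Because the caller only needs *a* storage of the requested type (the DataNode's own {{VolumeChoosingPolicy}} picks the actual volume), returning the first match is sufficient, which is what makes the shuffle and the per-storage loop removable.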