[jira] [Updated] (HDFS-16456) EC: Decommission a rack with only on dn will fail when the rack number is equal with replication
[ https://issues.apache.org/jira/browse/HDFS-16456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

caozhiqiang updated HDFS-16456:
---
    Status: Open  (was: Patch Available)

> EC: Decommission a rack with only on dn will fail when the rack number is
> equal with replication
>
>
>                 Key: HDFS-16456
>                 URL: https://issues.apache.org/jira/browse/HDFS-16456
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ec, namenode
>    Affects Versions: 3.4.0
>            Reporter: caozhiqiang
>            Priority: Critical
>         Attachments: HDFS-16456.001.patch, HDFS-16456.002.patch,
> HDFS-16456.003.patch, HDFS-16456.004.patch, HDFS-16456.005.patch,
> HDFS-16456.006.patch, HDFS-16456.007.patch, HDFS-16456.008.patch
>
>
> In the scenario below, decommission fails with the reason TOO_MANY_NODES_ON_RACK:
> # Enable an EC policy, such as RS-6-3-1024k.
> # The number of racks in the cluster is equal to or less than the replication
> number (9).
> # A rack has only one DN, and that DN is decommissioned.
> The root cause is in
> BlockPlacementPolicyRackFaultTolerant::getMaxNodesPerRack(), which supplies
> the limit maxNodesPerRack used when choosing targets. In this scenario
> maxNodesPerRack is 1, which means only one datanode can be chosen from each
> rack.
> {code:java}
>   protected int[] getMaxNodesPerRack(int numOfChosen, int numOfReplicas) {
>    ...
>     // If more replicas than racks, evenly spread the replicas.
>     // This calculation rounds up.
>     int maxNodesPerRack = (totalNumOfReplicas - 1) / numOfRacks + 1;
>     return new int[] {numOfReplicas, maxNodesPerRack};
>   } {code}
> The statement
> int maxNodesPerRack = (totalNumOfReplicas - 1) / numOfRacks + 1;
> is executed with totalNumOfReplicas=9 and numOfRacks=9.
> When we decommission a dn that is the only node in its rack, the
> chooseOnce() call in BlockPlacementPolicyRackFaultTolerant::chooseTargetInOrder()
> throws NotEnoughReplicasException, but the exception is not caught, so the
> code fails to fall back to chooseEvenlyFromRemainingRacks().
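To make the arithmetic in the quoted description concrete, here is a minimal standalone sketch of the rounding-up division. It is not the Hadoop method itself: the class name MaxNodesPerRackSketch and the helper maxNodesPerRack are made up for illustration, and the elided part of getMaxNodesPerRack() is not reproduced. The 8-rack case assumes that the rack holding the decommissioning DN is no longer counted, which is what the proposed fix appears to aim for.

```java
public class MaxNodesPerRackSketch {
  // Same rounding-up integer division as the last statement of the quoted method.
  static int maxNodesPerRack(int totalNumOfReplicas, int numOfRacks) {
    return (totalNumOfReplicas - 1) / numOfRacks + 1;
  }

  public static void main(String[] args) {
    // RS-6-3 stores 9 blocks per group; the cluster reports 9 racks.
    System.out.println(maxNodesPerRack(9, 9)); // (9 - 1) / 9 + 1 = 1
    // Assumption: the decommissioning rack is excluded, leaving 8 racks.
    System.out.println(maxNodesPerRack(9, 8)); // (9 - 1) / 8 + 1 = 2
  }
}
```

With a limit of 1 and only 8 usable racks, at most 8 targets can be picked, which is one short of the 9 that an RS-6-3 block group needs.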
> During decommission, after targets are chosen, verifyBlockPlacement() returns
> a total rack count that still includes the invalid rack, so
> BlockPlacementStatusDefault::isPlacementPolicySatisfied() returns false,
> which also causes the decommission to fail.
> {code:java}
>   public BlockPlacementStatus verifyBlockPlacement(DatanodeInfo[] locs,
>       int numberOfReplicas) {
>     if (locs == null)
>       locs = DatanodeDescriptor.EMPTY_ARRAY;
>     if (!clusterMap.hasClusterEverBeenMultiRack()) {
>       // only one rack
>       return new BlockPlacementStatusDefault(1, 1, 1);
>     }
>     // Count locations on different racks.
>     Set<String> racks = new HashSet<>();
>     for (DatanodeInfo dn : locs) {
>       racks.add(dn.getNetworkLocation());
>     }
>     return new BlockPlacementStatusDefault(racks.size(), numberOfReplicas,
>         clusterMap.getNumOfRacks());
>   } {code}
> {code:java}
>   public boolean isPlacementPolicySatisfied() {
>     return requiredRacks <= currentRacks || currentRacks >= totalRacks;
>   }{code}
> Based on the description above, the following changes should fix it:
> # In startDecommission() or stopDecommission(), we should also update
> numOfRacks in class NetworkTopology. Otherwise choosing targets may fail
> because maxNodesPerRack is too small, and even if choosing targets succeeds,
> isPlacementPolicySatisfied() will still return false and cause the
> decommission to fail.
> # In BlockPlacementPolicyRackFaultTolerant::chooseTargetInOrder(), the first
> chooseOnce() call should also be wrapped in try..catch; otherwise it will not
> fall back to chooseEvenlyFromRemainingRacks() when the exception is thrown.
> # In verifyBlockPlacement(), we need to remove invalid racks from the total
> numOfRacks; otherwise isPlacementPolicySatisfied() will return false and data
> reconstruction will fail.
>
>

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
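As a reading aid for the placement check quoted in the message above, here is a minimal standalone sketch of the isPlacementPolicySatisfied() predicate. The class name PlacementCheckSketch is made up for illustration, and the counters are passed in as parameters rather than read from fields. It assumes the three constructor arguments in the quoted verifyBlockPlacement() map to currentRacks, requiredRacks and totalRacks in that order, and that the block group ends up on 8 distinct racks once the single-DN rack is being decommissioned.

```java
public class PlacementCheckSketch {
  // Same boolean expression as the quoted isPlacementPolicySatisfied().
  static boolean isPlacementPolicySatisfied(int currentRacks, int requiredRacks,
      int totalRacks) {
    return requiredRacks <= currentRacks || currentRacks >= totalRacks;
  }

  public static void main(String[] args) {
    // Assumption: 9 blocks required, placed on 8 racks, topology still counts 9.
    System.out.println(isPlacementPolicySatisfied(8, 9, 9)); // false
    // Assumption: the invalid rack is removed from the total, leaving 8.
    System.out.println(isPlacementPolicySatisfied(8, 9, 8)); // true
  }
}
```

Under these assumptions the check only passes once the rack total no longer includes the rack that is going away, which matches the third change proposed in the description.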
[jira] [Updated] (HDFS-16456) EC: Decommission a rack with only on dn will fail when the rack number is equal with replication
[ https://issues.apache.org/jira/browse/HDFS-16456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

caozhiqiang updated HDFS-16456:
---
    Status: Patch Available  (was: Open)
[jira] [Updated] (HDFS-16456) EC: Decommission a rack with only on dn will fail when the rack number is equal with replication
[ https://issues.apache.org/jira/browse/HDFS-16456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

caozhiqiang updated HDFS-16456:
---
    Attachment: HDFS-16456.008.patch
[jira] [Updated] (HDFS-16456) EC: Decommission a rack with only on dn will fail when the rack number is equal with replication
[ https://issues.apache.org/jira/browse/HDFS-16456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

caozhiqiang updated HDFS-16456:
---
    Attachment:     (was: HDFS-16456.008.patch)
[jira] [Work logged] (HDFS-16521) DFS API to retrieve slow datanodes
[ https://issues.apache.org/jira/browse/HDFS-16521?focusedWorklogId=748300&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-748300 ]

ASF GitHub Bot logged work on HDFS-16521:
-
                Author: ASF GitHub Bot
            Created on: 27/Mar/22 05:08
            Start Date: 27/Mar/22 05:08
    Worklog Time Spent: 10m
      Work Description: hadoop-yetus commented on pull request #4107:
URL: https://github.com/apache/hadoop/pull/4107#issuecomment-1079841252

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 1m 25s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 1s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | buf | 0m 0s | | buf was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 3 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +0 :ok: | mvndep | 12m 52s | | Maven dependency ordering for branch |
| +1 :green_heart: | mvninstall | 25m 9s | | trunk passed |
| +1 :green_heart: | compile | 6m 26s | | trunk passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | compile | 6m 6s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | checkstyle | 1m 15s | | trunk passed |
| +1 :green_heart: | mvnsite | 3m 21s | | trunk passed |
| +1 :green_heart: | javadoc | 2m 34s | | trunk passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 3m 18s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | spotbugs | 7m 32s | | trunk passed |
| +1 :green_heart: | shadedclient | 23m 13s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +0 :ok: | mvndep | 0m 28s | | Maven dependency ordering for patch |
| +1 :green_heart: | mvninstall | 2m 55s | | the patch passed |
| +1 :green_heart: | compile | 6m 8s | | the patch passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | cc | 6m 8s | | the patch passed |
| -1 :x: | javac | 6m 8s | [/results-compile-javac-hadoop-hdfs-project-jdkUbuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4107/3/artifact/out/results-compile-javac-hadoop-hdfs-project-jdkUbuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04.txt) | hadoop-hdfs-project-jdkUbuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 generated 1 new + 651 unchanged - 0 fixed = 652 total (was 651) |
| +1 :green_heart: | compile | 5m 47s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | cc | 5m 47s | | the patch passed |
| -1 :x: | javac | 5m 47s | [/results-compile-javac-hadoop-hdfs-project-jdkPrivateBuild-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4107/3/artifact/out/results-compile-javac-hadoop-hdfs-project-jdkPrivateBuild-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07.txt) | hadoop-hdfs-project-jdkPrivateBuild-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 generated 1 new + 629 unchanged - 0 fixed = 630 total (was 629) |
| +1 :green_heart: | blanks | 0m 1s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 1m 9s | | hadoop-hdfs-project: The patch generated 0 new + 456 unchanged - 1 fixed = 456 total (was 457) |
| +1 :green_heart: | mvnsite | 2m 57s | | the patch passed |
| +1 :green_heart: | javadoc | 2m 13s | | the patch passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 3m 1s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | spotbugs | 7m 37s | | the patch passed |
| +1 :green_heart: | shadedclient | 22m 41s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 2m 26s | | hadoop-hdfs-client in the patch passed. |
| -1 :x: | unit | 500m 40s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4107/3/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. |
| -1 :x: | unit
[jira] [Commented] (HDFS-16456) EC: Decommission a rack with only on dn will fail when the rack number is equal with replication
[ https://issues.apache.org/jira/browse/HDFS-16456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17512830#comment-17512830 ]

Hadoop QA commented on HDFS-16456:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 13m 40s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} {color} | {color:green} 0m 0s{color} | {color:green}test4tests{color} | {color:green} The patch appears to include 3 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 2m 9s{color} | {color:blue}{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 39s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 23m 18s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 20m 58s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 3m 32s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 27s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 28m 2s{color} | {color:green}{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 36s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 43s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 40m 47s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are enabled, using SpotBugs. {color} |
| {color:green}+1{color} | {color:green} spotbugs {color} | {color:green} 6m 23s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 29s{color} | {color:blue}{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 1m 19s{color} | {color:red}https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/779/artifact/out/patch-mvninstall-hadoop-hdfs-project_hadoop-hdfs.txt{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 3m 58s{color} | {color:red}https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/779/artifact/out/patch-compile-root-jdkUbuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04.txt{color} | {color:red} root in the patch failed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 3m 58s{color} | {color:red}https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/779/artifact/out/patch-compile-root-jdkUbuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04.txt{color} | {color:red} root in the patch failed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 3m 18s{color} | {color:red}https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/779/artifact/out/patch-compile-root-jdkPrivateBuild-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07.txt{color} | {color:red} root in the patch failed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 3m 18s{color} | {color:red}https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/779/artifact/out/patch-compile-root-jdkPrivateBuild-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07.txt{color} | {color:red} root in the patch failed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07. {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:oran
[jira] [Updated] (HDFS-16456) EC: Decommission a rack with only on dn will fail when the rack number is equal with replication
[ https://issues.apache.org/jira/browse/HDFS-16456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

caozhiqiang updated HDFS-16456:
---
    Status: Patch Available  (was: Open)
[jira] [Updated] (HDFS-16456) EC: Decommission a rack with only on dn will fail when the rack number is equal with replication
[ https://issues.apache.org/jira/browse/HDFS-16456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

caozhiqiang updated HDFS-16456:
---
    Attachment: HDFS-16456.008.patch
[jira] [Work logged] (HDFS-16413) Reconfig dfs usage parameters for datanode
[ https://issues.apache.org/jira/browse/HDFS-16413?focusedWorklogId=748277&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-748277 ]

ASF GitHub Bot logged work on HDFS-16413:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 26/Mar/22 21:23
Start Date: 26/Mar/22 21:23
Worklog Time Spent: 10m
Work Description: hadoop-yetus commented on pull request #3863:
URL: https://github.com/apache/hadoop/pull/3863#issuecomment-1079777135

:confetti_ball: **+1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:-------:|
| +0 :ok: | reexec | 0m 48s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 1s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 4 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +0 :ok: | mvndep | 12m 35s | | Maven dependency ordering for branch |
| +1 :green_heart: | mvninstall | 26m 8s | | trunk passed |
| +1 :green_heart: | compile | 24m 41s | | trunk passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | compile | 21m 13s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | checkstyle | 3m 54s | | trunk passed |
| +1 :green_heart: | mvnsite | 3m 18s | | trunk passed |
| +1 :green_heart: | javadoc | 2m 17s | | trunk passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 3m 20s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | spotbugs | 6m 2s | | trunk passed |
| +1 :green_heart: | shadedclient | 26m 56s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +0 :ok: | mvndep | 0m 23s | | Maven dependency ordering for patch |
| +1 :green_heart: | mvninstall | 2m 21s | | the patch passed |
| +1 :green_heart: | compile | 24m 20s | | the patch passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javac | 24m 20s | | the patch passed |
| +1 :green_heart: | compile | 25m 41s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | javac | 25m 41s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 4m 5s | | the patch passed |
| +1 :green_heart: | mvnsite | 3m 20s | | the patch passed |
| +1 :green_heart: | javadoc | 2m 21s | | the patch passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 3m 28s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | spotbugs | 6m 27s | | the patch passed |
| +1 :green_heart: | shadedclient | 26m 58s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 17m 52s | | hadoop-common in the patch passed. |
| +1 :green_heart: | unit | 328m 5s | | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 1m 0s | | The patch does not generate ASF License warnings. |
| | | 575m 34s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3863/4/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/3863 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
| uname | Linux 56aa0147a6de 4.15.0-163-generic #171-Ubuntu SMP Fri Nov 5 11:55:11 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 7fcb902f0effe1847cb8e816c2ce3c33e175ba38 |
| Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3863/4/testReport/ |
| Max. process+thread count | 2046 (vs. ulimit of 5500) |
| modules | C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdf
[jira] [Work logged] (HDFS-16498) Fix NPE for checkBlockReportLease
[ https://issues.apache.org/jira/browse/HDFS-16498?focusedWorklogId=748250&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-748250 ]

ASF GitHub Bot logged work on HDFS-16498:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 26/Mar/22 19:00
Start Date: 26/Mar/22 19:00
Worklog Time Spent: 10m
Work Description: hadoop-yetus commented on pull request #4057:
URL: https://github.com/apache/hadoop/pull/4057#issuecomment-1079755004

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:-------:|
| +0 :ok: | reexec | 0m 58s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 1s | | codespell was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 39m 50s | | trunk passed |
| +1 :green_heart: | compile | 1m 44s | | trunk passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | compile | 1m 35s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | checkstyle | 0m 59s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 32s | | trunk passed |
| +1 :green_heart: | javadoc | 1m 4s | | trunk passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 1m 35s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | spotbugs | 3m 33s | | trunk passed |
| +1 :green_heart: | shadedclient | 27m 12s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 1m 35s | | the patch passed |
| +1 :green_heart: | compile | 1m 39s | | the patch passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javac | 1m 39s | | the patch passed |
| +1 :green_heart: | compile | 1m 30s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | javac | 1m 30s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 1m 0s | | the patch passed |
| +1 :green_heart: | mvnsite | 1m 40s | | the patch passed |
| +1 :green_heart: | javadoc | 1m 8s | | the patch passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 1m 42s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | spotbugs | 4m 1s | | the patch passed |
| +1 :green_heart: | shadedclient | 29m 54s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| -1 :x: | unit | 408m 45s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4057/7/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 0m 41s | | The patch does not generate ASF License warnings. |
| | | 530m 29s | | |

| Reason | Tests |
|-------:|:------|
| Failed junit tests | hadoop.hdfs.server.namenode.ha.TestSeveralNameNodes |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4057/7/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/4057 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
| uname | Linux 89d183d6eac2 4.15.0-166-generic #174-Ubuntu SMP Wed Dec 8 19:07:44 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 207024b30fdde9aca690efa1342bcb94ad3bfadd |
| Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4057/7/testReport/ |
| Max. process+thread count | 2120 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-proje
[jira] [Work logged] (HDFS-16477) [SPS]: Add metric PendingSPSPaths for getting the number of paths to be processed by SPS
[ https://issues.apache.org/jira/browse/HDFS-16477?focusedWorklogId=748249&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-748249 ]

ASF GitHub Bot logged work on HDFS-16477:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 26/Mar/22 18:55
Start Date: 26/Mar/22 18:55
Worklog Time Spent: 10m
Work Description: ayushtkn commented on a change in pull request #4009:
URL: https://github.com/apache/hadoop/pull/4009#discussion_r835797579

## File path: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/sps/TestExternalStoragePolicySatisfier.java ##
@@ -441,6 +442,8 @@ private void doTestWhenStoragePolicySetToCOLD() throws Exception {
   hdfsCluster.triggerHeartbeats();
   dfs.satisfyStoragePolicy(new Path(FILE));
+  // Assert metrics.
+  assertEquals(1, hdfsCluster.getNamesystem().getPendingSPSPaths());
   // Wait till namenode notified about the block location details
   DFSTestUtil.waitExpectedStorageType(FILE, StorageType.ARCHIVE, 3, 35000, dfs);

Review comment:
```
// Wait till namenode notified about the block location details
DFSTestUtil.waitExpectedStorageType(FILE, StorageType.ARCHIVE, 3, 35000, dfs);
```
Here you are waiting for SPS to process the path and move the blocks to the correct place. Once this is done, will ``getPendingSPSPaths`` still return 1? I suppose not, right? The path got processed, so the count should reduce to 0.

So, my take is that you don't have control over ``DFSTestUtil.waitExpectedStorageType(FILE, StorageType.ARCHIVE, 3, 35000, dfs);``. If by chance SPS processes that path before your assertion, then the test will fail.

I haven't gone through the code, but that is what I felt in my initial pass. If it doesn't work this way, do let me know.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
Worklog Id: (was: 748249)
Time Spent: 4h (was: 3h 50m)

> [SPS]: Add metric PendingSPSPaths for getting the number of paths to be processed by SPS
> ----------------------------------------------------------------------------------------
>
> Key: HDFS-16477
> URL: https://issues.apache.org/jira/browse/HDFS-16477
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: tomscut
> Assignee: tomscut
> Priority: Major
> Labels: pull-request-available
> Time Spent: 4h
> Remaining Estimate: 0h
>
> Currently we have no idea how many paths are waiting to be processed when using the SPS feature. We should add metric PendingSPSPaths for getting the number of paths to be processed by SPS in NameNode.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
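The race described in the HDFS-16477 review comment can be reproduced without a cluster. The following is a minimal sketch under stated assumptions: `PendingPathsWaitSketch`, `waitFor` and `drainsEventually` are hypothetical names, the `AtomicInteger` merely stands in for the pending-paths metric, and the background thread stands in for SPS. It shows one race-free alternative to asserting an intermediate value, namely polling for the terminal state; it is not the change made in the pull request.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

/** Illustrative only: a counter drained by a background thread. */
final class PendingPathsWaitSketch {

  /** Polls the condition until it holds or the timeout elapses. */
  static boolean waitFor(BooleanSupplier condition, long intervalMs,
      long timeoutMs) {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (!condition.getAsBoolean()) {
      if (System.nanoTime() >= deadline) {
        return false;
      }
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        // Restore the interrupt flag and give up.
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return true;
  }

  /** Returns true once the simulated processor has drained the counter. */
  static boolean drainsEventually() {
    AtomicInteger pendingPaths = new AtomicInteger(1);
    Thread processor = new Thread(() -> pendingPaths.set(0));
    processor.start();
    // Asserting pendingPaths == 1 at this point would be racy: the
    // processor thread may already have run. Waiting for the terminal
    // state (0) does not depend on thread scheduling.
    return waitFor(() -> pendingPaths.get() == 0, 10, 5_000);
  }

  public static void main(String[] args) {
    System.out.println(drainsEventually());
  }
}
```

An intermediate value such as 1 can only be asserted safely if the test controls when processing starts; otherwise only the final value is stable.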
[jira] [Work logged] (HDFS-16521) DFS API to retrieve slow datanodes
[ https://issues.apache.org/jira/browse/HDFS-16521?focusedWorklogId=748245&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-748245 ]

ASF GitHub Bot logged work on HDFS-16521:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 26/Mar/22 18:50
Start Date: 26/Mar/22 18:50
Worklog Time Spent: 10m
Work Description: hadoop-yetus commented on pull request #4107:
URL: https://github.com/apache/hadoop/pull/4107#issuecomment-1079753292

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:-------:|
| +0 :ok: | reexec | 0m 57s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 1s | | codespell was not available. |
| +0 :ok: | buf | 0m 1s | | buf was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 3 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +0 :ok: | mvndep | 12m 26s | | Maven dependency ordering for branch |
| +1 :green_heart: | mvninstall | 26m 6s | | trunk passed |
| +1 :green_heart: | compile | 6m 39s | | trunk passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | compile | 6m 2s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | checkstyle | 1m 15s | | trunk passed |
| +1 :green_heart: | mvnsite | 3m 5s | | trunk passed |
| +1 :green_heart: | javadoc | 2m 20s | | trunk passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 3m 6s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | spotbugs | 7m 11s | | trunk passed |
| +1 :green_heart: | shadedclient | 24m 6s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +0 :ok: | mvndep | 0m 23s | | Maven dependency ordering for patch |
| +1 :green_heart: | mvninstall | 2m 55s | | the patch passed |
| +1 :green_heart: | compile | 7m 44s | | the patch passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | cc | 7m 44s | | the patch passed |
| -1 :x: | javac | 7m 44s | [/results-compile-javac-hadoop-hdfs-project-jdkUbuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4107/2/artifact/out/results-compile-javac-hadoop-hdfs-project-jdkUbuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04.txt) | hadoop-hdfs-project-jdkUbuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 generated 1 new + 651 unchanged - 0 fixed = 652 total (was 651) |
| +1 :green_heart: | compile | 6m 29s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | cc | 6m 30s | | the patch passed |
| -1 :x: | javac | 6m 29s | [/results-compile-javac-hadoop-hdfs-project-jdkPrivateBuild-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4107/2/artifact/out/results-compile-javac-hadoop-hdfs-project-jdkPrivateBuild-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07.txt) | hadoop-hdfs-project-jdkPrivateBuild-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 generated 1 new + 629 unchanged - 0 fixed = 630 total (was 629) |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 1m 13s | | hadoop-hdfs-project: The patch generated 0 new + 456 unchanged - 1 fixed = 456 total (was 457) |
| +1 :green_heart: | mvnsite | 3m 3s | | the patch passed |
| +1 :green_heart: | javadoc | 2m 8s | | the patch passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 2m 49s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | spotbugs | 7m 55s | | the patch passed |
| +1 :green_heart: | shadedclient | 24m 32s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 2m 18s | | hadoop-hdfs-client in the patch passed. |
| -1 :x: | unit | 405m 51s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4107/2/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. |
| -1 :x: | unit
[jira] [Work logged] (HDFS-16523) Fix dependency error in hadoop-hdfs on M1 Mac
[ https://issues.apache.org/jira/browse/HDFS-16523?focusedWorklogId=748233&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-748233 ]

ASF GitHub Bot logged work on HDFS-16523:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 26/Mar/22 16:58
Start Date: 26/Mar/22 16:58
Worklog Time Spent: 10m
Work Description: hadoop-yetus commented on pull request #4112:
URL: https://github.com/apache/hadoop/pull/4112#issuecomment-1079732693

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:-------:|
| +0 :ok: | reexec | 0m 57s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 33m 52s | | trunk passed |
| +1 :green_heart: | compile | 0m 23s | | trunk passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | compile | 0m 23s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | mvnsite | 0m 27s | | trunk passed |
| +1 :green_heart: | javadoc | 0m 25s | | trunk passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 0m 24s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | shadedclient | 56m 1s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 14s | | the patch passed |
| +1 :green_heart: | compile | 0m 14s | | the patch passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javac | 0m 14s | | the patch passed |
| +1 :green_heart: | compile | 0m 14s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | javac | 0m 14s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | mvnsite | 0m 16s | | the patch passed |
| +1 :green_heart: | xml | 0m 2s | | The patch has no ill-formed XML file. |
| +1 :green_heart: | javadoc | 0m 14s | | the patch passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 0m 15s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | shadedclient | 20m 41s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 0m 18s | | hadoop-project in the patch passed. |
| +1 :green_heart: | asflicense | 0m 36s | | The patch does not generate ASF License warnings. |
| | | 81m 21s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4112/1/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/4112 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient codespell xml |
| uname | Linux ddb11eab49bf 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 265c8ab15fde4032e38e4ef558cc2a3d931e6104 |
| Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4112/1/testReport/ |
| Max. process+thread count | 700 (vs. ulimit of 5500) |
| modules | C: hadoop-project U: hadoop-project |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4112/1/console |
| versions | git=2.25.1 maven=3.6.3 |
| Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org |

This message was automatically generated.

--
This is an automated message from the Apache Git Service.
To respond to the messag
[jira] [Work logged] (HDFS-16522) Set Http and Ipc ports for Datanodes in MiniDFSCluster
[ https://issues.apache.org/jira/browse/HDFS-16522?focusedWorklogId=748229&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-748229 ]

ASF GitHub Bot logged work on HDFS-16522:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 26/Mar/22 16:33
Start Date: 26/Mar/22 16:33
Worklog Time Spent: 10m
Work Description: hadoop-yetus commented on pull request #4108:
URL: https://github.com/apache/hadoop/pull/4108#issuecomment-1079728094

:confetti_ball: **+1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:-------:|
| +0 :ok: | reexec | 0m 38s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 1s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 7 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +0 :ok: | mvndep | 12m 50s | | Maven dependency ordering for branch |
| +1 :green_heart: | mvninstall | 23m 28s | | trunk passed |
| +1 :green_heart: | compile | 22m 50s | | trunk passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | compile | 19m 56s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | checkstyle | 3m 35s | | trunk passed |
| +1 :green_heart: | mvnsite | 2m 31s | | trunk passed |
| +1 :green_heart: | javadoc | 2m 1s | | trunk passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 2m 28s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | spotbugs | 4m 24s | | trunk passed |
| +1 :green_heart: | shadedclient | 20m 54s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +0 :ok: | mvndep | 0m 28s | | Maven dependency ordering for patch |
| +1 :green_heart: | mvninstall | 1m 37s | | the patch passed |
| +1 :green_heart: | compile | 22m 2s | | the patch passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javac | 22m 2s | | the patch passed |
| +1 :green_heart: | compile | 20m 5s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | javac | 20m 5s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 3m 39s | | root: The patch generated 0 new + 198 unchanged - 1 fixed = 198 total (was 199) |
| +1 :green_heart: | mvnsite | 2m 24s | | the patch passed |
| +1 :green_heart: | javadoc | 1m 58s | | the patch passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 2m 30s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | spotbugs | 4m 40s | | the patch passed |
| +1 :green_heart: | shadedclient | 21m 10s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 238m 19s | | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | unit | 0m 47s | | hadoop-dynamometer-infra in the patch passed. |
| +1 :green_heart: | asflicense | 1m 4s | | The patch does not generate ASF License warnings. |
| | | 441m 44s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4108/2/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/4108 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
| uname | Linux 4a1b12483ab0 4.15.0-156-generic #163-Ubuntu SMP Thu Aug 19 23:31:58 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 7e38d9348ed3c9b8c0a0ce7c2a2bc063c9b5f5f6 |
| Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4108/2/testReport/ |
| Max. process+thread count | 3248 (vs. ulimit of 5500) |
| modu
[jira] [Work logged] (HDFS-16452) msync RPC should send to Acitve Namenode directly
[ https://issues.apache.org/jira/browse/HDFS-16452?focusedWorklogId=748228&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-748228 ]

ASF GitHub Bot logged work on HDFS-16452:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 26/Mar/22 15:44
Start Date: 26/Mar/22 15:44
Worklog Time Spent: 10m
Work Description: hfutatzhanghb commented on pull request #3976:
URL: https://github.com/apache/hadoop/pull/3976#issuecomment-1079719195

> Hi, @xkrogen. So sorry for disturbing you. I have a new idea about what we have discussed. In practice, we usually plan some machines as Observer NameNodes before setting up the cluster. Can we add a configuration entry in hdfs-site.xml to specify the nnid of each Observer NameNode? After doing so, when we initialize failoverProxy, we can avoid adding Observer NameNodes to the proxies list. I am looking forward to your reply. Thanks a lot.

Hi, @xkrogen, could you please help me look at this problem? Thanks a lot.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
Worklog Id: (was: 748228)
Time Spent: 3h (was: 2h 50m)

> msync RPC should send to Acitve Namenode directly
> -------------------------------------------------
>
> Key: HDFS-16452
> URL: https://issues.apache.org/jira/browse/HDFS-16452
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namanode
> Affects Versions: 3.3.1
> Reporter: zhanghaobo
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 3h
> Remaining Estimate: 0h
>
> In the current ObserverReadProxyProvider implementation, we use the following code to invoke the msync RPC.
> {code:java}
> getProxyAsClientProtocol(failoverProxy.getProxy().proxy).msync(); {code}
> But this way the msync RPC may be sent to an Observer NameNode first and only then fail over to the Active NameNode. This can be avoided by applying this patch.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
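The idea floated in the HDFS-16452 comment, keeping configured Observer NameNodes out of the failover proxy list, comes down to a filter over NameNode IDs. The following is a minimal sketch, not Hadoop code: `ObserverFilterSketch`, `nonObserverNameNodes` and the example IDs `nn1`..`nn3` are all hypothetical, and the thread names no configuration key, so none is assumed here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative only: drops configured observer IDs from a NameNode list. */
final class ObserverFilterSketch {

  /** Returns the NameNode IDs that are not listed as observers, in order. */
  static List<String> nonObserverNameNodes(List<String> allNnIds,
      Set<String> observerNnIds) {
    List<String> result = new ArrayList<>();
    for (String nnId : allNnIds) {
      if (!observerNnIds.contains(nnId)) {
        result.add(nnId);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Hypothetical IDs; in the proposal the observer set would come from
    // a new hdfs-site.xml entry.
    List<String> all = Arrays.asList("nn1", "nn2", "nn3");
    Set<String> observers = new HashSet<>(Arrays.asList("nn3"));
    System.out.println(nonObserverNameNodes(all, observers)); // [nn1, nn2]
  }
}
```

A static list like this only helps while the configured roles match reality; a NameNode that is later transitioned to or from observer state would make the filter stale.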
[jira] [Updated] (HDFS-16523) Fix dependency error in hadoop-hdfs on M1 Mac
[ https://issues.apache.org/jira/browse/HDFS-16523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HDFS-16523:
----------------------------------
    Labels: pull-request-available (was: )

> Fix dependency error in hadoop-hdfs on M1 Mac
> ---------------------------------------------
>
> Key: HDFS-16523
> URL: https://issues.apache.org/jira/browse/HDFS-16523
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: build
> Environment: M1 Pro Mac
> Reporter: Akira Ajisaka
> Assignee: Akira Ajisaka
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> hadoop-hdfs build is failing on docker with M1 Mac.
> {code}
> [WARNING]
> Dependency convergence error for org.fusesource.hawtjni:hawtjni-runtime:jar:1.11:provided paths to dependency are:
> +-org.apache.hadoop:hadoop-hdfs:jar:3.4.0-SNAPSHOT
>   +-org.openlabtesting.leveldbjni:leveldbjni-all:jar:1.8:compile
>     +-org.openlabtesting.leveldbjni:leveldbjni:jar:1.8:provided
>       +-org.fusesource.hawtjni:hawtjni-runtime:jar:1.11:provided
> and
> +-org.apache.hadoop:hadoop-hdfs:jar:3.4.0-SNAPSHOT
>   +-org.openlabtesting.leveldbjni:leveldbjni-all:jar:1.8:compile
>     +-org.fusesource.leveldbjni:leveldbjni-osx:jar:1.8:provided
>       +-org.fusesource.leveldbjni:leveldbjni:jar:1.8:provided
>         +-org.fusesource.hawtjni:hawtjni-runtime:jar:1.9:provided
> {code}

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
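The convergence warning in HDFS-16523 means that one artifact is reachable through the dependency tree at two different versions. The following is a minimal sketch of that check, assuming coordinates in the `groupId:artifactId:type:version:scope` form printed in the warning; `ConvergenceSketch` and its methods are hypothetical names and do not reproduce how the Maven Enforcer plugin is implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Illustrative only: reports artifacts that appear with several versions. */
final class ConvergenceSketch {

  /** Groups versions by "groupId:artifactId". */
  static Map<String, Set<String>> versionsByArtifact(List<String> coordinates) {
    Map<String, Set<String>> versions = new TreeMap<>();
    for (String coordinate : coordinates) {
      String[] parts = coordinate.split(":");
      if (parts.length < 4) {
        continue; // not in groupId:artifactId:type:version[:scope] form
      }
      String key = parts[0] + ":" + parts[1];
      versions.computeIfAbsent(key, k -> new TreeSet<>()).add(parts[3]);
    }
    return versions;
  }

  /** Lists every artifact that resolves to more than one version. */
  static List<String> conflicts(List<String> coordinates) {
    List<String> result = new ArrayList<>();
    for (Map.Entry<String, Set<String>> e
        : versionsByArtifact(coordinates).entrySet()) {
      if (e.getValue().size() > 1) {
        result.add(e.getKey() + " -> " + e.getValue());
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Leaf and intermediate nodes taken from the two paths in the warning.
    List<String> nodes = Arrays.asList(
        "org.openlabtesting.leveldbjni:leveldbjni-all:jar:1.8:compile",
        "org.openlabtesting.leveldbjni:leveldbjni:jar:1.8:provided",
        "org.fusesource.hawtjni:hawtjni-runtime:jar:1.11:provided",
        "org.fusesource.leveldbjni:leveldbjni-osx:jar:1.8:provided",
        "org.fusesource.leveldbjni:leveldbjni:jar:1.8:provided",
        "org.fusesource.hawtjni:hawtjni-runtime:jar:1.9:provided");
    // Prints [org.fusesource.hawtjni:hawtjni-runtime -> [1.11, 1.9]]
    System.out.println(conflicts(nodes));
  }
}
```

Only `hawtjni-runtime` is reported: the two `leveldbjni` artifacts have different groupIds, so they count as different artifacts. Pinning a single `hawtjni-runtime` version for the whole build removes the conflict.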
[jira] [Work logged] (HDFS-16523) Fix dependency error in hadoop-hdfs on M1 Mac
[ https://issues.apache.org/jira/browse/HDFS-16523?focusedWorklogId=748227&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-748227 ] ASF GitHub Bot logged work on HDFS-16523: - Author: ASF GitHub Bot Created on: 26/Mar/22 15:35 Start Date: 26/Mar/22 15:35 Worklog Time Spent: 10m Work Description: aajisaka opened a new pull request #4112: URL: https://github.com/apache/hadoop/pull/4112 ### Description of PR Fix dependency error in hadoop-hdfs by fixing the version of hawtjni-runtime ### How was this patch tested? Manually tested. ### For code changes: - [x] Does the title or this PR starts with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')? - n/a Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation? - n/a If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? - n/a If applicable, have you updated the `LICENSE`, `LICENSE-binary`, `NOTICE-binary` files? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 748227) Remaining Estimate: 0h Time Spent: 10m > Fix dependency error in hadoop-hdfs on M1 Mac > - > > Key: HDFS-16523 > URL: https://issues.apache.org/jira/browse/HDFS-16523 > Project: Hadoop HDFS > Issue Type: Bug > Components: build > Environment: M1 Pro Mac >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > hadoop-hdfs build is failing on docker with M1 Mac. 
> {code} > [WARNING] > Dependency convergence error for > org.fusesource.hawtjni:hawtjni-runtime:jar:1.11:provided paths to > dependency are: > +-org.apache.hadoop:hadoop-hdfs:jar:3.4.0-SNAPSHOT > +-org.openlabtesting.leveldbjni:leveldbjni-all:jar:1.8:compile > +-org.openlabtesting.leveldbjni:leveldbjni:jar:1.8:provided > +-org.fusesource.hawtjni:hawtjni-runtime:jar:1.11:provided > and > +-org.apache.hadoop:hadoop-hdfs:jar:3.4.0-SNAPSHOT > +-org.openlabtesting.leveldbjni:leveldbjni-all:jar:1.8:compile > +-org.fusesource.leveldbjni:leveldbjni-osx:jar:1.8:provided > +-org.fusesource.leveldbjni:leveldbjni:jar:1.8:provided > +-org.fusesource.hawtjni:hawtjni-runtime:jar:1.9:provided > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
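To make the warning above concrete: a dependency convergence error means the same group:artifact is reached through different paths with different versions (here hawtjni-runtime 1.11 and 1.9). The sketch below is not how the Maven Enforcer plugin implements its check; it is a small stand-alone illustration that groups coordinates of the form group:artifact:type:version:scope and reports the ones with more than one version. The class and method names are hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Illustration only: finds artifacts that appear with more than one version. */
final class ConvergenceSketch {

  private ConvergenceSketch() {
  }

  /**
   * Maps "group:artifact" to the versions seen for it, keeping only entries
   * with at least two distinct versions. Malformed coordinates are skipped.
   */
  static Map<String, Set<String>> conflicts(Iterable<String> coordinates) {
    Map<String, Set<String>> versions = new TreeMap<>();
    for (String coordinate : coordinates) {
      String[] parts = coordinate.trim().split(":");
      if (parts.length < 4) {
        continue; // not group:artifact:type:version[:scope]
      }
      String key = parts[0] + ":" + parts[1];
      versions.computeIfAbsent(key, k -> new TreeSet<>()).add(parts[3]);
    }
    // Drop artifacts that converge on a single version.
    versions.values().removeIf(seen -> seen.size() < 2);
    return versions;
  }
}
```

Fed the leaf coordinates from the two paths in the log, this reports only org.fusesource.hawtjni:hawtjni-runtime, which matches the PR description of fixing that artifact's version.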
[jira] [Updated] (HDFS-16523) Fix dependency error in hadoop-hdfs on M1 Mac
[ https://issues.apache.org/jira/browse/HDFS-16523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira Ajisaka updated HDFS-16523: - Summary: Fix dependency error in hadoop-hdfs on M1 Mac (was: Fix dependency error in hadoop-hdfs) > Fix dependency error in hadoop-hdfs on M1 Mac > - > > Key: HDFS-16523 > URL: https://issues.apache.org/jira/browse/HDFS-16523 > Project: Hadoop HDFS > Issue Type: Bug > Components: build > Environment: M1 Pro Mac >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > > hadoop-hdfs build is failing on docker with M1 Mac. > {code} > [WARNING] > Dependency convergence error for > org.fusesource.hawtjni:hawtjni-runtime:jar:1.11:provided paths to > dependency are: > +-org.apache.hadoop:hadoop-hdfs:jar:3.4.0-SNAPSHOT > +-org.openlabtesting.leveldbjni:leveldbjni-all:jar:1.8:compile > +-org.openlabtesting.leveldbjni:leveldbjni:jar:1.8:provided > +-org.fusesource.hawtjni:hawtjni-runtime:jar:1.11:provided > and > +-org.apache.hadoop:hadoop-hdfs:jar:3.4.0-SNAPSHOT > +-org.openlabtesting.leveldbjni:leveldbjni-all:jar:1.8:compile > +-org.fusesource.leveldbjni:leveldbjni-osx:jar:1.8:provided > +-org.fusesource.leveldbjni:leveldbjni:jar:1.8:provided > +-org.fusesource.hawtjni:hawtjni-runtime:jar:1.9:provided > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-16523) Fix dependency error in hadoop-hdfs
Akira Ajisaka created HDFS-16523: Summary: Fix dependency error in hadoop-hdfs Key: HDFS-16523 URL: https://issues.apache.org/jira/browse/HDFS-16523 Project: Hadoop HDFS Issue Type: Bug Components: build Environment: M1 Pro Mac Reporter: Akira Ajisaka Assignee: Akira Ajisaka hadoop-hdfs build is failing on docker with M1 Mac. {code} [WARNING] Dependency convergence error for org.fusesource.hawtjni:hawtjni-runtime:jar:1.11:provided paths to dependency are: +-org.apache.hadoop:hadoop-hdfs:jar:3.4.0-SNAPSHOT +-org.openlabtesting.leveldbjni:leveldbjni-all:jar:1.8:compile +-org.openlabtesting.leveldbjni:leveldbjni:jar:1.8:provided +-org.fusesource.hawtjni:hawtjni-runtime:jar:1.11:provided and +-org.apache.hadoop:hadoop-hdfs:jar:3.4.0-SNAPSHOT +-org.openlabtesting.leveldbjni:leveldbjni-all:jar:1.8:compile +-org.fusesource.leveldbjni:leveldbjni-osx:jar:1.8:provided +-org.fusesource.leveldbjni:leveldbjni:jar:1.8:provided +-org.fusesource.hawtjni:hawtjni-runtime:jar:1.9:provided {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16477) [SPS]: Add metric PendingSPSPaths for getting the number of paths to be processed by SPS
[ https://issues.apache.org/jira/browse/HDFS-16477?focusedWorklogId=748187&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-748187 ] ASF GitHub Bot logged work on HDFS-16477: - Author: ASF GitHub Bot Created on: 26/Mar/22 10:33 Start Date: 26/Mar/22 10:33 Worklog Time Spent: 10m Work Description: tomscut commented on a change in pull request #4009: URL: https://github.com/apache/hadoop/pull/4009#discussion_r835748939 ## File path: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/sps/TestExternalStoragePolicySatisfier.java ## @@ -441,6 +442,8 @@ private void doTestWhenStoragePolicySetToCOLD() throws Exception { hdfsCluster.triggerHeartbeats(); dfs.satisfyStoragePolicy(new Path(FILE)); +// Assert metrics. +assertEquals(1, hdfsCluster.getNamesystem().getPendingSPSPaths()); // Wait till namenode notified about the block location details DFSTestUtil.waitExpectedStorageType(FILE, StorageType.ARCHIVE, 3, 35000, dfs); Review comment: > Is there a race condition possible, if the storage policy gets satisfied before we get the pending sps paths? in that case the assertion shall fail I suppose? Thanks for your review. The method doTestWhenStoragePolicySetToCOLD is only called in one place. I think there should be no race condition? What do you think of this? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 748187) Time Spent: 3h 50m (was: 3h 40m) > [SPS]: Add metric PendingSPSPaths for getting the number of paths to be > processed by SPS > > > Key: HDFS-16477 > URL: https://issues.apache.org/jira/browse/HDFS-16477 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: tomscut >Assignee: tomscut >Priority: Major > Labels: pull-request-available > Time Spent: 3h 50m > Remaining Estimate: 0h > > Currently we have no idea how many paths are waiting to be processed when > using the SPS feature. We should add metric PendingSPSPaths for getting the > number of paths to be processed by SPS in NameNode. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16477) [SPS]: Add metric PendingSPSPaths for getting the number of paths to be processed by SPS
[ https://issues.apache.org/jira/browse/HDFS-16477?focusedWorklogId=748186&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-748186 ] ASF GitHub Bot logged work on HDFS-16477: - Author: ASF GitHub Bot Created on: 26/Mar/22 10:33 Start Date: 26/Mar/22 10:33 Worklog Time Spent: 10m Work Description: tomscut commented on a change in pull request #4009: URL: https://github.com/apache/hadoop/pull/4009#discussion_r835748863 ## File path: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/sps/TestExternalStoragePolicySatisfier.java ## @@ -441,6 +442,8 @@ private void doTestWhenStoragePolicySetToCOLD() throws Exception { hdfsCluster.triggerHeartbeats(); dfs.satisfyStoragePolicy(new Path(FILE)); +// Assert metrics. +assertEquals(1, hdfsCluster.getNamesystem().getPendingSPSPaths()); // Wait till namenode notified about the block location details DFSTestUtil.waitExpectedStorageType(FILE, StorageType.ARCHIVE, 3, 35000, dfs); Review comment: Thanks for your review. The method `doTestWhenStoragePolicySetToCOLD` is only called in one place. I think there should be no race condition? What do you think of this? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 748186) Time Spent: 3h 40m (was: 3.5h) > [SPS]: Add metric PendingSPSPaths for getting the number of paths to be > processed by SPS > > > Key: HDFS-16477 > URL: https://issues.apache.org/jira/browse/HDFS-16477 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: tomscut >Assignee: tomscut >Priority: Major > Labels: pull-request-available > Time Spent: 3h 40m > Remaining Estimate: 0h > > Currently we have no idea how many paths are waiting to be processed when > using the SPS feature. We should add metric PendingSPSPaths for getting the > number of paths to be processed by SPS in NameNode. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16498) Fix NPE for checkBlockReportLease
[ https://issues.apache.org/jira/browse/HDFS-16498?focusedWorklogId=748183&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-748183 ] ASF GitHub Bot logged work on HDFS-16498: - Author: ASF GitHub Bot Created on: 26/Mar/22 10:04 Start Date: 26/Mar/22 10:04 Worklog Time Spent: 10m Work Description: tomscut commented on a change in pull request #4057: URL: https://github.com/apache/hadoop/pull/4057#discussion_r835746423 ## File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java ## @@ -2751,6 +2751,13 @@ public boolean checkBlockReportLease(BlockReportContext context, return true; } DatanodeDescriptor node = datanodeManager.getDatanode(nodeID); +if (node == null) { + final UnregisteredNodeException e = new UnregisteredNodeException(nodeID, null); + NameNode.stateChangeLog.error("BLOCK* NameSystem.getDatanode: " + "Data node " + nodeID + Review comment: > Yeps, But if @Hexiaoqiao wants to change the log in the upper method to log. I am good with that as well Thank you very much for your comments. I will update the code. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 748183) Time Spent: 4.5h (was: 4h 20m) > Fix NPE for checkBlockReportLease > - > > Key: HDFS-16498 > URL: https://issues.apache.org/jira/browse/HDFS-16498 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: tomscut >Assignee: tomscut >Priority: Major > Labels: pull-request-available > Attachments: image-2022-03-09-20-35-22-028.png, screenshot-1.png > > Time Spent: 4.5h > Remaining Estimate: 0h > > During the restart of Namenode, a Datanode is not registered, but this > Datanode triggers FBR, which causes NPE. > !image-2022-03-09-20-35-22-028.png|width=871,height=158! -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16477) [SPS]: Add metric PendingSPSPaths for getting the number of paths to be processed by SPS
[ https://issues.apache.org/jira/browse/HDFS-16477?focusedWorklogId=748182&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-748182 ] ASF GitHub Bot logged work on HDFS-16477: - Author: ASF GitHub Bot Created on: 26/Mar/22 10:02 Start Date: 26/Mar/22 10:02 Worklog Time Spent: 10m Work Description: tomscut commented on a change in pull request #4009: URL: https://github.com/apache/hadoop/pull/4009#discussion_r835746197 ## File path: hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/resolver/NamenodeStatusReport.java ## @@ -382,7 +384,8 @@ public void setNamesystemInfo(long available, long total, this.numOfBlocksPendingDeletion = numBlocksPendingDeletion; this.numOfFiles = numFiles; this.statsValid = true; -this.providedSpace = providedSpace; +this.providedSpace = providedStorageSpace; Review comment: This is to fix [checkstyles](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4009/5/artifact/out/results-checkstyle-root.txt). I don't know if I need to update the checkstyles of the old code. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 748182) Time Spent: 3.5h (was: 3h 20m) > [SPS]: Add metric PendingSPSPaths for getting the number of paths to be > processed by SPS > > > Key: HDFS-16477 > URL: https://issues.apache.org/jira/browse/HDFS-16477 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: tomscut >Assignee: tomscut >Priority: Major > Labels: pull-request-available > Time Spent: 3.5h > Remaining Estimate: 0h > > Currently we have no idea how many paths are waiting to be processed when > using the SPS feature. 
We should add metric PendingSPSPaths for getting the > number of paths to be processed by SPS in NameNode.
[jira] [Work logged] (HDFS-16498) Fix NPE for checkBlockReportLease
[ https://issues.apache.org/jira/browse/HDFS-16498?focusedWorklogId=748181&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-748181 ] ASF GitHub Bot logged work on HDFS-16498: - Author: ASF GitHub Bot Created on: 26/Mar/22 09:58 Start Date: 26/Mar/22 09:58 Worklog Time Spent: 10m Work Description: ayushtkn commented on a change in pull request #4057: URL: https://github.com/apache/hadoop/pull/4057#discussion_r835745918 ## File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java ## @@ -2751,6 +2751,13 @@ public boolean checkBlockReportLease(BlockReportContext context, return true; } DatanodeDescriptor node = datanodeManager.getDatanode(nodeID); +if (node == null) { + final UnregisteredNodeException e = new UnregisteredNodeException(nodeID, null); + NameNode.stateChangeLog.error("BLOCK* NameSystem.getDatanode: " + "Data node " + nodeID + Review comment: Yeps, But if @Hexiaoqiao wants to change the log in the upper method to log. I am good with that as well -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 748181) Time Spent: 4h 20m (was: 4h 10m) > Fix NPE for checkBlockReportLease > - > > Key: HDFS-16498 > URL: https://issues.apache.org/jira/browse/HDFS-16498 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: tomscut >Assignee: tomscut >Priority: Major > Labels: pull-request-available > Attachments: image-2022-03-09-20-35-22-028.png, screenshot-1.png > > Time Spent: 4h 20m > Remaining Estimate: 0h > > During the restart of Namenode, a Datanode is not registered, but this > Datanode triggers FBR, which causes NPE. 
> !image-2022-03-09-20-35-22-028.png|width=871,height=158!
[jira] [Work logged] (HDFS-16498) Fix NPE for checkBlockReportLease
[ https://issues.apache.org/jira/browse/HDFS-16498?focusedWorklogId=748179&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-748179 ] ASF GitHub Bot logged work on HDFS-16498: - Author: ASF GitHub Bot Created on: 26/Mar/22 09:57 Start Date: 26/Mar/22 09:57 Worklog Time Spent: 10m Work Description: tomscut commented on a change in pull request #4057: URL: https://github.com/apache/hadoop/pull/4057#discussion_r835745725 ## File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java ## @@ -2751,6 +2751,13 @@ public boolean checkBlockReportLease(BlockReportContext context, return true; } DatanodeDescriptor node = datanodeManager.getDatanode(nodeID); +if (node == null) { + final UnregisteredNodeException e = new UnregisteredNodeException(nodeID, null); + NameNode.stateChangeLog.error("BLOCK* NameSystem.getDatanode: " + "Data node " + nodeID + Review comment: > @tomscut you can remove this exception from here, considering it is being logged in the upper method. > > > Although it is DEBU level now, we could improve that to WARN, right? > > I think debug is also fine, this doesn't denote any problematic stuff, this can happen in normal scenario as well and the datanode will register subsequently Hi @ayushtkn, do you mean I just remove this log? ``` NameNode.stateChangeLog.error("BLOCK* NameSystem.getDatanode: " + "Data node " + nodeID + " is attempting to report storage ID " + nodeID.getDatanodeUuid() + ". But this node is not registered."); ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 748179) Time Spent: 4h 10m (was: 4h) > Fix NPE for checkBlockReportLease > - > > Key: HDFS-16498 > URL: https://issues.apache.org/jira/browse/HDFS-16498 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: tomscut >Assignee: tomscut >Priority: Major > Labels: pull-request-available > Attachments: image-2022-03-09-20-35-22-028.png, screenshot-1.png > > Time Spent: 4h 10m > Remaining Estimate: 0h > > During the restart of Namenode, a Datanode is not registered, but this > Datanode triggers FBR, which causes NPE. > !image-2022-03-09-20-35-22-028.png|width=871,height=158! -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
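The NPE discussed in this thread comes from using the result of a datanode lookup without checking whether the node is registered. The sketch below shows the general shape of such a guard in isolation. It is not the BlockManager code: the registry is a plain map, the exception is a hypothetical unchecked stand-in for Hadoop's UnregisteredNodeException, and the quoted diff does not show whether the real patch throws or returns at this point.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Illustration only: a lease check that guards against unregistered nodes. */
final class LeaseCheckSketch {

  /** Hypothetical stand-in for an "unregistered node" error. */
  static final class UnknownNodeException extends RuntimeException {
    UnknownNodeException(String nodeId) {
      super("Node " + nodeId + " is not registered");
    }
  }

  private final Map<String, Long> leaseByNode = new ConcurrentHashMap<>();

  /** Records the lease currently granted to a node. */
  void register(String nodeId, long leaseId) {
    leaseByNode.put(nodeId, leaseId);
  }

  /**
   * Returns whether the presented lease matches the registered one.
   * An unknown node yields a descriptive exception instead of an NPE.
   */
  boolean checkLease(String nodeId, long leaseId) {
    Long registered = leaseByNode.get(nodeId);
    if (registered == null) {
      // The lookup found nothing: fail explicitly rather than dereference null.
      throw new UnknownNodeException(nodeId);
    }
    return registered.longValue() == leaseId;
  }
}
```

Whether to log at this point or leave logging to the caller is exactly the question debated in the review comments; the sketch deliberately does not log.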
[jira] [Work logged] (HDFS-16498) Fix NPE for checkBlockReportLease
[ https://issues.apache.org/jira/browse/HDFS-16498?focusedWorklogId=748178&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-748178 ] ASF GitHub Bot logged work on HDFS-16498: - Author: ASF GitHub Bot Created on: 26/Mar/22 09:56 Start Date: 26/Mar/22 09:56 Worklog Time Spent: 10m Work Description: tomscut commented on a change in pull request #4057: URL: https://github.com/apache/hadoop/pull/4057#discussion_r835745725 ## File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java ## @@ -2751,6 +2751,13 @@ public boolean checkBlockReportLease(BlockReportContext context, return true; } DatanodeDescriptor node = datanodeManager.getDatanode(nodeID); +if (node == null) { + final UnregisteredNodeException e = new UnregisteredNodeException(nodeID, null); + NameNode.stateChangeLog.error("BLOCK* NameSystem.getDatanode: " + "Data node " + nodeID + Review comment: > @tomscut you can remove this exception from here, considering it is being logged in the upper method. > > > Although it is DEBU level now, we could improve that to WARN, right? > > I think debug is also fine, this doesn't denote any problematic stuff, this can happen in normal scenario as well and the datanode will register subsequently Hi @ayushtkn, do you mean I just remove this log? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 748178) Time Spent: 4h (was: 3h 50m) > Fix NPE for checkBlockReportLease > - > > Key: HDFS-16498 > URL: https://issues.apache.org/jira/browse/HDFS-16498 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: tomscut >Assignee: tomscut >Priority: Major > Labels: pull-request-available > Attachments: image-2022-03-09-20-35-22-028.png, screenshot-1.png > > Time Spent: 4h > Remaining Estimate: 0h > > During the restart of Namenode, a Datanode is not registered, but this > Datanode triggers FBR, which causes NPE. > !image-2022-03-09-20-35-22-028.png|width=871,height=158! -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16477) [SPS]: Add metric PendingSPSPaths for getting the number of paths to be processed by SPS
[ https://issues.apache.org/jira/browse/HDFS-16477?focusedWorklogId=748177&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-748177 ] ASF GitHub Bot logged work on HDFS-16477: - Author: ASF GitHub Bot Created on: 26/Mar/22 09:48 Start Date: 26/Mar/22 09:48 Worklog Time Spent: 10m Work Description: ayushtkn commented on a change in pull request #4009: URL: https://github.com/apache/hadoop/pull/4009#discussion_r835744910 ## File path: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/sps/TestExternalStoragePolicySatisfier.java ## @@ -441,6 +442,8 @@ private void doTestWhenStoragePolicySetToCOLD() throws Exception { hdfsCluster.triggerHeartbeats(); dfs.satisfyStoragePolicy(new Path(FILE)); +// Assert metrics. +assertEquals(1, hdfsCluster.getNamesystem().getPendingSPSPaths()); // Wait till namenode notified about the block location details DFSTestUtil.waitExpectedStorageType(FILE, StorageType.ARCHIVE, 3, 35000, dfs); Review comment: Is there a race condition possible, if the storage policy gets satisfied before we get the pending sps paths? in that case the assertion shall fail I suppose? ## File path: hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/resolver/NamenodeStatusReport.java ## @@ -382,7 +384,8 @@ public void setNamesystemInfo(long available, long total, this.numOfBlocksPendingDeletion = numBlocksPendingDeletion; this.numOfFiles = numFiles; this.statsValid = true; -this.providedSpace = providedSpace; +this.providedSpace = providedStorageSpace; Review comment: if this is just formatting change, please avoid this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 748177) Time Spent: 3h 20m (was: 3h 10m) > [SPS]: Add metric PendingSPSPaths for getting the number of paths to be > processed by SPS > > > Key: HDFS-16477 > URL: https://issues.apache.org/jira/browse/HDFS-16477 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: tomscut >Assignee: tomscut >Priority: Major > Labels: pull-request-available > Time Spent: 3h 20m > Remaining Estimate: 0h > > Currently we have no idea how many paths are waiting to be processed when > using the SPS feature. We should add metric PendingSPSPaths for getting the > number of paths to be processed by SPS in NameNode. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
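The race raised in the review can be stated generally: a value that a background thread changes (here, the number of pending SPS paths) cannot be asserted reliably with a single immediate read, because the read may happen before or after the change. A common alternative is to assert only on a state that is eventually stable and to poll for it with a timeout. The helper below is a generic, stand-alone sketch of that polling pattern; the class and method names are hypothetical and it is not the utility used in the Hadoop test.

```java
import java.util.function.BooleanSupplier;

/** Illustration only: poll a condition until it holds or a timeout expires. */
final class WaitForSketch {

  private WaitForSketch() {
  }

  /**
   * Evaluates the condition repeatedly, sleeping pollMillis between attempts.
   * Returns true as soon as the condition holds, false on timeout or interrupt.
   */
  static boolean waitFor(BooleanSupplier condition, long pollMillis, long timeoutMillis) {
    long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      try {
        Thread.sleep(pollMillis);
      } catch (InterruptedException e) {
        // Preserve the interrupt for the caller and give up waiting.
        Thread.currentThread().interrupt();
        return false;
      }
    }
  }
}
```

Polling helps for a state such as "no paths pending any more". It does not help for a transient value such as "exactly one path pending", which can only be asserted safely if the background processing is paused while the test reads the metric.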
[jira] [Work logged] (HDFS-16511) Change some frequent method lock type in ReplicaMap.
[ https://issues.apache.org/jira/browse/HDFS-16511?focusedWorklogId=748176&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-748176 ] ASF GitHub Bot logged work on HDFS-16511: - Author: ASF GitHub Bot Created on: 26/Mar/22 09:42 Start Date: 26/Mar/22 09:42 Worklog Time Spent: 10m Work Description: tomscut commented on a change in pull request #4085: URL: https://github.com/apache/hadoop/pull/4085#discussion_r835744670 ## File path: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestFsDatasetImpl.java ## @@ -602,6 +606,54 @@ public void run() {} + "volumeMap.", 0, totalNumReplicas); } + @Test(timeout = 3) + public void testConcurrentWriteAndDeleteBlock() throws Exception { +// Feed FsDataset with block metadata. +final int numBlocks = 1000; +final int threadCount = 10; +// Generate data blocks. +ExecutorService pool = Executors.newFixedThreadPool(threadCount); +List> futureList = new ArrayList<>(); +Random random = new Random(); +// Random write block and delete half of them. +for (int i = 0; i < threadCount; i++) { + Thread thread = new Thread() { +@Override +public void run() { + try { +String bpid = BLOCK_POOL_IDS[random.nextInt(BLOCK_POOL_IDS.length)]; +for (int blockId = 0; blockId < numBlocks; blockId++) { + ExtendedBlock eb = new ExtendedBlock(bpid, blockId); + ReplicaHandler replica = null; + try { +replica = dataset.createRbw(StorageType.DEFAULT, null, eb, +false); +if (blockId % 2 > 0) { + dataset.invalidate(bpid, new Block[]{eb.getLocalBlock()}); +} + } finally { +if (replica != null) { + replica.close(); +} + } +} + } catch (Exception e) { +e.printStackTrace(); Review comment: Sorry, I forgot to say could you change this to log4j? The other change looks good to me. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 748176) Time Spent: 1.5h (was: 1h 20m) > Change some frequent method lock type in ReplicaMap. > > > Key: HDFS-16511 > URL: https://issues.apache.org/jira/browse/HDFS-16511 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs >Reporter: Mingxiang Li >Assignee: Mingxiang Li >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > In HDFS-16429 we made LightWeightResizableGSet thread safe, and in > HDFS-15382 we split the lock into block-pool-level locks. After these > improvements, some methods can acquire the read lock instead of the > write lock.
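The reasoning in the issue description can be sketched as follows: once the underlying map is itself thread safe, single-entry operations no longer need exclusive access and can share a read lock, while operations that must see or change the whole map still take the write lock. The class below is a simplified stand-in, not ReplicaMap: it uses a ConcurrentHashMap in place of LightWeightResizableGSet and one lock in place of the per-block-pool locks, and all names are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Illustration only: shared lock for per-entry operations, exclusive lock for bulk ones. */
final class ReplicaMapSketch {

  private final ReadWriteLock lock = new ReentrantReadWriteLock();
  // Thread-safe on its own, so per-entry updates may run concurrently.
  private final Map<Long, String> replicas = new ConcurrentHashMap<>();

  /** Adds or replaces one entry under the shared (read) lock; returns the previous value. */
  String add(long blockId, String replica) {
    lock.readLock().lock();
    try {
      return replicas.put(blockId, replica);
    } finally {
      lock.readLock().unlock();
    }
  }

  /** Looks up one entry under the shared (read) lock. */
  String get(long blockId) {
    lock.readLock().lock();
    try {
      return replicas.get(blockId);
    } finally {
      lock.readLock().unlock();
    }
  }

  /** Removes everything under the exclusive (write) lock; returns how many entries were removed. */
  int clear() {
    lock.writeLock().lock();
    try {
      int removed = replicas.size();
      replicas.clear();
      return removed;
    } finally {
      lock.writeLock().unlock();
    }
  }

  /** Current number of entries. */
  int size() {
    return replicas.size();
  }
}
```

The read lock here no longer means "read only". It means "does not need the map as a whole to stand still", which is why the thread safety of the underlying structure is a precondition for the change.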
[jira] [Work logged] (HDFS-16498) Fix NPE for checkBlockReportLease
[ https://issues.apache.org/jira/browse/HDFS-16498?focusedWorklogId=748175&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-748175 ] ASF GitHub Bot logged work on HDFS-16498: - Author: ASF GitHub Bot Created on: 26/Mar/22 09:40 Start Date: 26/Mar/22 09:40 Worklog Time Spent: 10m Work Description: ayushtkn commented on a change in pull request #4057: URL: https://github.com/apache/hadoop/pull/4057#discussion_r835744469 ## File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java ## @@ -2751,6 +2751,13 @@ public boolean checkBlockReportLease(BlockReportContext context, return true; } DatanodeDescriptor node = datanodeManager.getDatanode(nodeID); +if (node == null) { + final UnregisteredNodeException e = new UnregisteredNodeException(nodeID, null); + NameNode.stateChangeLog.error("BLOCK* NameSystem.getDatanode: " + "Data node " + nodeID + Review comment: @tomscut you can remove this exception from here, considering it is being logged in the upper method. >Although it is DEBUG level now, we could improve that to WARN, right? I think debug is also fine; this doesn't indicate a problem. It can happen in a normal scenario as well, and the datanode will register subsequently. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 748175) Time Spent: 3h 50m (was: 3h 40m) > Fix NPE for checkBlockReportLease > - > > Key: HDFS-16498 > URL: https://issues.apache.org/jira/browse/HDFS-16498 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: tomscut >Assignee: tomscut >Priority: Major > Labels: pull-request-available > Attachments: image-2022-03-09-20-35-22-028.png, screenshot-1.png > > Time Spent: 3h 50m > Remaining Estimate: 0h > > During the restart of Namenode, a Datanode is not registered, but this > Datanode triggers FBR, which causes NPE. > !image-2022-03-09-20-35-22-028.png|width=871,height=158! -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16522) Set Http and Ipc ports for Datanodes in MiniDFSCluster
[ https://issues.apache.org/jira/browse/HDFS-16522?focusedWorklogId=748173&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-748173 ] ASF GitHub Bot logged work on HDFS-16522: - Author: ASF GitHub Bot Created on: 26/Mar/22 09:22 Start Date: 26/Mar/22 09:22 Worklog Time Spent: 10m Work Description: virajjasani commented on pull request #4108: URL: https://github.com/apache/hadoop/pull/4108#issuecomment-1079647650 @ayushtkn Could you please review this PR? It's quite similar to #4028 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 748173) Time Spent: 0.5h (was: 20m) > Set Http and Ipc ports for Datanodes in MiniDFSCluster > -- > > Key: HDFS-16522 > URL: https://issues.apache.org/jira/browse/HDFS-16522 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > We should provide options to set Http and Ipc ports for Datanodes in > MiniDFSCluster. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16522) Set Http and Ipc ports for Datanodes in MiniDFSCluster
[ https://issues.apache.org/jira/browse/HDFS-16522?focusedWorklogId=748169&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-748169 ]

ASF GitHub Bot logged work on HDFS-16522:
-
Author: ASF GitHub Bot
Created on: 26/Mar/22 09:09
Start Date: 26/Mar/22 09:09
Worklog Time Spent: 10m
Work Description: virajjasani commented on pull request #4108:
URL: https://github.com/apache/hadoop/pull/4108#issuecomment-1079645768

hadoop-dynamometer-infra failure is not relevant because the mvn test doesn't seem to be running after compiling hadoop-hdfs module from this patch. Running the same in local:

```
$ mvn -Pparallel-tests -Pnative -Drequire.fuse -Drequire.openssl -Drequire.snappy -Drequire.valgrind -Drequire.zstd -Drequire.test.libhadoop -Pyarn-ui clean test -fae
[INFO] ---
[INFO] T E S T S
[INFO] ---
[INFO] Running org.apache.hadoop.tools.dynamometer.TestDynamometerInfra
[WARNING] Tests run: 1, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 0.004 s - in org.apache.hadoop.tools.dynamometer.TestDynamometerInfra
[INFO] Running org.apache.hadoop.tools.dynamometer.TestDynoInfraUtils
[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.109 s - in org.apache.hadoop.tools.dynamometer.TestDynoInfraUtils
[INFO]
[INFO] Results:
[INFO]
[WARNING] Tests run: 3, Failures: 0, Errors: 0, Skipped: 1
[INFO]
[INFO]
[INFO] BUILD SUCCESS
[INFO]
[INFO] Total time: 5.325 s
[INFO] Finished at: 2022-03-26T14:38:43+05:30
[INFO]
```
Issue Time Tracking
---
Worklog Id: (was: 748169)
Time Spent: 20m (was: 10m)

> Set Http and Ipc ports for Datanodes in MiniDFSCluster
> --
>
> Key: HDFS-16522
> URL: https://issues.apache.org/jira/browse/HDFS-16522
> Project: Hadoop HDFS
> Issue Type: Task
> Reporter: Viraj Jasani
> Assignee: Viraj Jasani
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> We should provide options to set Http and Ipc ports for Datanodes in MiniDFSCluster.