[jira] [Commented] (HDFS-14579) In refreshNodes, avoid performing a DNS lookup while holding the write lock

Hadoop QA (JIRA) Tue, 18 Jun 2019 08:03:13 -0700


    [ 
https://issues.apache.org/jira/browse/HDFS-14579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16866679#comment-16866679
 ]


Hadoop QA commented on HDFS-14579:
----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 33s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
57s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 43s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  2m 
13s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs generated 1 new + 0 
unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 80m 45s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
30s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}140m 42s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs-project/hadoop-hdfs |
|  |  
org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.refreshDatanodes()
 makes inefficient use of keySet iterator instead of entrySet iterator  At 
DatanodeManager.java:keySet iterator instead of entrySet iterator  At 
DatanodeManager.java:[line 1230] |
| Failed junit tests | 
hadoop.hdfs.server.namenode.TestNameNodeRespectsBindHostKeys |
|   | hadoop.hdfs.web.TestWebHdfsTimeouts |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e |
| JIRA Issue | HDFS-14579 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12972088/HDFS-14579.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 3135dfa0f5dd 4.4.0-143-generic #169~14.04.2-Ubuntu SMP Wed Feb 
13 15:00:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 335c1c9 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
| findbugs | v3.1.0-RC1 |
| findbugs | 
https://builds.apache.org/job/PreCommit-HDFS-Build/26992/artifact/out/new-findbugs-hadoop-hdfs-project_hadoop-hdfs.html
 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/26992/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/26992/testReport/ |
| Max. process+thread count | 3410 (vs. ulimit of 10000) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/26992/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> In refreshNodes, avoid performing a DNS lookup while holding the write lock
> ---------------------------------------------------------------------------
>
>                 Key: HDFS-14579
>                 URL: https://issues.apache.org/jira/browse/HDFS-14579
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 3.3.0
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Major
>         Attachments: HDFS-14579.001.patch
>
>
> When refreshNodes is called on a large cluster, or a cluster where DNS is not 
> performing well, it can cause the namenode to hang for a long time. This is 
> because the refreshNodes operation holds the global write lock while it is 
> running. Most of refreshNodes code is simple and hence fast, but 
> unfortunately it performs a DNS lookup for each host in the cluster while the 
> lock is held. 
> Right now, it calls:
> {code}
>   public void refreshNodes(final Configuration conf) throws IOException {
>     refreshHostsReader(conf);
>     namesystem.writeLock();
>     try {
>       refreshDatanodes();
>       countSoftwareVersions();
>     } finally {
>       namesystem.writeUnlock();
>     }
>   }
> {code}
> The line refreshHostsReader(conf); reads the new config file and does a DNS 
> lookup on each entry - the write lock is not held here. Then the main work is 
> done here:
> {code}
>   private void refreshDatanodes() {
>     final Map<String, DatanodeDescriptor> copy;
>     synchronized (this) {
>       copy = new HashMap<>(datanodeMap);
>     }
>     for (DatanodeDescriptor node : copy.values()) {
>       // Check if not include.
>       if (!hostConfigManager.isIncluded(node)) {
>         node.setDisallowed(true);
>       } else {
>         long maintenanceExpireTimeInMS =
>             hostConfigManager.getMaintenanceExpirationTimeInMS(node);
>         if (node.maintenanceNotExpired(maintenanceExpireTimeInMS)) {
>           datanodeAdminManager.startMaintenance(
>               node, maintenanceExpireTimeInMS);
>         } else if (hostConfigManager.isExcluded(node)) {
>           datanodeAdminManager.startDecommission(node);
>         } else {
>           datanodeAdminManager.stopMaintenance(node);
>           datanodeAdminManager.stopDecommission(node);
>         }
>       }
>       node.setUpgradeDomain(hostConfigManager.getUpgradeDomain(node));
>     }
>   }
> {code}
> All the isIncluded(), isExcluded() methods call node.getResolvedAddress() 
> which does the DNS lookup. We could probably change things to perform all the 
> DNS lookups outside of the write lock, and then take the lock and process the 
> nodes. Also change or overload isIncluded() etc to take the inetAddress 
> rather than the datanode descriptor.
> It would not shorten the time the operation takes to run overall, but it 
> would move the long duration out of the write lock and avoid blocking the 
> namenode for the entire time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Commented] (HDFS-14579) In refreshNodes, avoid performing a DNS lookup while holding the write lock

Reply via email to