[jira] [Commented] (HDFS-6166) revisit balancer so_timeout

Hadoop QA (JIRA) Fri, 28 Mar 2014 19:05:25 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-6166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951705#comment-13951705
 ]


Hadoop QA commented on HDFS-6166:
---------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12637273/HDFS-6166.patch
  against trunk revision .

    {color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

    {color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
                        Please justify why no new tests are needed for this 
patch.
                        Also please list what manual steps were performed to 
verify this patch.

    {color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

    {color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

    {color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

    {color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

    {color:red}-1 core tests{color}.  The following test timeouts occurred in 
hadoop-hdfs-project/hadoop-hdfs:

org.apache.hadoop.hdfs.server.namenode.ha.TestHAAppend

    {color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6550//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6550//console

This message is automatically generated.

> revisit balancer so_timeout 
> ----------------------------
>
>                 Key: HDFS-6166
>                 URL: https://issues.apache.org/jira/browse/HDFS-6166
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: balancer
>    Affects Versions: 3.0.0, 2.3.0
>            Reporter: Nathan Roberts
>            Assignee: Nathan Roberts
>            Priority: Blocker
>         Attachments: HDFS-6166.patch
>
>
> HDFS-5806 changed the socket read timeout for the balancer connection to DN 
> to 60 seconds. This works as long as balancer bandwidth is such that it's 
> safe to assume that the DN will easily complete the operation within this 
> time. Obviously this isn't a good assumption. When this assumption isn't 
> valid, the balancer will time out the command BUT it will then be out of sync 
> with the datanode (the balancer thinks the DN has room to do more work, while 
> the DN is still working on the request and will fail any subsequent requests 
> with "threads quota exceeded" errors). This causes expensive NN traffic via 
> getBlocks() and also causes lots of WARNs in the balancer log.
> Unfortunately the protocol is such that it's impossible to tell if the DN is 
> busy working on replacing the block, OR is in bad shape and will never finish.
> So, in the interest of a small change to deal with both situations, I propose 
> the following two changes:
> * Crank up the socket read timeout to 20 minutes
> * Delay looking at a node for a bit if we did timeout in this way (the DN 
> could still have xceiver threads working on the replace 
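
The two proposed changes can be illustrated with a minimal Java sketch. It is not the code in HDFS-6166.patch. The class name, the method names, and the 10-second delay are hypothetical (the description only says "for a bit"); only the 20-minute read timeout comes from the text above. Time is passed in explicitly so the delay logic can be checked without sleeping.

```java
import java.net.Socket;
import java.net.SocketException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration only; not taken from the actual patch.
public class BalancerTimeoutSketch {
    // Change 1: socket read timeout of 20 minutes (value from the issue text).
    static final int READ_TIMEOUT_MS = 20 * 60 * 1000;

    // Change 2: how long to skip a DN after a read timeout.
    // Assumption: the issue does not state a value; 10 seconds is a placeholder.
    static final long DELAY_AFTER_TIMEOUT_MS = 10_000L;

    // DN identifier -> earliest time (ms) at which it may be scheduled again.
    private final Map<String, Long> delayedUntil = new ConcurrentHashMap<>();

    // Apply the longer read timeout to a balancer-to-DN socket.
    static void configure(Socket socket) throws SocketException {
        socket.setSoTimeout(READ_TIMEOUT_MS);
    }

    // Call when a read on the connection to this DN timed out. The DN may
    // still be working on the request, so stop sending it work for a while.
    void recordTimeout(String datanode, long nowMs) {
        delayedUntil.put(datanode, nowMs + DELAY_AFTER_TIMEOUT_MS);
    }

    // True if the DN may be considered for new work at time nowMs.
    boolean isEligible(String datanode, long nowMs) {
        Long until = delayedUntil.get(datanode);
        if (until == null) {
            return true;
        }
        if (nowMs >= until) {
            delayedUntil.remove(datanode);
            return true;
        }
        return false;
    }
}
```

In a scheduling loop, the caller would check isEligible() before dispatching a move to a DN and call recordTimeout() when a java.net.SocketTimeoutException is caught, which avoids repeatedly hitting a DN that is still busy.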



--
This message was sent by Atlassian JIRA
(v6.2#6252)
