[jira] [Commented] (HDFS-11384) Add option for balancer to disperse getBlocks calls to avoid NameNode's rpc.CallQueueLength spike

Zhe Zhang (JIRA) Mon, 24 Apr 2017 23:04:03 -0700

    [ https://issues.apache.org/jira/browse/HDFS-11384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15982410#comment-15982410 ]


Zhe Zhang commented on HDFS-11384:
----------------------------------

Thanks for the update, [~shv]. All tests in {{TestBalancer}} now pass except 
{{testBalancerRPCDelay}}, which fails with:
{code}
java.util.concurrent.TimeoutException: Timed out waiting for /tmp.txt to reach 40 replicas
        at org.apache.hadoop.hdfs.DFSTestUtil.waitReplication(DFSTestUtil.java:764)
        at org.apache.hadoop.hdfs.server.balancer.TestBalancer.createFile(TestBalancer.java:306)
        at org.apache.hadoop.hdfs.server.balancer.TestBalancer.doTest(TestBalancer.java:847)
        at org.apache.hadoop.hdfs.server.balancer.TestBalancer.testBalancerRPCDelay(TestBalancer.java:2071)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
        at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
        at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
        at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
        at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{code}

> Add option for balancer to disperse getBlocks calls to avoid NameNode's rpc.CallQueueLength spike
> -------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-11384
>                 URL: https://issues.apache.org/jira/browse/HDFS-11384
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: balancer & mover
>    Affects Versions: 2.7.3
>            Reporter: yunjiong zhao
>            Assignee: Konstantin Shvachko
>         Attachments: balancer.day.png, balancer.week.png, 
> HDFS-11384.001.patch, HDFS-11384.002.patch, HDFS-11384.003.patch, 
> HDFS-11384.004.patch, HDFS-11384.005.patch, HDFS-11384.006.patch, 
> HDFS-11384-007.patch, HDFS-11384.008.patch
>
>
> Running the balancer on a Hadoop cluster with more than 3000 DataNodes 
> causes a spike in the NameNode's rpc.CallQueueLength. We observed that this 
> can cause HBase cluster failures due to RegionServer WAL timeouts.
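
For illustration only, the idea of dispersing getBlocks calls can be sketched as a small fixed-window throttle. This is not the code in the attached patches: the class name {{GetBlocksThrottle}}, the {{reserve}} method, and both constructor parameters are hypothetical, and the clock is passed in explicitly so that the behaviour is deterministic.

```java
/**
 * Hypothetical sketch, not taken from the HDFS-11384 patches.
 * Allows at most maxCallsPerWindow calls per window and tells each
 * further caller how long to wait, so a burst of calls is spread
 * over several windows instead of reaching the NameNode at once.
 */
class GetBlocksThrottle {
    private final int maxCallsPerWindow;
    private final long windowMillis;
    // Start of the window currently being filled; it may lie in the future.
    private long windowStart = Long.MIN_VALUE;
    private int callsInWindow = 0;

    GetBlocksThrottle(int maxCallsPerWindow, long windowMillis) {
        if (maxCallsPerWindow <= 0 || windowMillis <= 0) {
            throw new IllegalArgumentException("limits must be positive");
        }
        this.maxCallsPerWindow = maxCallsPerWindow;
        this.windowMillis = windowMillis;
    }

    /**
     * Reserves a slot for one call.
     *
     * @param nowMillis the caller's current time in milliseconds
     * @return how many milliseconds the caller should wait before calling
     */
    synchronized long reserve(long nowMillis) {
        // The window being filled has fully elapsed: start a fresh one now.
        if (nowMillis >= windowStart + windowMillis) {
            windowStart = nowMillis;
            callsInWindow = 0;
        }
        // The window being filled is full: move on to the next one.
        if (callsInWindow >= maxCallsPerWindow) {
            windowStart += windowMillis;
            callsInWindow = 0;
        }
        callsInWindow++;
        // Slots in a future window require waiting until that window opens.
        return Math.max(0L, windowStart - nowMillis);
    }
}
```

Under these assumptions, a dispatcher thread would sleep for the returned delay before issuing its getBlocks request. A fixed window is used here only because it is easy to reason about; a token bucket would give smoother spacing.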



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

