[ 
https://issues.apache.org/jira/browse/HBASE-23764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharath Vissapragada resolved HBASE-23764.
------------------------------------------
    Fix Version/s: 2.3.0
                   3.0.0
       Resolution: Fixed

Thanks for the review [~stack]

> Flaky tests due to ZK client name resolution delays
> ---------------------------------------------------
>
>                 Key: HBASE-23764
>                 URL: https://issues.apache.org/jira/browse/HBASE-23764
>             Project: HBase
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 3.0.0
>            Reporter: Bharath Vissapragada
>            Assignee: Bharath Vissapragada
>            Priority: Major
>             Fix For: 3.0.0, 2.3.0
>
>         Attachments: sample-jstacks.zip
>
>
> [~ndimiduk] and I ran into this issue (separately) and we noticed that there 
> are some performance issues with name resolution in the Zookeeper client. 
> Since we use ZK heavily in the unit tests, this often manifests as the 
> following issues 
> 1. Test time outs starting the mini cluster (Master failed to start....)
> 2. InterruptedException (because the tests timeout)
> 3. Flaky tests because a subset of the cluster fails to start for whatever 
> reason (replication tests especially because they spawn multiple clusters).
> 4. ConnectionLoss to znode /hbase/xyzz.. JVM pause?
> I have strong feeling that this is a possible cause for many of our flaky 
> tests in Jenkins. Luckily, it looks like the following workaround to switch 
> to an IP address instead of hostname makes it much quicker. There are some 
> related discussions in the ZK community (ZOOKEEPER-1666 and related jiras).
> {code:java}
> --- a/hbase-common/src/main/resources/hbase-default.xml
> +++ b/hbase-common/src/main/resources/hbase-default.xml
> @@ -72,7 +72,7 @@ possible configurations would overwhelm and obscure the 
> important.
>    </property>
>    <property>
>      <name>hbase.zookeeper.quorum</name>
> -    <value>localhost</value>
> +    <value>127.0.0.1</value>
>      <description>Comma separated list of servers in the ZooKeeper ensemble
>      (This config. should have been named hbase.zookeeper.ensemble).
>      For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
> {code}
> Until we figure out the actual root cause and a dependency upgrade (if 
> needed), we should consider making this hostname to IP switch for more stable 
> builds.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to