[ https://issues.apache.org/jira/browse/HBASE-23764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nick Dimiduk updated HBASE-23764: --------------------------------- Issue Type: Test (was: Bug) > Flaky tests due to ZK client name resolution delays > --------------------------------------------------- > > Key: HBASE-23764 > URL: https://issues.apache.org/jira/browse/HBASE-23764 > Project: HBase > Issue Type: Test > Components: test > Affects Versions: 3.0.0 > Reporter: Bharath Vissapragada > Assignee: Bharath Vissapragada > Priority: Major > Fix For: 3.0.0, 2.3.0 > > Attachments: sample-jstacks.zip > > > [~ndimiduk] and I ran into this issue (separately) and we noticed that there > are some performance issues with name resolution in the Zookeeper client. > Since we use ZK heavily in the unit tests, this often manifests as the > following issues > 1. Test time outs starting the mini cluster (Master failed to start....) > 2. InterruptedException (because the tests timeout) > 3. Flaky tests because a subset of the cluster fails to start for whatever > reason (replication tests especially because they spawn multiple clusters). > 4. ConnectionLoss to znode /hbase/xyzz.. JVM pause? > I have strong feeling that this is a possible cause for many of our flaky > tests in Jenkins. Luckily, it looks like the following workaround to switch > to an IP address instead of hostname makes it much quicker. There are some > related discussions in the ZK community (ZOOKEEPER-1666 and related jiras). > {code:java} > --- a/hbase-common/src/main/resources/hbase-default.xml > +++ b/hbase-common/src/main/resources/hbase-default.xml > @@ -72,7 +72,7 @@ possible configurations would overwhelm and obscure the > important. > </property> > <property> > <name>hbase.zookeeper.quorum</name> > - <value>localhost</value> > + <value>127.0.0.1</value> > <description>Comma separated list of servers in the ZooKeeper ensemble > (This config. should have been named hbase.zookeeper.ensemble). > For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com". > {code} > Until we figure out the actual root cause and a dependency upgrade (if > needed), we should consider making this hostname to IP switch for more stable > builds. -- This message was sent by Atlassian Jira (v8.3.4#803005)