[ https://issues.apache.org/jira/browse/HBASE-16367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15412933#comment-15412933 ]
stack commented on HBASE-16367:
-------------------------------

What is this? It adds:

821         if (this.initLatch != null) {
822           this.initLatch.await(50, TimeUnit.SECONDS);
823         }

...which causes a new findbugs warning, reported above but ignored:

Return value of java.util.concurrent.CountDownLatch.await(long, TimeUnit) ignored in org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper() At HRegionServer.java:[line 822]

We then wait on the latch 50 seconds and then just proceed? What is supposed to be the startup scenario here? How does the latch ensure a particular path?

> Race between master and region server initialization may lead to premature
> server abort
> ---------------------------------------------------------------------------------------
>
>                 Key: HBASE-16367
>                 URL: https://issues.apache.org/jira/browse/HBASE-16367
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.1.2
>            Reporter: Ted Yu
>            Assignee: Ted Yu
>             Fix For: 2.0.0, 1.4.0
>
>         Attachments: 16367.addendum, 16367.v1.txt, 16367.v2.txt, 16367.v3.txt, 63908-master.log
>
>
> I was troubleshooting a case where the hbase (1.1.2) master always dies shortly
> after start - see attached master log snippet.
> It turned out that the master initialization thread was racing with
> HRegionServer#preRegistrationInitialization() (initializeZooKeeper, actually)
> since HMaster extends HRegionServer.
> Through additional logging in master:
> {code}
> this.oldLogDir = createInitialFileSystemLayout();
> HFileSystem.addLocationsOrderInterceptor(conf);
> LOG.info("creating splitLogManager");
> {code}
> I found that execution didn't reach the last log line before the region server
> declared the cluster Id to be null.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
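The findbugs complaint above is about the ignored boolean from CountDownLatch.await(long, TimeUnit), which returns false on timeout. A minimal sketch of checking that return value (the class and method names here are hypothetical stand-ins, not the actual HRegionServer code):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class InitLatchExample {

    // Hypothetical stand-in for the initLatch wait in initializeZooKeeper().
    // Returns whether initialization completed within the timeout.
    static boolean awaitInit(CountDownLatch initLatch, long timeoutSec)
            throws InterruptedException {
        if (initLatch == null) {
            return true; // nothing to wait for
        }
        // await(...) returns false if the timeout elapsed before the count
        // reached zero; checking it addresses the RV_RETURN_VALUE_IGNORED
        // findbugs warning and lets the caller log or abort instead of
        // silently proceeding.
        return initLatch.await(timeoutSec, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch ready = new CountDownLatch(1);
        ready.countDown(); // already released
        System.out.println(awaitInit(ready, 1));                  // prints "true"
        System.out.println(awaitInit(new CountDownLatch(1), 0));  // times out: "false"
    }
}
```

With the boolean surfaced, the startup path can make the timeout case explicit rather than falling through after 50 seconds.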