Chia-Ping Tsai created HBASE-19624:
--------------------------------------

Summary: TestIOFencing hangs
Key: HBASE-19624
URL: https://issues.apache.org/jira/browse/HBASE-19624
Project: HBase
Issue Type: Bug
Reporter: Chia-Ping Tsai
Assignee: Chia-Ping Tsai
Fix For: 2.0.0

The RS calls CompactSplit#join to stop all CompactSplit threads.

{code:title=CompactSplit.java}
  private void waitFor(ThreadPoolExecutor t, String name) {
    boolean done = false;
    while (!done) {
      try {
        done = t.awaitTermination(60, TimeUnit.SECONDS);
        LOG.info("Waiting for " + name + " to finish...");
        if (!done) {
          t.shutdownNow();
        }
      } catch (InterruptedException ie) {
        LOG.warn("Interrupted waiting for " + name + " to finish...");
      }
    }
  }
{code}

In the meantime, the async WAL may be waiting for the sync signal. However, the signal never arrives because the WAL sync has failed.

{code}
  synchronized long get(long timeoutNs)
      throws InterruptedException, ExecutionException, TimeoutIOException {
    final long done = System.nanoTime() + timeoutNs;
    while (!isDone()) {
      wait(1000);
      if (System.nanoTime() >= done) {
        throw new TimeoutIOException(
            "Failed to get sync result after " + TimeUnit.NANOSECONDS.toMillis(timeoutNs)
                + " ms for txid=" + this.txid + ", WAL system stuck?");
      }
    }
    if (this.throwable != null) {
      throw new ExecutionException(this.throwable);
    }
    return this.doneTxid;
  }
{code}

When we shut down the mini cluster, JVMClusterUtil#shutdown sends the interrupt signal to all RS threads. The interrupt makes awaitTermination throw InterruptedException before #shutdownNow is reached, and the catch block only logs it. CompactSplit therefore skips #shutdownNow, and the CompactSplit threads stay alive until the timeout (default is 5 min).

{code}
      for (int i = 0; i < 100; ++i) {
        boolean atLeastOneLiveServer = false;
        for (RegionServerThread t : regionservers) {
          if (t.isAlive()) {
            atLeastOneLiveServer = true;
            try {
              LOG.warn("RegionServerThreads remaining, give one more chance before interrupting");
              t.join(1000);
            } catch (InterruptedException e) {
              wasInterrupted = true;
            }
          }
        }
        if (!atLeastOneLiveServer) break;
        for (RegionServerThread t : regionservers) {
          if (t.isAlive()) {
            LOG.warn("RegionServerThreads taking too long to stop, interrupting");
            t.interrupt();
          }
        }
      }
{code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
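
One possible way to avoid the hang described in this issue is sketched below. It is not the committed fix: the class name InterruptAwareJoinSketch, the demo() helper, and the extra timeout parameters on waitFor are hypothetical and exist only for illustration. The sketch assumes the worker threads respond to interruption, as the wait(1000) loop in the sync future does. The idea is that an interrupt delivered to the joining thread triggers shutdownNow() at once instead of being logged and ignored, and that the interrupt flag is restored afterwards.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class InterruptAwareJoinSketch {

  // Hypothetical variant of CompactSplit#waitFor. On interrupt it calls
  // shutdownNow() immediately rather than retrying awaitTermination.
  // Returns true if the calling thread was interrupted while waiting.
  static boolean waitFor(ThreadPoolExecutor pool, long timeout, TimeUnit unit) {
    boolean interrupted = false;
    boolean done = false;
    while (!done) {
      try {
        done = pool.awaitTermination(timeout, unit);
        if (!done) {
          pool.shutdownNow();
        }
      } catch (InterruptedException ie) {
        // The caller wants us to stop: interrupt the workers now.
        interrupted = true;
        pool.shutdownNow();
      }
    }
    if (interrupted) {
      // Restore the flag so callers further up can still observe it.
      Thread.currentThread().interrupt();
    }
    return interrupted;
  }

  // Simulates a worker stuck waiting for a signal that never arrives,
  // then interrupts the joining thread, as JVMClusterUtil#shutdown does.
  static boolean demo() {
    ThreadPoolExecutor pool =
        new ThreadPoolExecutor(1, 1, 0L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
    CountDownLatch started = new CountDownLatch(1);
    Object lock = new Object();
    pool.execute(() -> {
      started.countDown();
      synchronized (lock) {
        try {
          while (true) {
            lock.wait(1000); // stands in for the sync future's wait loop
          }
        } catch (InterruptedException e) {
          // Interrupted by shutdownNow(): let the task end.
        }
      }
    });
    try {
      started.await();
      pool.shutdown();
      Thread joiner = new Thread(() -> waitFor(pool, 60, TimeUnit.SECONDS));
      joiner.start();
      joiner.interrupt();
      joiner.join(10_000);
      return pool.isTerminated() && !joiner.isAlive();
    } catch (InterruptedException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("pool terminated after interrupt: " + demo());
  }
}
```

With the original loop, the same interrupt would only produce a log line, and the joiner would go back into awaitTermination for another 60 seconds before reaching shutdownNow().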