Chia-Ping Tsai created HBASE-19624:
--------------------------------------

             Summary: TestIOFencing hangs
                 Key: HBASE-19624
                 URL: https://issues.apache.org/jira/browse/HBASE-19624
             Project: HBase
          Issue Type: Bug
            Reporter: Chia-Ping Tsai
            Assignee: Chia-Ping Tsai
             Fix For: 2.0.0


On shutdown, the RS calls CompactSplit#join to stop all CompactSplit threads.
{code:title=CompactSplit.java}
  private void waitFor(ThreadPoolExecutor t, String name) {
    boolean done = false;
    while (!done) {
      try {
        done = t.awaitTermination(60, TimeUnit.SECONDS);
        LOG.info("Waiting for " + name + " to finish...");
        if (!done) {
          t.shutdownNow();
        }
      } catch (InterruptedException ie) {
        LOG.warn("Interrupted waiting for " + name + " to finish...");
      }
    }
  }
{code}
In the meantime, a thread using the async WAL may be waiting for the sync
signal. However, the signal never arrives because the WAL sync has failed.
{code}
  synchronized long get(long timeoutNs) throws InterruptedException,
      ExecutionException, TimeoutIOException {
    final long done = System.nanoTime() + timeoutNs;
    while (!isDone()) {
      wait(1000);
      if (System.nanoTime() >= done) {
        throw new TimeoutIOException(
            "Failed to get sync result after " + TimeUnit.NANOSECONDS.toMillis(timeoutNs)
                + " ms for txid=" + this.txid + ", WAL system stuck?");
      }
    }
    if (this.throwable != null) {
      throw new ExecutionException(this.throwable);
    }
    return this.doneTxid;
  }
{code}

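When we shut down the mini cluster, JVMClusterUtil#shutdown sends an interrupt
signal to every RS thread that is still alive, roughly once per second. Each
interrupt makes awaitTermination throw InterruptedException before its
60-second timeout elapses, so CompactSplit#waitFor catches the exception and
never reaches #shutdownNow. Hence the CompactSplit threads stay up until the
sync timeout expires (default is 5 min).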
{code}
      for (int i = 0; i < 100; ++i) {
        boolean atLeastOneLiveServer = false;
        for (RegionServerThread t : regionservers) {
          if (t.isAlive()) {
            atLeastOneLiveServer = true;
            try {
              LOG.warn("RegionServerThreads remaining, give one more chance before interrupting");
              t.join(1000);
            } catch (InterruptedException e) {
              wasInterrupted = true;
            }
          }
        }
        if (!atLeastOneLiveServer) break;
        for (RegionServerThread t : regionservers) {
          if (t.isAlive()) {
            LOG.warn("RegionServerThreads taking too long to stop, interrupting");
            t.interrupt();
          }
        }
      }
{code}
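
One possible way to break the cycle is to treat the interrupt itself as a
request to stop quickly and call shutdownNow from the catch block. The sketch
below only illustrates that idea and is not necessarily the change committed
for this issue. The class name WaitForSketch and the extra timeout/unit
parameters are assumptions added to keep the example self-contained; logging
is omitted.

```java
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class WaitForSketch {
  // Hypothetical variant of CompactSplit#waitFor. An interrupt no longer
  // restarts a full wait: the pool's workers are interrupted right away.
  static void waitFor(ThreadPoolExecutor t, String name, long timeout, TimeUnit unit) {
    boolean done = false;
    boolean interrupted = false;
    while (!done) {
      try {
        done = t.awaitTermination(timeout, unit);
        if (!done) {
          // Timed out waiting for a graceful stop; interrupt the workers.
          t.shutdownNow();
        }
      } catch (InterruptedException ie) {
        // The caller was interrupted (e.g. by JVMClusterUtil#shutdown).
        // Interrupt the workers as well, then keep waiting for them to exit.
        interrupted = true;
        t.shutdownNow();
      }
    }
    if (interrupted) {
      // Restore the interrupt status for code further up the stack.
      Thread.currentThread().interrupt();
    }
  }
}
```

With this shape, a worker blocked in a wait such as the SyncFuture#get loop
above receives an InterruptedException as soon as the RS thread is interrupted,
instead of waiting for the sync timeout. This assumes the workers respond to
interruption.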




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)