Kiran Kumar Maturi created HBASE-28665:
------------------------------------------

             Summary: WALs not marked closed when there are errors in closing WALs
                 Key: HBASE-28665
                 URL: https://issues.apache.org/jira/browse/HBASE-28665
             Project: HBase
          Issue Type: Bug
          Components: wal
    Affects Versions: 2.5.8
            Reporter: Kiran Kumar Maturi
            Assignee: Kiran Kumar Maturi


In our production clusters we have observed that when a WAL close fails, the 
old WAL files are not marked as closed, which prevents them from being cleaned 
up. When a WAL close fails in closeWriter, it increments the error count. 

{code:java}
Span span = Span.current();
try {
  span.addEvent("closing writer");
  writer.close();
  span.addEvent("writer closed");
} catch (IOException ioe) {
  int errors = closeErrorCount.incrementAndGet();
  boolean hasUnflushedEntries = isUnflushedEntries();
  if (syncCloseCall && (hasUnflushedEntries || (errors > this.closeErrorsTolerated))) {
    LOG.error("Close of WAL " + path + " failed. Cause=\"" + ioe.getMessage() + "\", errors="
      + errors + ", hasUnflushedEntries=" + hasUnflushedEntries);
    throw ioe;
  }
  LOG.warn("Riding over failed WAL close of " + path
    + "; THIS FILE WAS NOT CLOSED BUT ALL EDITS SYNCED SO SHOULD BE OK", ioe);
}
{code}
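
To make the rethrow condition in the catch block easier to reason about, it can be isolated as a pure predicate. This is only an illustrative sketch: the class CloseDecision and the method shouldRethrow are hypothetical names that do not exist in HBase, and the tolerated value of 2 is assumed for the example.

```java
public class CloseDecision {
  // Hypothetical helper that mirrors the condition in the catch block above.
  // A failed close is rethrown only for a synchronous close call, and only when
  // there are unflushed entries or the error count exceeds the tolerated limit.
  static boolean shouldRethrow(boolean syncCloseCall, boolean hasUnflushedEntries,
      int errors, int closeErrorsTolerated) {
    return syncCloseCall && (hasUnflushedEntries || errors > closeErrorsTolerated);
  }

  public static void main(String[] args) {
    int tolerated = 2; // assumed value, for illustration only
    System.out.println(shouldRethrow(false, false, 3, tolerated)); // false: async close rides over
    System.out.println(shouldRethrow(true, false, 2, tolerated));  // false: 2 is not greater than 2
    System.out.println(shouldRethrow(true, false, 3, tolerated));  // true: limit exceeded
    System.out.println(shouldRethrow(true, true, 1, tolerated));   // true: unflushed entries
  }
}
```

Note that this check uses a strict greater-than comparison, so an asynchronous close never rethrows regardless of the error count.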

After only two WAL close errors, doReplaceWALWriter enters this code block:

{code:java}
if (isUnflushedEntries() || closeErrorCount.get() >= this.closeErrorsTolerated) {
  try {
    closeWriter(this.writer, oldPath, true);
  } finally {
    inflightWALClosures.remove(oldPath.getName());
  }
}
{code}
The WAL is not marked closed on this path, unlike the following path, where markClosedAndClean is called in the finally block:
  
{code:java}
Writer localWriter = this.writer;
closeExecutor.execute(() -> {
  try {
    closeWriter(localWriter, oldPath, false);
  } catch (IOException e) {
    LOG.warn("close old writer failed", e);
  } finally {
    // call this even if the above close fails, as there is no other chance we can set
    // closed to true, it will not cause big problems.
    {color:red}markClosedAndClean(oldPath);{color}
    inflightWALClosures.remove(oldPath.getName());
  }
});
{code}
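
The difference between the two paths above can be reproduced with a small standalone simulation. This is a sketch under stated assumptions, not the HBase implementation: WalCloseSketch, replaceWriter, run, and the markInFinally flag are hypothetical names, the tolerated error count is assumed to be 2, the unflushed-entries check is omitted, and the asynchronous close runs inline instead of on an executor. The flag shows one possible remedy (marking the WAL closed in the finally block of the synchronous path as well); the report itself does not state which fix was chosen.

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

public class WalCloseSketch {
  interface Writer { void close() throws IOException; }

  final AtomicInteger closeErrorCount = new AtomicInteger();
  final int closeErrorsTolerated = 2;               // assumed value
  final Set<String> markedClosed = new HashSet<>(); // stands in for markClosedAndClean
  final Set<String> inflight = new HashSet<>();     // stands in for inflightWALClosures

  void closeWriter(Writer writer, String path, boolean syncCloseCall) throws IOException {
    try {
      writer.close();
    } catch (IOException ioe) {
      int errors = closeErrorCount.incrementAndGet();
      if (syncCloseCall && errors > closeErrorsTolerated) {
        throw ioe;
      }
      // otherwise ride over the failed close
    }
  }

  // markInFinally toggles the hypothetical remedy on the synchronous path.
  void replaceWriter(Writer oldWriter, String oldPath, boolean markInFinally)
      throws IOException {
    inflight.add(oldPath);
    if (closeErrorCount.get() >= closeErrorsTolerated) {
      try {
        closeWriter(oldWriter, oldPath, true);
      } finally {
        if (markInFinally) {
          markedClosed.add(oldPath);
        }
        inflight.remove(oldPath);
      }
    } else {
      try {
        closeWriter(oldWriter, oldPath, false); // inline here; the real code uses an executor
      } catch (IOException e) {
        // logged and ignored in the original code
      } finally {
        markedClosed.add(oldPath);
        inflight.remove(oldPath);
      }
    }
  }

  // Rolls three WALs whose close always fails; reports whether the third was marked closed.
  static boolean run(boolean markInFinally) {
    WalCloseSketch wal = new WalCloseSketch();
    Writer failing = () -> { throw new IOException("simulated close failure"); };
    try {
      wal.replaceWriter(failing, "wal.1", markInFinally); // async path, errors = 1
      wal.replaceWriter(failing, "wal.2", markInFinally); // async path, errors = 2
      wal.replaceWriter(failing, "wal.3", markInFinally); // sync path, errors = 3, rethrows
    } catch (IOException expected) {
      // the synchronous close propagates the failure
    }
    return wal.markedClosed.contains("wal.3");
  }

  public static void main(String[] args) {
    System.out.println(run(false)); // false: wal.3 is never marked closed
    System.out.println(run(true));  // true: wal.3 is marked closed despite the failure
  }
}
```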




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
