[ 
https://issues.apache.org/jira/browse/HBASE-28665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17955808#comment-17955808
 ] 

Viraj Jasani commented on HBASE-28665:
--------------------------------------

Yes, master/branch-3 are not affected IIRC. Correct [~kiran.maturi]?

> WALs not marked closed when there are errors in closing WALs
> ------------------------------------------------------------
>
>                 Key: HBASE-28665
>                 URL: https://issues.apache.org/jira/browse/HBASE-28665
>             Project: HBase
>          Issue Type: Bug
>          Components: wal
>    Affects Versions: 2.5.8
>            Reporter: Kiran Kumar Maturi
>            Assignee: Kiran Kumar Maturi
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 2.7.0, 2.6.1, 2.5.10
>
>
> In our production clusters we have observed that when a WAL close fails, the 
> old WAL files are not marked as closed, which prevents them from being cleaned. 
> When a WAL close fails in closeWriter, it increments the error count. 
> {code:java}
> Span span = Span.current();
> try {
>   span.addEvent("closing writer");
>   writer.close();
>   span.addEvent("writer closed");
> } catch (IOException ioe) {
>   int errors = closeErrorCount.incrementAndGet();
>   boolean hasUnflushedEntries = isUnflushedEntries();
>   if (syncCloseCall && (hasUnflushedEntries || (errors > this.closeErrorsTolerated))) {
>     LOG.error("Close of WAL " + path + " failed. Cause=\"" + ioe.getMessage() + "\", errors="
>       + errors + ", hasUnflushedEntries=" + hasUnflushedEntries);
>     throw ioe;
>   }
>   LOG.warn("Riding over failed WAL close of " + path
>     + "; THIS FILE WAS NOT CLOSED BUT ALL EDITS SYNCED SO SHOULD BE OK", ioe);
> }
> {code}
> Once WAL close has failed only twice, doReplaceWALWriter enters this 
> code block:
> {code:java}
> if (isUnflushedEntries() || closeErrorCount.get() >= this.closeErrorsTolerated) {
>   try {
>     closeWriter(this.writer, oldPath, true);
>   } finally {
>     inflightWALClosures.remove(oldPath.getName());
>   }
> }
> {code}
> The WAL is never marked closed on this synchronous path, unlike on the 
> asynchronous path, where markClosedAndClean is called in the finally block:
>   
> {code:java}
>   Writer localWriter = this.writer;
>           closeExecutor.execute(() -> {
>             try {
>               closeWriter(localWriter, oldPath, false);
>             } catch (IOException e) {
>               LOG.warn("close old writer failed", e);
>             } finally {
>               // call this even if the above close fails, as there is no other chance we can set
>               // closed to true, it will not cause big problems.
>              {color:red} markClosedAndClean(oldPath);{color}
>               inflightWALClosures.remove(oldPath.getName());
>             }
>           });
> {code}
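
The difference between the two quoted paths can be reproduced in isolation. The following is a minimal sketch and not the HBase code or the committed fix: WalCloseSketch, its nested Writer interface, and the methods closeSyncAsReported and closeSyncMarking are hypothetical names made up for illustration, and the two sets stand in for inflightWALClosures and the closed-WAL bookkeeping. It only shows that a failed close leaves the file unmarked unless the finally block also marks it closed, as the asynchronous branch does.

```java
import java.io.IOException;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class WalCloseSketch {
  /** Hypothetical stand-in for a WAL writer. */
  public interface Writer {
    void close() throws IOException;
  }

  // Stand-ins for inflightWALClosures and the "closed" bookkeeping (assumptions).
  private final Set<String> inflight = ConcurrentHashMap.newKeySet();
  private final Set<String> markedClosed = ConcurrentHashMap.newKeySet();

  public void markClosedAndClean(String path) {
    markedClosed.add(path);
  }

  public boolean isMarkedClosed(String path) {
    return markedClosed.contains(path);
  }

  /** Shape of the synchronous branch as quoted: only the in-flight entry is removed. */
  public void closeSyncAsReported(Writer writer, String path) throws IOException {
    inflight.add(path);
    try {
      writer.close();
    } finally {
      inflight.remove(path);
    }
  }

  /** Same branch, but marking the file closed in finally, like the asynchronous branch. */
  public void closeSyncMarking(Writer writer, String path) throws IOException {
    inflight.add(path);
    try {
      writer.close();
    } finally {
      markClosedAndClean(path);
      inflight.remove(path);
    }
  }

  public static void main(String[] args) {
    WalCloseSketch wal = new WalCloseSketch();
    // A writer whose close always fails, to simulate the reported condition.
    Writer failing = () -> {
      throw new IOException("simulated close failure");
    };
    try {
      wal.closeSyncAsReported(failing, "wal.1");
    } catch (IOException expected) {
      // the failure propagates; wal.1 was never marked closed
    }
    try {
      wal.closeSyncMarking(failing, "wal.2");
    } catch (IOException expected) {
      // the failure still propagates, but wal.2 was marked closed first
    }
    System.out.println("wal.1 marked closed: " + wal.isMarkedClosed("wal.1"));
    System.out.println("wal.2 marked closed: " + wal.isMarkedClosed("wal.2"));
  }
}
```

Running main prints false for wal.1 and true for wal.2. Whether marking a WAL closed after a failed synchronous close is safe in HBase itself depends on the unflushed-entries check quoted above, which this sketch does not model.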



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
