[ https://issues.apache.org/jira/browse/HBASE-28665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17955808#comment-17955808 ]
Viraj Jasani commented on HBASE-28665:
--------------------------------------

Yes, master/branch-3 are not affected IIRC. Correct [~kiran.maturi]?

> WALs not marked closed when there are errors in closing WALs
> ------------------------------------------------------------
>
>                 Key: HBASE-28665
>                 URL: https://issues.apache.org/jira/browse/HBASE-28665
>             Project: HBase
>          Issue Type: Bug
>          Components: wal
>    Affects Versions: 2.5.8
>            Reporter: Kiran Kumar Maturi
>            Assignee: Kiran Kumar Maturi
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 2.7.0, 2.6.1, 2.5.10
>
>
> In our production clusters we have observed that when a WAL close fails, the
> old WAL files are not marked as closed, which prevents them from being cleaned.
> When a WAL close fails in closeWriter, it increments the error count.
> {code:java}
> Span span = Span.current();
> try {
>   span.addEvent("closing writer");
>   writer.close();
>   span.addEvent("writer closed");
> } catch (IOException ioe) {
>   int errors = closeErrorCount.incrementAndGet();
>   boolean hasUnflushedEntries = isUnflushedEntries();
>   if (syncCloseCall && (hasUnflushedEntries || (errors > this.closeErrorsTolerated))) {
>     LOG.error("Close of WAL " + path + " failed. Cause=\"" + ioe.getMessage() + "\", errors="
>       + errors + ", hasUnflushedEntries=" + hasUnflushedEntries);
>     throw ioe;
>   }
>   LOG.warn("Riding over failed WAL close of " + path
>     + "; THIS FILE WAS NOT CLOSED BUT ALL EDITS SYNCED SO SHOULD BE OK", ioe);
> }
> {code}
> When there have been only two errors in closing WALs, doReplaceWALWriter enters
> this code block:
> {code:java}
> if (isUnflushedEntries() || closeErrorCount.get() >= this.closeErrorsTolerated) {
>   try {
>     closeWriter(this.writer, oldPath, true);
>   } finally {
>     inflightWALClosures.remove(oldPath.getName());
>   }
> }
> {code}
> We don't mark the WALs closed in that block, as we do here:
>
> {code:java}
> Writer localWriter = this.writer;
> closeExecutor.execute(() -> {
>   try {
>     closeWriter(localWriter, oldPath, false);
>   } catch (IOException e) {
>     LOG.warn("close old writer failed", e);
>   } finally {
>     // call this even if the above close fails, as there is no other chance we can set
>     // closed to true, it will not cause big problems.
>     {color:red} markClosedAndClean(oldPath);{color}
>     inflightWALClosures.remove(oldPath.getName());
>   }
> });
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
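
The difference between the two close paths quoted above can be illustrated with a small standalone sketch. This is not the patch attached to the issue and does not use HBase classes: WalCloseSketch, its Writer interface, closeSync, isMarkedClosed and isInflight are hypothetical names made up for illustration, and only markClosedAndClean and inflightWALClosures mirror identifiers from the quoted code. It assumes the synchronous path should mark the WAL closed in its finally block, the same way the executor path does.

```java
import java.io.IOException;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the WAL bookkeeping; not an HBase class.
class WalCloseSketch {
  /** Minimal writer abstraction whose close() may fail. */
  interface Writer {
    void close() throws IOException;
  }

  // WALs whose close is still in progress.
  private final Set<String> inflightWALClosures = ConcurrentHashMap.newKeySet();
  // WALs that have been marked closed and are therefore eligible for cleaning.
  private final Set<String> closedWALs = ConcurrentHashMap.newKeySet();

  /** Records that the WAL is closed so that a cleaner could pick it up. */
  void markClosedAndClean(String walName) {
    closedWALs.add(walName);
  }

  boolean isMarkedClosed(String walName) {
    return closedWALs.contains(walName);
  }

  boolean isInflight(String walName) {
    return inflightWALClosures.contains(walName);
  }

  /**
   * Synchronous close path. The finally block runs whether or not close()
   * throws, so the WAL is always marked closed and removed from the
   * in-flight set; the IOException still propagates to the caller.
   */
  void closeSync(Writer writer, String walName) throws IOException {
    inflightWALClosures.add(walName);
    try {
      writer.close();
    } finally {
      markClosedAndClean(walName);
      inflightWALClosures.remove(walName);
    }
  }
}
```

With this shape, a writer whose close() throws still leaves the WAL name in the closed set and out of the in-flight set, which is the behaviour the issue says is missing from the synchronous block.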