[
https://issues.apache.org/jira/browse/HBASE-28665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Viraj Jasani resolved HBASE-28665.
----------------------------------
Fix Version/s: 2.7.0, 2.6.1, 2.5.10
Hadoop Flags: Reviewed
Resolution: Fixed
> WALs not marked closed when there are errors in closing WALs
> ------------------------------------------------------------
>
> Key: HBASE-28665
> URL: https://issues.apache.org/jira/browse/HBASE-28665
> Project: HBase
> Issue Type: Bug
> Components: wal
> Affects Versions: 2.5.8
> Reporter: Kiran Kumar Maturi
> Assignee: Kiran Kumar Maturi
> Priority: Minor
> Labels: pull-request-available
> Fix For: 2.7.0, 2.6.1, 2.5.10
>
>
> In our production clusters we have observed that when a WAL close fails, the
> old WAL files are not marked as closed and therefore cannot be cleaned up.
> When a WAL close fails in closeWriter, it increments the error count.
> {code:java}
> Span span = Span.current();
> try {
>   span.addEvent("closing writer");
>   writer.close();
>   span.addEvent("writer closed");
> } catch (IOException ioe) {
>   int errors = closeErrorCount.incrementAndGet();
>   boolean hasUnflushedEntries = isUnflushedEntries();
>   if (syncCloseCall && (hasUnflushedEntries || (errors > this.closeErrorsTolerated))) {
>     LOG.error("Close of WAL " + path + " failed. Cause=\"" + ioe.getMessage() + "\", errors="
>       + errors + ", hasUnflushedEntries=" + hasUnflushedEntries);
>     throw ioe;
>   }
>   LOG.warn("Riding over failed WAL close of " + path
>     + "; THIS FILE WAS NOT CLOSED BUT ALL EDITS SYNCED SO SHOULD BE OK", ioe);
> }
> {code}
> Once WAL close has failed just twice, doReplaceWALWriter enters
> this code block:
> {code:java}
> if (isUnflushedEntries() || closeErrorCount.get() >= this.closeErrorsTolerated) {
>   try {
>     closeWriter(this.writer, oldPath, true);
>   } finally {
>     inflightWALClosures.remove(oldPath.getName());
>   }
> }
> {code}
> This path does not mark the WALs closed, unlike the asynchronous path here:
>
> {code:java}
> Writer localWriter = this.writer;
> closeExecutor.execute(() -> {
>   try {
>     closeWriter(localWriter, oldPath, false);
>   } catch (IOException e) {
>     LOG.warn("close old writer failed", e);
>   } finally {
>     // call this even if the above close fails, as there is no other chance we can set
>     // closed to true, it will not cause big problems.
>     {color:red}markClosedAndClean(oldPath);{color}
>     inflightWALClosures.remove(oldPath.getName());
>   }
> });
> {code}
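
The difference between the two paths quoted above can be reproduced in a small standalone model. This is only a sketch and not HBase code: the class WalCloseSketch, its Writer interface, the markedClosed set, and the markInFinally switch are hypothetical names invented for illustration, the tolerance of 2 is an assumed value, and the unflushed-entries check is left out. The set insertion stands in for markClosedAndClean(oldPath). The sketch assumes a fix that mirrors the asynchronous path by marking the WAL closed in the finally block of the synchronous path; the change actually committed for this issue may differ.

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy model of the synchronous close path described above; not HBase code. */
public class WalCloseSketch {
  /** Hypothetical stand-in for the WAL writer. */
  interface Writer {
    void close() throws IOException;
  }

  private final AtomicInteger closeErrorCount = new AtomicInteger();
  private final int closeErrorsTolerated = 2; // assumed value for this demo
  private final Set<String> markedClosed = new HashSet<>();

  /** Simplified closeWriter: counts failures, rethrows on the sync path past the tolerance. */
  void closeWriter(Writer writer, String path, boolean syncCloseCall) throws IOException {
    try {
      writer.close();
    } catch (IOException ioe) {
      int errors = closeErrorCount.incrementAndGet();
      if (syncCloseCall && errors > closeErrorsTolerated) {
        throw ioe;
      }
      // Otherwise ride over the failure, as the quoted code does.
    }
  }

  /** Synchronous replace path; markInFinally toggles the assumed fix. */
  void replaceSync(Writer writer, String path, boolean markInFinally) throws IOException {
    try {
      closeWriter(writer, path, true);
    } finally {
      if (markInFinally) {
        markedClosed.add(path); // stands in for markClosedAndClean(oldPath)
      }
    }
  }

  /** Runs one failing close and reports whether the path ended up marked closed. */
  public static boolean demo(boolean markInFinally) {
    Writer failing = () -> {
      throw new IOException("simulated close failure");
    };
    WalCloseSketch sketch = new WalCloseSketch();
    try {
      sketch.replaceSync(failing, "wal.1", markInFinally);
    } catch (IOException e) {
      // Not reached on the first failure; ignored because only the mark state matters here.
    }
    return sketch.markedClosed.contains("wal.1");
  }

  public static void main(String[] args) {
    System.out.println("marked closed without the finally mark: " + demo(false));
    System.out.println("marked closed with the finally mark: " + demo(true));
  }
}
```

In this model a single failed close leaves the path unmarked unless the finally block records it, which is the behavior the report describes for the synchronous path.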