Kiran Kumar Maturi created HBASE-28801:
------------------------------------------

             Summary: WALs remain unclosed for long
                 Key: HBASE-28801
                 URL: https://issues.apache.org/jira/browse/HBASE-28801
             Project: HBase
          Issue Type: Bug
          Components: wal
    Affects Versions: 2.5.6
            Reporter: Kiran Kumar Maturi


In our production fleet we have observed that WAL files are not cleaned up even when all of their entries have been flushed. I fixed the case where closing the WAL itself fails as part of [HBASE-28665|https://issues.apache.org/jira/projects/HBASE/issues/HBASE-28665]. There is another case, involving unflushed entries, that can leave a WAL uncleaned even after all of its entries have been flushed:

[FSHLog.java|https://github.com/apache/hbase/blob/branch-2.6/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java#L388]
{code:java}
if (isUnflushedEntries() || closeErrorCount.get() >= this.closeErrorsTolerated) {
  try {
    closeWriter(this.writer, oldPath, true);
  } finally {
    inflightWALClosures.remove(oldPath.getName());
    // The WAL is marked closed only if nothing is unflushed at this moment.
    if (!isUnflushedEntries()) {
      markClosedAndClean(oldPath);
    }
  }
// ... (excerpt ends here)
{code}
If there are unflushed entries when the writer is closed, the WAL is never marked closed. cleanOldLogs() skips any WAL that is not marked closed, so the file is never cleaned up afterwards:
{code:java}
private synchronized void cleanOldLogs() {
  List<Pair<Path, Long>> logsToArchive = null;
  // For each log file, look at its Map of regions to the highest sequence id; if all sequence ids
  // are older than what is currently in memory, the WAL can be GC'd.
  for (Map.Entry<Path, WALProps> e : this.walFile2Props.entrySet()) {
    // A WAL that was never marked closed is skipped on every pass.
    if (!e.getValue().closed) {
      LOG.debug("{} is not closed yet, will try archiving it next time", e.getKey());
      continue;
    }
    Path log = e.getKey();
    Map<byte[], Long> sequenceNums = e.getValue().encodedName2HighestSequenceId;
    if (this.sequenceIdAccounting.areAllLower(sequenceNums)) {
      if (logsToArchive == null) {
        logsToArchive = new ArrayList<>();
      }
      logsToArchive.add(Pair.newPair(log, e.getValue().logSize));
      if (LOG.isTraceEnabled()) {
        LOG.trace("WAL file ready for archiving " + log);
      }
    }
  }
  // ... (excerpt ends here)
{code}
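
The interaction between the two excerpts can be illustrated with a small, self-contained toy model. This is only a sketch and not HBase code: the class WalCleanupSketch, the methods roll and markFlushed, and the boolean fields are hypothetical names invented for illustration. Only the shape of the logic (mark closed only when nothing is unflushed at close time; skip unclosed WALs during cleanup) is taken from the excerpts above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the closed-flag bookkeeping described above. Not HBase code. */
class WalCleanupSketch {

  /** Hypothetical stand-in for the per-WAL properties kept in walFile2Props. */
  static final class WalProps {
    boolean closed;      // set once, at roll time, and never revisited in this model
    boolean allFlushed;  // true once every entry of this WAL has been flushed
  }

  final Map<String, WalProps> walFile2Props = new LinkedHashMap<>();

  /** Models the roll path: the old WAL is marked closed only if nothing is unflushed. */
  void roll(String oldWal, boolean unflushedEntriesAtClose) {
    WalProps props = new WalProps();
    if (!unflushedEntriesAtClose) {
      props.closed = true;
    }
    walFile2Props.put(oldWal, props);
  }

  /** Models a later flush that covers every entry of the given WAL. */
  void markFlushed(String wal) {
    walFile2Props.get(wal).allFlushed = true;
  }

  /** Models cleanOldLogs(): WALs that are not marked closed are skipped. */
  List<String> cleanOldLogs() {
    List<String> archived = new ArrayList<>();
    for (Map.Entry<String, WalProps> e : walFile2Props.entrySet()) {
      if (!e.getValue().closed) {
        continue; // "not closed yet, will try archiving it next time"
      }
      if (e.getValue().allFlushed) {
        archived.add(e.getKey());
      }
    }
    walFile2Props.keySet().removeAll(archived);
    return archived;
  }

  public static void main(String[] args) {
    WalCleanupSketch wal = new WalCleanupSketch();
    wal.roll("wal.1", false); // nothing unflushed at close: marked closed
    wal.roll("wal.2", true);  // unflushed entries at close: never marked closed
    wal.markFlushed("wal.1");
    wal.markFlushed("wal.2");
    System.out.println("archived: " + wal.cleanOldLogs());
    System.out.println("still tracked: " + wal.walFile2Props.keySet());
  }
}
```

In this model wal.2 stays tracked no matter how many times cleanOldLogs() runs, because nothing revisits its closed flag after the flush completes. That matches the behavior reported above.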