José Armando García Sancio created KAFKA-20016:
--------------------------------------------------
Summary: Wait until HWM is known before deleting snapshots
Key: KAFKA-20016
URL: https://issues.apache.org/jira/browse/KAFKA-20016
Project: Kafka
Issue Type: Bug
Components: kraft
Reporter: José Armando García Sancio
Assignee: José Armando García Sancio
If a KRaft replica stays offline for a while, the following error can occur:
{code:java}
org.apache.kafka.common.errors.OffsetOutOfRangeException: Cannot increment the log start offset to 126032410 of partition __cluster_metadata-0 since it is larger than the high watermark 126013494
{code}
This happens because the snapshot cleaning code can run before the high
watermark (HWM) is known: RaftMetadataLogCleanerManager::maybeClean does not
check the HWM before calling KafkaRaftLog::maybeClean.
Because the HWM is not yet known, UnifiedLog raises the HWM to the new log
start offset when the oldest snapshot is deleted and the log start offset is
updated:
{code:java}
private void updateLogStartOffset(long offset) throws IOException {
    logStartOffset = offset;
    if (highWatermark() < offset) {
        updateHighWatermark(offset);
    }
    if (localLog.recoveryPoint() < offset) {
        localLog.updateRecoveryPoint(offset);
    }
}
{code}
When the next snapshot is deleted, the following check fails because the new
log start offset is larger than that HWM:
{code:java}
public boolean maybeIncrementLogStartOffset(long newLogStartOffset, LogStartOffsetIncrementReason reason) {
    ...
    return maybeHandleIOException(
        () -> "Exception while increasing log start offset for " + topicPartition() + " to " + newLogStartOffset + " in dir " + dir().getParent(),
        () -> {
            synchronized (lock) {
                if (newLogStartOffset > highWatermark()) {
                    throw new OffsetOutOfRangeException("Cannot increment the log start offset to " + newLogStartOffset + " of partition " + topicPartition() +
                        " since it is larger than the high watermark " + highWatermark());
                }
                ...
{code}
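The sequence above can be reproduced with a small stand-alone sketch. This is
not Kafka code: ToyLog is a hypothetical stand-in that copies only the two
checks quoted above, IllegalStateException stands in for
OffsetOutOfRangeException, and cleanIfHwmKnown is an assumed shape for the
guard that the summary asks for (skip cleaning while the HWM is unknown), not
the actual fix. The offsets are taken from the error message.

```java
import java.util.OptionalLong;

final class SnapshotCleanSketch {
    /** Toy stand-in for the two UnifiedLog methods quoted above (hypothetical). */
    static final class ToyLog {
        long logStartOffset = 0L;
        long highWatermark = 0L; // local value before the real HWM is learned

        // Same shape as updateLogStartOffset: the HWM is pulled up to the new start offset.
        void updateLogStartOffset(long offset) {
            logStartOffset = offset;
            if (highWatermark < offset) {
                highWatermark = offset;
            }
        }

        // Same check as maybeIncrementLogStartOffset, with a plain runtime
        // exception in place of OffsetOutOfRangeException.
        void maybeIncrementLogStartOffset(long newLogStartOffset) {
            if (newLogStartOffset > highWatermark) {
                throw new IllegalStateException("Cannot increment the log start offset to "
                    + newLogStartOffset + " since it is larger than the high watermark "
                    + highWatermark);
            }
            if (newLogStartOffset > logStartOffset) {
                updateLogStartOffset(newLogStartOffset);
            }
        }
    }

    /** Hypothetical guard: run the cleaning action only once the HWM is known. */
    static boolean cleanIfHwmKnown(OptionalLong raftHighWatermark, Runnable clean) {
        if (!raftHighWatermark.isPresent()) {
            return false; // HWM unknown: skip this cleaning round
        }
        clean.run();
        return true;
    }

    public static void main(String[] args) {
        // Unguarded: the first deletion drags the HWM up, the second one trips the check.
        ToyLog log = new ToyLog();
        log.updateLogStartOffset(126013494L);
        try {
            log.maybeIncrementLogStartOffset(126032410L);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }

        // Guarded: nothing is deleted while the HWM is unknown.
        ToyLog guarded = new ToyLog();
        boolean cleaned = cleanIfHwmKnown(OptionalLong.empty(),
            () -> guarded.updateLogStartOffset(126013494L));
        System.out.println("cleaned=" + cleaned + " logStartOffset=" + guarded.logStartOffset);
    }
}
```

With the guard in place the log start offset and the local HWM are left
untouched until a real HWM is available, so a later snapshot deletion is
compared against the actual HWM rather than against a value derived from an
earlier deletion.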