José Armando García Sancio created KAFKA-20016:
--------------------------------------------------

             Summary: Wait until HWM is known before deleting snapshots
                 Key: KAFKA-20016
                 URL: https://issues.apache.org/jira/browse/KAFKA-20016
             Project: Kafka
          Issue Type: Bug
          Components: kraft
            Reporter: José Armando García Sancio
            Assignee: José Armando García Sancio


If a KRaft replica stays offline for a while, it is possible to see the 
following error:
{code:java}
org.apache.kafka.common.errors.OffsetOutOfRangeException: Cannot increment the log start offset to 126032410 of partition __cluster_metadata-0 since it is larger than the high watermark 126013494{code}
This happens because the snapshot cleaning code can execute before the high 
watermark (HWM) is known. The method RaftMetadataLogCleanerManager::maybeClean 
doesn't check the HWM before calling KafkaRaftLog::maybeClean.
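
A minimal sketch of the kind of guard the summary asks for, written against hypothetical stand-ins rather than the real classes. CleanerGuardSketch, LogView and the OptionalLong-returning highWatermark() are assumptions made for illustration. They are not the actual RaftMetadataLogCleanerManager or KafkaRaftLog API, and the eventual fix may look different.

```java
import java.util.OptionalLong;

// Hypothetical stand-in for the cleaner; not the real RaftMetadataLogCleanerManager.
class CleanerGuardSketch {
    // Hypothetical view of the log: the HWM is empty until the replica learns it.
    interface LogView {
        OptionalLong highWatermark();
        boolean maybeClean();
    }

    // Skip cleaning while the HWM is unknown; otherwise delegate to the log.
    static boolean maybeClean(LogView log) {
        if (!log.highWatermark().isPresent()) {
            return false;
        }
        return log.maybeClean();
    }

    public static void main(String[] args) {
        LogView unknownHwm = new LogView() {
            public OptionalLong highWatermark() { return OptionalLong.empty(); }
            public boolean maybeClean() { return true; }
        };
        LogView knownHwm = new LogView() {
            public OptionalLong highWatermark() { return OptionalLong.of(126013494L); }
            public boolean maybeClean() { return true; }
        };
        System.out.println(maybeClean(unknownHwm)); // prints false
        System.out.println(maybeClean(knownHwm));   // prints true
    }
}
```

With a guard like this, a cleaning pass that runs before the HWM is known does nothing, and snapshot deletion is retried on a later pass.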

Since the HWM is not known, UnifiedLog moves the HWM up to the oldest 
snapshot's offset when that snapshot is deleted and the log start offset is updated:
{code:java}
      private void updateLogStartOffset(long offset) throws IOException {
          logStartOffset = offset;
          if (highWatermark() < offset) {
              updateHighWatermark(offset);
          }
          if (localLog.recoveryPoint() < offset) {
              localLog.updateRecoveryPoint(offset);
          }
      } {code}
When the next snapshot is deleted, the following check fails:
{code:java}
      public boolean maybeIncrementLogStartOffset(long newLogStartOffset,
                                                  LogStartOffsetIncrementReason reason) {
...
          return maybeHandleIOException(
                  () -> "Exception while increasing log start offset for " + topicPartition() +
                          " to " + newLogStartOffset + " in dir " + dir().getParent(),
                  () -> {
                      synchronized (lock)  {
                          if (newLogStartOffset > highWatermark()) {
                              throw new OffsetOutOfRangeException("Cannot increment the log start offset to " +
                                      newLogStartOffset + " of partition " + topicPartition() +
                                      " since it is larger than the high watermark " + highWatermark());
                          }
...{code}
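
The sequence described above can be reproduced with a toy model. This is only a sketch: ToyLog is a hypothetical class that copies the two quoted checks and is not UnifiedLog, and IllegalStateException stands in for OffsetOutOfRangeException to avoid a Kafka dependency. The offsets are taken from the error message. Treating them as the offsets of two consecutive snapshots is an assumption, as is calling updateLogStartOffset directly for the first deletion.

```java
// Toy model of the two UnifiedLog methods quoted above; not the real class.
class HwmBeforeCleanSketch {
    static class ToyLog {
        long logStartOffset = 0L;
        long highWatermark = 0L; // stands in for "HWM not yet known"

        // Mirrors updateLogStartOffset: the HWM is dragged up to the new start offset.
        void updateLogStartOffset(long offset) {
            logStartOffset = offset;
            if (highWatermark < offset) {
                highWatermark = offset;
            }
        }

        // Mirrors the check in maybeIncrementLogStartOffset.
        boolean maybeIncrementLogStartOffset(long newLogStartOffset) {
            if (newLogStartOffset > highWatermark) {
                throw new IllegalStateException("Cannot increment the log start offset to "
                        + newLogStartOffset + " since it is larger than the high watermark "
                        + highWatermark);
            }
            if (newLogStartOffset > logStartOffset) {
                updateLogStartOffset(newLogStartOffset);
                return true;
            }
            return false;
        }
    }

    public static void main(String[] args) {
        ToyLog log = new ToyLog();

        // First deletion: the start offset moves and the HWM follows it.
        log.updateLogStartOffset(126013494L);
        System.out.println("HWM after first deletion: " + log.highWatermark);

        // Second deletion: the target is above the artificially set HWM.
        try {
            log.maybeIncrementLogStartOffset(126032410L);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model the HWM seen by the second deletion is a side effect of the first deletion, not a value learned from the leader, which is why the check rejects the larger offset.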



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
