Re: [PR] KAFKA-16452: Bound high-watermark offset to range between LLSO and LEO [kafka]


chia7712 commented on PR #15634:
URL: https://github.com/apache/kafka/pull/15634#issuecomment-2063435567

> HWM is set to localLogStartOffset in
[UnifiedLog#updateLocalLogStartOffset](https://sourcegraph.com/github.com/apache/kafka@f895ab5145077c5efa10a4a898628d901b01e2c2/-/blob/core/src/main/scala/kafka/log/UnifiedLog.scala?L162),
then we load the HWM from the checkpoint file in
[Partition#createLog](https://sourcegraph.com/github.com/apache/kafka@f895ab5145077c5efa10a4a898628d901b01e2c2/-/blob/core/src/main/scala/kafka/cluster/Partition.scala?L495).
If the HWM checkpoint file is missing / does not contain the entry for
partition, then the default value of 0 is taken. If 0 < LogStartOffset (LSO),
then LSO is assumed as HWM. Thus, the non-monotonic update of highwatermark
from LLSO to LSO can happen.

Pardon me. I'm a bit confused about this. Please feel free to correct me to
help me catch up :smile:

### case 0: the checkpoint file is missing and the remote storage is **disabled**
The LSO is initialized to LLSO

https://github.com/apache/kafka/blob/aee9724ee15ed539ae73c09cc2c2eda83ae3c864/core/src/main/scala/kafka/log/LogLoader.scala#L180

So I can't see why the non-monotonic update would happen. After all, LLSO
and LSO are the same in this scenario.

### case 1: the checkpoint file is missing and the remote storage is **enabled**
The LSO is initialized to `logStartOffsetCheckpoint`, which is 0 since there
are no checkpoint files.

https://github.com/apache/kafka/blob/aee9724ee15ed539ae73c09cc2c2eda83ae3c864/core/src/main/scala/kafka/log/LogLoader.scala#L178

And then the HWM will be updated to LLSO, which is larger than zero.

https://github.com/apache/kafka/blob/aee9724ee15ed539ae73c09cc2c2eda83ae3c864/core/src/main/scala/kafka/log/UnifiedLog.scala#L172

And this could be a problem when
[Partition#createLog](https://sourcegraph.com/github.com/apache/kafka@f895ab5145077c5efa10a4a898628d901b01e2c2/-/blob/core/src/main/scala/kafka/cluster/Partition.scala?L495)
gets called, since the HWM is changed from LLSO (non-zero) to LSO (zero). Also,
the incorrect HWM causes an error in `convertToOffsetMetadataOrThrow`.
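
To make the two cases concrete, here is a toy model of the sequence as I understand it. It is only a sketch: the class and method names are made up for illustration and are not Kafka APIs, the missing checkpoints are modelled as 0, and the upper bound at LEO is left out.

```java
// Toy model of the offset sequence described in case 0 and case 1.
// All names are hypothetical; this is not Kafka code.
final class HwmSequenceSketch {

    // Assumed LogLoader behaviour: with remote storage enabled the LSO comes
    // straight from the checkpoint; otherwise it is at least the LLSO.
    static long initialLogStartOffset(boolean remoteEnabled, long checkpointedLso, long llso) {
        return remoteEnabled ? checkpointedLso : Math.max(checkpointedLso, llso);
    }

    // Step 1 (as described for updateLocalLogStartOffset): HWM is raised to LLSO.
    static long afterLocalLogStartUpdate(long hwm, long llso) {
        return Math.max(hwm, llso);
    }

    // Step 2 (as described for Partition#createLog): HWM is reloaded from the
    // checkpoint (0 when missing) and only bounded below by the LSO.
    static long afterCheckpointLoad(long checkpointedHwm, long lso) {
        return Math.max(checkpointedHwm, lso);
    }

    // Returns {HWM after step 1, HWM after step 2} when both checkpoints are missing.
    static long[] simulate(boolean remoteEnabled, long llso) {
        long missing = 0L; // a missing checkpoint entry defaults to 0
        long lso = initialLogStartOffset(remoteEnabled, missing, llso);
        long step1 = afterLocalLogStartUpdate(0L, llso);
        long step2 = afterCheckpointLoad(missing, lso);
        return new long[] {step1, step2};
    }

    public static void main(String[] args) {
        long[] disabled = simulate(false, 100L);
        long[] enabled = simulate(true, 100L);
        System.out.println("remote disabled: " + disabled[0] + " -> " + disabled[1]); // 100 -> 100
        System.out.println("remote enabled:  " + enabled[0] + " -> " + enabled[1]);   // 100 -> 0
    }
}
```

With remote storage disabled the HWM stays at LLSO. With it enabled the HWM drops from LLSO back to 0, which is the non-monotonic update.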

If I understand correctly, it seems the root cause is that "when the
checkpoint files are not working, we will initialize a `UnifiedLog` with
incorrect LSO".

So could we fix that by rebuilding `logStartOffsets` according to remote
storage when the checkpoint is not working
(https://github.com/apache/kafka/blob/aee9724ee15ed539ae73c09cc2c2eda83ae3c864/core/src/main/scala/kafka/log/LogManager.scala#L459)?
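
Roughly what I have in mind, as a sketch only. `earliestRemoteOffset` stands in for a hypothetical lookup against the remote storage metadata, and the fallback to LLSO when nothing is found is my assumption. None of these names exist in Kafka.

```java
import java.util.OptionalLong;

// Hypothetical sketch of rebuilding the LSO when the checkpoint entry is missing.
final class LogStartOffsetRecoverySketch {

    // checkpointedLso: empty when the checkpoint file is missing or has no entry.
    // earliestRemoteOffset: assumed result of a remote storage lookup; empty if none.
    static long recoverLogStartOffset(OptionalLong checkpointedLso,
                                      boolean remoteEnabled,
                                      OptionalLong earliestRemoteOffset,
                                      long llso) {
        if (checkpointedLso.isPresent()) {
            // Same behaviour as described above when the checkpoint is usable.
            long lso = checkpointedLso.getAsLong();
            return remoteEnabled ? lso : Math.max(lso, llso);
        }
        if (remoteEnabled && earliestRemoteOffset.isPresent()) {
            // Checkpoint missing: derive the LSO from remote storage instead of 0.
            return earliestRemoteOffset.getAsLong();
        }
        // Assumption: with no checkpoint and no remote data, LSO equals LLSO.
        return llso;
    }

    public static void main(String[] args) {
        long lso = recoverLogStartOffset(OptionalLong.empty(), true, OptionalLong.of(40L), 100L);
        System.out.println("recovered LSO = " + lso); // 40 rather than 0
    }
}
```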
