chia7712 commented on PR #15634:
URL: https://github.com/apache/kafka/pull/15634#issuecomment-2063435567

   > HWM is set to localLogStartOffset in 
[UnifiedLog#updateLocalLogStartOffset](https://sourcegraph.com/github.com/apache/kafka@f895ab5145077c5efa10a4a898628d901b01e2c2/-/blob/core/src/main/scala/kafka/log/UnifiedLog.scala?L162),
 then we load the HWM from the checkpoint file in 
[Partition#createLog](https://sourcegraph.com/github.com/apache/kafka@f895ab5145077c5efa10a4a898628d901b01e2c2/-/blob/core/src/main/scala/kafka/cluster/Partition.scala?L495).
   If the HWM checkpoint file is missing or does not contain an entry for the 
partition, then the default value of 0 is taken. If 0 < LogStartOffset (LSO), 
then LSO is assumed as the HWM. Thus, a non-monotonic update of the high watermark 
from LLSO to LSO can happen.
   
   Pardon me. I'm a bit confused about this. Please feel free to correct me to 
help me catch up :smile: 
   
   ### case 0: the checkpoint file is missing and the remote storage is **disabled**
   The LSO is initialized to LLSO
   
   
https://github.com/apache/kafka/blob/aee9724ee15ed539ae73c09cc2c2eda83ae3c864/core/src/main/scala/kafka/log/LogLoader.scala#L180
   
   So I can't see why the non-monotonic update would happen here. After all, LLSO 
and LSO are the same in this scenario.
   
   ### case 1: the checkpoint file is missing and the remote storage is **enabled**
   The LSO is initialized to `logStartOffsetCheckpoint`, which is 0 since there 
are no checkpoint files.
   
   
https://github.com/apache/kafka/blob/aee9724ee15ed539ae73c09cc2c2eda83ae3c864/core/src/main/scala/kafka/log/LogLoader.scala#L178
   
   And then the HWM will be updated to LLSO, which is larger than zero.
   
   
https://github.com/apache/kafka/blob/aee9724ee15ed539ae73c09cc2c2eda83ae3c864/core/src/main/scala/kafka/log/UnifiedLog.scala#L172
   
   And this could be a problem when 
[Partition#createLog](https://sourcegraph.com/github.com/apache/kafka@f895ab5145077c5efa10a4a898628d901b01e2c2/-/blob/core/src/main/scala/kafka/cluster/Partition.scala?L495)
 gets called, since the HWM is changed from LLSO (non-zero) to LSO (zero). Also, 
the incorrect HWM causes an error in `convertToOffsetMetadataOrThrow`.
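
To make the two cases easier to compare, here is a toy replay of the startup sequence as I understand it from the description above. It is only a sketch, not Kafka code: `replay_startup` and `is_monotonic` are hypothetical names, a missing checkpoint entry is modelled as `None` (read as 0), and the "raise to LSO" step is my simplified reading of "if 0 < LSO, then LSO is assumed as HWM".

```python
from typing import List, Optional


def replay_startup(remote_enabled: bool,
                   llso: int,
                   lso_checkpoint: Optional[int] = None,
                   hwm_checkpoint: Optional[int] = None) -> List[int]:
    """Return the sequence of HWM values seen during startup (toy model)."""
    # A missing checkpoint entry is read as 0, as described in the comment.
    lso_from_checkpoint = 0 if lso_checkpoint is None else lso_checkpoint

    # case 0 (remote disabled): LSO is initialized to LLSO.
    # case 1 (remote enabled): LSO is initialized from the checkpoint.
    lso = lso_from_checkpoint if remote_enabled else llso

    history = []
    # Step 1: updateLocalLogStartOffset sets the HWM to LLSO.
    history.append(llso)
    # Step 2: createLog loads the HWM from the checkpoint (default 0);
    # a value below LSO is replaced by LSO.
    loaded = 0 if hwm_checkpoint is None else hwm_checkpoint
    history.append(max(loaded, lso))
    return history


def is_monotonic(history: List[int]) -> bool:
    """True if the HWM never moves backwards."""
    return all(a <= b for a, b in zip(history, history[1:]))


case0 = replay_startup(remote_enabled=False, llso=100)
case1 = replay_startup(remote_enabled=True, llso=100)
print(case0, is_monotonic(case0))
print(case1, is_monotonic(case1))
```

Under these assumptions the HWM only moves backwards in case 1, where LSO falls back to 0 while LLSO is non-zero.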
   
   If I understand correctly, it seems the root cause is that "when the 
checkpoint files are not working, we will initialize a `UnifiedLog` with 
incorrect LSO". 
   
   So could we fix that by rebuilding `logStartOffsets` according to remote 
storage when the checkpoint is not working 
(https://github.com/apache/kafka/blob/aee9724ee15ed539ae73c09cc2c2eda83ae3c864/core/src/main/scala/kafka/log/LogManager.scala#L459)?
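
Roughly, the fallback I have in mind could look like the sketch below. Everything in it is hypothetical: `resolve_log_start_offset` and `earliest_remote_offset` are illustrative names, not existing Kafka APIs, and how the earliest remote offset would actually be obtained is left open.

```python
from typing import Optional


def resolve_log_start_offset(checkpoint_lso: Optional[int],
                             earliest_remote_offset: Optional[int],
                             llso: int) -> int:
    """Pick an LSO for a partition when loading logs (toy model)."""
    # Normal path: the checkpoint file has an entry for the partition.
    if checkpoint_lso is not None:
        return checkpoint_lso
    # Checkpoint missing, but remote storage knows the earliest offset
    # (assumption: some lookup against remote storage provides this value).
    if earliest_remote_offset is not None:
        return min(earliest_remote_offset, llso)
    # Nothing in remote storage either: fall back to the local start offset.
    return llso


# Checkpoint missing, remote storage starts at offset 20, local log at 100.
print(resolve_log_start_offset(None, 20, 100))
```

The point is only that LSO would no longer silently default to 0 when the checkpoint entry is absent.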

