kowshik opened a new pull request #10280:
URL: https://github.com/apache/kafka/pull/10280


   **This PR is a work in progress. Please do not treat this as a finished 
product yet.**
   
   **TL;DR:**
   
   This PR implements the details of the Log layer refactor, as outlined in 
this document: 
https://docs.google.com/document/d/1dQJL4MCwqQJSPmZkVmVzshFZKuFy_bCPtubav4wBfHQ/edit.
 Few details maybe different from the doc, but it is more or less the same.
   
   **STRATEGY:**
   
   In this PR, I've extracted a new class called `LocalLog` out of `Log`.  
Currently `LocalLog` is purely an implementation detail thats not exposed 
outside `Log` class (except for tests). The object encapsulation is that each 
`Log` instance  wraps around a `LocalLog` instance.
   
   This new `LocalLog` class attempts to encompass **most** of the 
responsibilities of local log surrounding the `segments` map, which otherwise 
were present in `Log` previously. Note that **not** all local log 
responsibilities have been moved over to this new class (yet). The criteria I 
used was to preserve (for now) in existing `Log` class, any logic that is 
mingled  in a complex manner with the `logStartOffset` or the 
`LeaderEpochCache` or the `ProducerStateManager`.
   
   **WINS:**
   
   The main win is that the new `LocalLog` class is now agnostic of the 
`logStartOffset`, which continues to be managed mainly by `Log` class. Below is 
the local log functionality that has successfully moved over from `Log` to 
`LocalLog`:
   
   1. The `ConcurrentNavigableMap` containing map of of offset -> segment, and 
the access logic surrounding it.
   2. Read path logic to read records from the log.
   3. Segment file deletion logic.
   4. Segment recovery.
   5. Segment truncation.
   6. Segment roll.
   
   **BLOCKERS:**
   
   The API of `LocalLog` is probably not crisp at this point. Below is the main 
local log functionality that continues to remain in `Log` due to blockers. The 
reason is that the below logic is mingled with one or more of the following: 
`logStartOffset` or `LeaderEpochCache` or `ProducerStateManager`. This makes it 
hard to separate just the local logic out of it:
   
   1. Segment append.
   2. Last stable offset and logic surrounding it.
   3. High watermark and logic surrounding it.
   4. Logic to `fetchOffsetByTimestamp` and logic to `legacyFetchOffsetsBefore`.
   5. Some of the retention logic thats related with the global view of the log.
   6. All other logic related with handling  `LeaderEpochCache` and 
`ProducerStateManager`.
   
   **PAINPOINTS:**
   
   1. Producer state manager instance needed to be passed explicitly into the 
roll API to capture snapshots before roll.
   2. Log locking semantics had to be changed in handful of areas, with lock 
taken at a coarse level (ex: recovery).
   3. Few APIs needed re-ordering of logic in `Log` class to make migration 
feasible.
   4. Certain APIs added to `LocalLog` are crude in nature or signature, 
examples: `def checkIfMemoryMappedBufferClosed`, `def markFlushed`, `def 
updateRecoveryPoint`, `def replaceSegments` etc.
   5. Certain important APIs (such as `def append` logic) were hard to migrate 
because logic was mingled with Leader epoch cache, Producer state manager and 
log start offset.
   
   **TESTS:**
   Note that just the test `LogTest. 
testProducerSnapshotsRecoveryAfterUncleanShutdown*` is failing, and I'll need 
to look into this. Otherwise all other tests are expected to pass for this PR. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to