[ https://issues.apache.org/jira/browse/KAFKA-521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Kreps updated KAFKA-521:
----------------------------

    Attachment: KAFKA-521-v2.patch

Updated patch. In addition to the items in v1, this has the following changes:
1. Rebased again
2. FileMessageSet: Renamed the "limit" variable used for slicing to "end", 
since it was unclear whether it meant the absolute position of the final byte 
in the slice or the relative offset from the given start position (limit 
usually means the latter).
3. FileMessageSet, LogSegment, Log: Found a bug in LogSegment.recover(). If 
the message size is corrupted, the recovery procedure can run out of memory, 
since it tries to load a message of the corrupt size. To fix this I now pass 
the max message size that we specify in the config into the recovery 
procedure, and in turn into FileMessageSet.iterator, and treat any message in 
the log larger than this maximum as a corruption.
4. Log: Fix a bug in Log.truncateTo--we need to delete the old segments before 
creating the new segment to ensure we don't delete the new segment.
5. LogSegment: Added a new optimization to LogSegment.translateOffset. We 
potentially do two translations per read()--one for the startOffset and one for 
the end offset (if there is one). It is possible that the nearest index entry 
lower bound on the end offset is actually lower than the 
startOffset--potentially much lower. So in this case rather than starting the 
search from this position it is better to start from the translated startOffset 
since it is guaranteed to be <= endOffset. A nice special case of this is that 
if you fetch a single message at a time you never do more than one message read 
in Log.searchFor.
6. I did an assessment of unit test coverage and added test cases where I 
thought there were particularly glaring holes. Added cases covering: index 
rebuilding, log corruption, iterating a FileMessageSet slice, truncating a 
FileMessageSet. I also expanded a few other existing tests.
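
To illustrate the size check in item 3, here is a minimal sketch in Java. It is 
not the code in the patch: the class RecoverySketch, the method validBytes, and 
the simplified entry layout (a 4-byte size followed by the payload, with no CRC 
check) are all assumptions made for illustration. The point is only that the 
size is validated against the configured maximum before any buffer of that 
size is allocated.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch, not Kafka's implementation. Assumes each log entry is
// laid out as [4-byte size][payload] and ignores CRC validation entirely.
class RecoverySketch {

    /**
     * Returns the number of bytes, counted from the buffer's current
     * position, that hold complete and plausibly sized entries. Scanning
     * stops at the first entry whose size is negative, exceeds
     * maxMessageSize, or runs past the end of the buffer.
     */
    static int validBytes(ByteBuffer log, int maxMessageSize) {
        ByteBuffer buf = log.duplicate(); // own position, shared contents
        int start = buf.position();
        int validEnd = start;
        while (buf.remaining() >= 4) {
            int size = buf.getInt();
            // Check the size BEFORE allocating: a corrupt size field could
            // otherwise request a multi-gigabyte array.
            if (size < 0 || size > maxMessageSize) {
                break; // treat as corruption
            }
            if (size > buf.remaining()) {
                break; // partial write at the tail of the log
            }
            byte[] payload = new byte[size]; // safe: bounded by maxMessageSize
            buf.get(payload);
            validEnd = buf.position();
        }
        return validEnd - start;
    }

    public static void main(String[] args) {
        ByteBuffer log = ByteBuffer.allocate(64);
        log.putInt(3).put(new byte[] {1, 2, 3});   // valid entry, 7 bytes
        log.putInt(2).put(new byte[] {4, 5});      // valid entry, 6 bytes
        log.putInt(Integer.MAX_VALUE);             // corrupt size field
        log.flip();
        System.out.println(validBytes(log, 1024)); // bytes worth keeping
    }
}
```

A recovery procedure along these lines would truncate the segment to the 
returned length and discard everything after it.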
                
> Refactor Log subsystem
> ----------------------
>
>                 Key: KAFKA-521
>                 URL: https://issues.apache.org/jira/browse/KAFKA-521
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Jay Kreps
>         Attachments: KAFKA-521-v1.patch, KAFKA-521-v2.patch
>
>
> There are a number of items it would be nice to clean up in the log subsystem:
> 1. Misc. funky apis in Log and LogManager
> 2. Much of the functionality in Log should move into LogSegment along with 
> corresponding tests
> 3. We should remove SegmentList and instead use a ConcurrentSkipListMap
> The general idea of the refactoring falls into two categories. First, improve 
> and thoroughly document the public APIs. Second, have a clear delineation of 
> responsibility between the various layers:
> 1. LogManager is responsible for the creation and deletion of logs as well as 
> the retention of data in log segments. LogManager is the only layer aware of 
> partitions and topics. LogManager consists of a bunch of individual Log 
> instances and interacts with them only through their public API (mostly true 
> today).
> 2. Log represents a totally ordered log. Log is responsible for reading, 
> appending, and truncating the log. A log consists of a bunch of LogSegments. 
> Much of the functionality currently in Log should move into LogSegment, with 
> Log interacting only through the LogSegment interface. Currently we reach 
> around this a lot to call into FileMessageSet and OffsetIndex.
> 3. A LogSegment consists of an OffsetIndex and a FileMessageSet. It supports 
> largely the same APIs as Log, but now localized to a single segment.
> This cleanup will simplify testing and debugging because it will make the 
> responsibilities and guarantees at each layer more clear.
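
As a rough illustration of the ConcurrentSkipListMap idea from the description, 
the sketch below keys segments by base offset and uses floorEntry to find the 
segment that would contain a given offset. It also shows the ordering concern 
from item 4 of the update (delete the old segments first, then create the new 
one). Everything here is hypothetical: SegmentMapSketch, roll, segmentFor, and 
truncateFullyAndStartAt are illustrative names, and a String stands in for a 
real LogSegment.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical sketch, not Kafka's implementation. Segments are keyed by
// their base offset; a String stands in for a real LogSegment.
class SegmentMapSketch {
    private final ConcurrentSkipListMap<Long, String> segments =
            new ConcurrentSkipListMap<>();

    /** Registers a new segment whose first offset is baseOffset. */
    void roll(long baseOffset) {
        segments.put(baseOffset, "segment-" + baseOffset);
    }

    /**
     * Returns the segment with the greatest base offset <= offset, or null
     * if the offset precedes every segment.
     */
    String segmentFor(long offset) {
        Map.Entry<Long, String> entry = segments.floorEntry(offset);
        return entry == null ? null : entry.getValue();
    }

    /**
     * Discards the whole log and restarts it at newOffset. The old segments
     * are removed first; clearing after the put would also remove the
     * freshly created segment.
     */
    void truncateFullyAndStartAt(long newOffset) {
        segments.clear();
        roll(newOffset);
    }

    int segmentCount() {
        return segments.size();
    }

    public static void main(String[] args) {
        SegmentMapSketch log = new SegmentMapSketch();
        log.roll(0L);
        log.roll(100L);
        log.roll(250L);
        System.out.println(log.segmentFor(180L)); // segment with base 100
        log.truncateFullyAndStartAt(500L);
        System.out.println(log.segmentCount());   // only the new segment
    }
}
```

The sorted map gives the floor lookup directly, which is the main thing a 
hand-rolled segment list would otherwise have to implement with its own 
search.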

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
