Jay Kreps created KAFKA-506:
-------------------------------

             Summary: Store logical offset in log
                 Key: KAFKA-506
                 URL: https://issues.apache.org/jira/browse/KAFKA-506
             Project: Kafka
          Issue Type: Bug
    Affects Versions: 0.8
            Reporter: Jay Kreps
            Assignee: Jay Kreps
             Fix For: 0.8


Currently we only support retention by dropping entire segment files. A more 
nuanced retention policy would allow dropping individual messages from a 
segment file by recopying it. This is not currently possible because the lookup 
structure we use to locate messages is based on the file offset directly.

To fix this we should move to a sequential, logical offset (0,1,2,3,...) which 
would allow deleting individual messages (e.g. 2) without deleting the entire 
segment.

It is desirable to make this change in the 0.8 timeframe since we are already 
doing data format changes.

As part of this we would explicitly store the key field given by the producer 
for partitioning (right now there is no way for the consumer to find the value 
used for partitioning).

This combination of features would allow a key-based retention policy that 
would clean obsolete values by a user-defined key.
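
As a rough illustration of what key-based cleaning could look like, here is a 
minimal sketch in Python. It assumes a segment is just an in-memory list of 
records and that "obsolete" means "an earlier record with the same key"; the 
names Record and clean_by_key are hypothetical and not part of Kafka.

```python
from typing import Dict, List, NamedTuple


class Record(NamedTuple):
    offset: int   # logical offset, assigned once and never rewritten
    key: bytes    # partitioning key supplied by the producer
    value: bytes


def clean_by_key(segment: List[Record]) -> List[Record]:
    """Recopy a segment, keeping only the newest record for each key."""
    latest: Dict[bytes, int] = {}
    for rec in segment:
        latest[rec.key] = rec.offset  # later records overwrite earlier ones
    return [rec for rec in segment if latest[rec.key] == rec.offset]


segment = [
    Record(0, b"user-1", b"a"),
    Record(1, b"user-2", b"b"),
    Record(2, b"user-1", b"c"),  # supersedes offset 0
    Record(3, b"user-3", b"d"),
]
cleaned = clean_by_key(segment)
print([r.offset for r in cleaned])  # [1, 2, 3]
```

The surviving records keep their original offsets, so the cleaned segment is 
sparse (offset 0 is gone) but every remaining offset still refers to the same 
message.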

The specific use case I am targeting is a commit log for local state maintained 
by a process doing some kind of near-real-time processing. The process could 
log out its local state changes and be able to restore from this log in the 
event of a failure. However I think this is a broadly useful feature.

The following changes would be part of this:
1. The log format would now be
      8 byte offset
      4 byte message_size
      N byte message
2. The offsets would be changed to a sequential, logical number rather than the 
byte offset (e.g. 0,1,2,3,...)
3. A local memory-mapped lookup structure will be kept for each log segment 
that contains the mapping from logical to physical offset.
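
To make the proposed layout concrete, the following Python sketch writes 
entries as 8-byte offset, 4-byte message_size, N-byte message and builds a 
logical-to-physical lookup by scanning the segment. Several things here are 
assumptions for illustration only: big-endian encoding, a plain in-memory list 
instead of a memory-mapped structure, and the function names append_entry, 
build_index and read_message.

```python
import struct
from bisect import bisect_left
from typing import List, Optional, Tuple

# 8-byte offset followed by 4-byte message_size (big-endian is an assumption).
HEADER = struct.Struct(">qi")

Index = Tuple[List[int], List[int]]  # (logical offsets, byte positions)


def append_entry(log: bytearray, offset: int, message: bytes) -> None:
    """Append one entry: offset, message_size, message."""
    log.extend(HEADER.pack(offset, len(message)))
    log.extend(message)


def build_index(log: bytearray) -> Index:
    """Scan the segment and record where each logical offset starts."""
    offsets, positions = [], []
    pos = 0
    while pos < len(log):
        offset, size = HEADER.unpack_from(log, pos)
        offsets.append(offset)
        positions.append(pos)
        pos += HEADER.size + size
    return offsets, positions


def read_message(log: bytearray, index: Index, target: int) -> Optional[bytes]:
    """Return the message at a logical offset, or None if it is absent."""
    offsets, positions = index
    i = bisect_left(offsets, target)
    if i == len(offsets) or offsets[i] != target:
        return None  # deleted, or not in this segment
    _, size = HEADER.unpack_from(log, positions[i])
    start = positions[i] + HEADER.size
    return bytes(log[start:start + size])


log = bytearray()
append_entry(log, 0, b"alpha")
append_entry(log, 1, b"be")
append_entry(log, 3, b"delta")  # offset 2 has been dropped by a recopy
index = build_index(log)
print(index)                        # ([0, 1, 3], [0, 17, 31])
print(read_message(log, index, 3))  # b'delta'
print(read_message(log, index, 2))  # None
```

Because the offset is stored in each entry, the lookup can be rebuilt from the 
segment alone after a recopy, and a missing offset is simply absent from it.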

I propose to break this into two patches. The first makes the log format 
changes, but retains the physical offset. The second adds the lookup structure 
and moves to logical offset.

Here are a few issues to be considered for the first patch:
1. Currently a MessageSet implements Iterable[MessageAndOffset]. One surprising 
thing is that the offset is actually the offset of the next message. I think 
there are actually several uses for the current offset. I would propose making 
this hold the current message offset since with logical offsets the next offset 
is always just current_offset+1. Note that since we no longer require messages 
to be dense, it is not true that if the next offset is N the current offset is 
N-1 (because N-1 may have been deleted). Thoughts or objections?
2. Currently during iteration over a ByteBufferMessageSet we throw an exception 
if there are zero messages in the set. This is used to detect fetches that are 
smaller than a single message size. I think this behavior is misplaced and 
should be moved up into the consumer.
3. In addition to adding a key in Message, I made two other changes: (1) I 
moved the CRC to the first field and made it cover the entire message contents 
(previously it only covered the payload), (2) I dropped support for Magic=0, 
effectively making the attributes field required, which simplifies the code 
(since we are breaking compatibility anyway).
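
For item 3, here is a small Python sketch of a message whose CRC comes first 
and covers everything after it, including the key. The exact field order and 
widths below (magic, attributes, key length, key, payload length, payload), 
the magic value of 1, and the function names are assumptions for illustration, 
not the format in the patch.

```python
import struct
import zlib


def encode_message(key: bytes, payload: bytes,
                   magic: int = 1, attributes: int = 0) -> bytes:
    """Build a message with the CRC as the first field.

    Assumed layout: crc(4) magic(1) attributes(1) key_len(4) key
    payload_len(4) payload. The CRC covers every byte after itself.
    """
    body = (
        struct.pack(">BBi", magic, attributes, len(key)) + key
        + struct.pack(">i", len(payload)) + payload
    )
    crc = zlib.crc32(body) & 0xFFFFFFFF
    return struct.pack(">I", crc) + body


def is_valid(message: bytes) -> bool:
    """Recompute the CRC over everything after the first 4 bytes."""
    (stored,) = struct.unpack_from(">I", message, 0)
    return stored == (zlib.crc32(message[4:]) & 0xFFFFFFFF)


msg = encode_message(b"user-1", b"state")
print(is_valid(msg))               # True

corrupted = bytearray(msg)
corrupted[10] ^= 0xFF              # flip the first byte of the key
print(is_valid(bytes(corrupted)))  # False
```

With a CRC over the payload only, the corruption of the key in the second case 
would go undetected; covering the whole message catches it.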



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
