[ https://issues.apache.org/jira/browse/KAFKA-506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jay Kreps updated KAFKA-506: ---------------------------- Attachment: KAFKA-506-phase-2-v2.patch > Store logical offset in log > --------------------------- > > Key: KAFKA-506 > URL: https://issues.apache.org/jira/browse/KAFKA-506 > Project: Kafka > Issue Type: Bug > Affects Versions: 0.8 > Reporter: Jay Kreps > Assignee: Jay Kreps > Fix For: 0.8 > > Attachments: KAFKA-506-phase-2.patch, KAFKA-506-phase-2-v1.patch, > KAFKA-506-phase-2-v2.patch, KAFKA-506-v1-draft.patch, KAFKA-506-v1.patch > > > Currently we only support retention by dropping entire segment files. A more > nuanced retention policy would allow dropping individual messages from a > segment file by recopying it. This is not currently possible because the > lookup structure we use to locate messages is based on the file offset > directly. > To fix this we should move to a sequential, logical offset (0,1,2,3,...) > which would allow deleting individual messages (e.g. 2) without deleting the > entire segment. > It is desirable to make this change in the 0.8 timeframe since we are already > doing data format changes. > As part of this we would explicitly store the key field given by the producer > for partitioning (right now there is no way for the consumer to find the > value used for partitioning). > This combination of features would allow a key-based retention policy that > would clean obsolete values either by a user defined key. > The specific use case I am targeting is a commit log for local state > maintained by a process doing some kind of near-real-time processing. The > process could log out its local state changes and be able to restore from > this log in the event of a failure. However I think this is a broadly useful > feature. > The following changes would be part of this: > 1. The log format would now be > 8 byte offset > 4 byte message_size > N byte message > 2. The offsets would be changed to a sequential, logical number rather than > the byte offset (e.g. 0,1,2,3,...) > 3. A local memory-mapped lookup structure will be kept for each log segment > that contains the mapping from logical to physical offset. > I propose to break this into two patches. The first makes the log format > changes, but retains the physical offset. The second adds the lookup > structure and moves to logical offset. > Here are a few issues to be considered for the first patch: > 1. Currently a MessageSet implements Iterable[MessageAndOffset]. One > surprising thing is that the offset is actually the offset of the next > message. I think there are actually several uses for the current offset. I > would propose making this hold the current message offset since with logical > offsets the next offset is always just current_offset+1. Note that since we > no longer require messages to be dense, it is not true that if the next > offset is N the current offset is N-1 (because N-1 may have been deleted). > Thoughts or objections? > 2. Currently during iteration over a ByteBufferMessageSet we throw an > exception if there are zero messages in the set. This is used to detect > fetches that are smaller than a single message size. I think this behavior is > misplaced and should be moved up into the consumer. > 3. In addition to adding a key in Message, I made two other changes: (1) I > moved the CRC to the first field and made it cover the entire message > contents (previously it only covered the payload), (2) I dropped support for > Magic=0, effectively making the attributes field required, which simplifies > the code (since we are breaking compatibility anyway). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira