[ https://issues.apache.org/jira/browse/KAFKA-506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470497#comment-13470497 ]
Jay Kreps commented on KAFKA-506:
---------------------------------
jkreps-mn:kafka-git jkreps$ git pull
remote: Counting objects: 72, done.
remote: Compressing objects: 100% (37/37), done.
remote: Total 42 (delta 26), reused 0 (delta 0)
Unpacking objects: 100% (42/42), done.
From git://git.apache.org/kafka
0aa1500..65e139c 0.8 -> origin/0.8
Auto-merging core/src/main/scala/kafka/api/FetchResponse.scala
CONFLICT (content): Merge conflict in core/src/main/scala/kafka/api/FetchResponse.scala
Auto-merging core/src/main/scala/kafka/api/ProducerRequest.scala
CONFLICT (content): Merge conflict in core/src/main/scala/kafka/api/ProducerRequest.scala
Auto-merging core/src/main/scala/kafka/consumer/ConsumerFetcherThread.scala
Auto-merging core/src/main/scala/kafka/server/AbstractFetcherThread.scala
CONFLICT (content): Merge conflict in core/src/main/scala/kafka/server/AbstractFetcherThread.scala
Auto-merging core/src/main/scala/kafka/server/KafkaApis.scala
CONFLICT (content): Merge conflict in core/src/main/scala/kafka/server/KafkaApis.scala
Auto-merging core/src/main/scala/kafka/server/ReplicaFetcherThread.scala
Auto-merging core/src/test/scala/unit/kafka/api/RequestResponseSerializationTest.scala
Auto-merging core/src/test/scala/unit/kafka/producer/SyncProducerTest.scala
Auto-merging core/src/test/scala/unit/kafka/utils/TestUtils.scala
Automatic merge failed; fix conflicts and then commit the result.
:-(
> Store logical offset in log
> ---------------------------
>
> Key: KAFKA-506
> URL: https://issues.apache.org/jira/browse/KAFKA-506
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 0.8
> Reporter: Jay Kreps
> Assignee: Jay Kreps
> Fix For: 0.8
>
> Attachments: KAFKA-506-phase-2.patch, KAFKA-506-phase-2-v1.patch,
> KAFKA-506-phase-2-v2.patch, KAFKA-506-phase-2-v3.patch,
> KAFKA-506-phase-2-v4.patch, KAFKA-506-phase-2-v5.patch,
> KAFKA-506-v1-draft.patch, KAFKA-506-v1.patch,
> KAFKA-506-v4-changes-since-v3.patch
>
>
> Currently we only support retention by dropping entire segment files. A more
> nuanced retention policy would allow dropping individual messages from a
> segment file by recopying it. This is not currently possible because the
> lookup structure we use to locate messages is based on the file offset
> directly.
> To fix this we should move to a sequential, logical offset (0,1,2,3,...)
> which would allow deleting individual messages (e.g. 2) without deleting the
> entire segment.
> It is desirable to make this change in the 0.8 timeframe since we are already
> doing data format changes.
> As part of this we would explicitly store the key field given by the producer
> for partitioning (right now there is no way for the consumer to find the
> value used for partitioning).
> This combination of features would allow a key-based retention policy that
> would clean obsolete values by a user-defined key.
> The specific use case I am targeting is a commit log for local state
> maintained by a process doing some kind of near-real-time processing. The
> process could log out its local state changes and be able to restore from
> this log in the event of a failure. However I think this is a broadly useful
> feature.
> The following changes would be part of this:
> 1. The log format would now be (sketched after this list):
> 8 byte offset
> 4 byte message_size
> N byte message
> 2. The offsets would be changed to a sequential, logical number rather than
> the byte offset (e.g. 0,1,2,3,...)
> 3. A local memory-mapped lookup structure will be kept for each log segment
> that contains the mapping from logical to physical offset.
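
As a minimal sketch of items 1 and 2, assuming plain java.nio encoding (the
names here are illustrative, not code from the patch):

    import java.nio.ByteBuffer

    // Proposed on-disk entry: 8 byte offset | 4 byte message_size | N byte message
    object RecordFormat {
      // serialize one entry; `message` is the already-serialized message bytes
      def encode(offset: Long, message: Array[Byte]): ByteBuffer = {
        val buffer = ByteBuffer.allocate(8 + 4 + message.length)
        buffer.putLong(offset)        // 8 byte sequential, logical offset
        buffer.putInt(message.length) // 4 byte message_size
        buffer.put(message)           // N byte message
        buffer.rewind()
        buffer
      }

      // read back (offset, message) from a buffer positioned at an entry
      def decode(buffer: ByteBuffer): (Long, Array[Byte]) = {
        val offset = buffer.getLong()
        val size = buffer.getInt()
        val message = new Array[Byte](size)
        buffer.get(message)
        (offset, message)
      }
    }

Because each entry stores its own offset, an entry can be dropped while
recopying a segment without renumbering the entries that survive.
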
> I propose to break this into two patches. The first makes the log format
> changes, but retains the physical offset. The second adds the lookup
> structure and moves to logical offset.
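
For the second patch's lookup structure, a rough sketch of one possible
memory-mapped index, assuming fixed-width (logical offset, physical position)
pairs appended in offset order (again illustrative, not the patch itself):

    import java.io.RandomAccessFile
    import java.nio.channels.FileChannel

    class OffsetIndexSketch(path: String, maxEntries: Int) {
      private val entrySize = 16 // 8 byte logical offset + 8 byte file position
      private val raf = new RandomAccessFile(path, "rw")
      private val mmap = raf.getChannel.map(
        FileChannel.MapMode.READ_WRITE, 0, maxEntries.toLong * entrySize)
      private var entries = 0

      // append a mapping; logical offsets must arrive in increasing order
      def append(logical: Long, physical: Long): Unit = {
        mmap.putLong(entries * entrySize, logical)
        mmap.putLong(entries * entrySize + 8, physical)
        entries += 1
      }

      // binary search for the largest indexed logical offset <= target;
      // returns its physical file position, or -1 if nothing is indexed yet
      def lookup(target: Long): Long = {
        var lo = 0
        var hi = entries - 1
        var result = -1L
        while (lo <= hi) {
          val mid = (lo + hi) / 2
          if (mmap.getLong(mid * entrySize) <= target) {
            result = mmap.getLong(mid * entrySize + 8)
            lo = mid + 1
          } else {
            hi = mid - 1
          }
        }
        result
      }
    }

A fetch for logical offset N would consult the index for the greatest indexed
offset <= N, then scan forward in the segment file from that physical position.
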
> Here are a few issues to be considered for the first patch:
> 1. Currently a MessageSet implements Iterable[MessageAndOffset]. One
> surprising thing is that the offset is actually the offset of the next
> message. I think there are actually several uses for the current offset. I
> would propose making this hold the current message offset, since with logical
> offsets the next offset is always just current_offset+1 (see the first sketch
> after this list). Note that since we
> no longer require messages to be dense, it is not true that if the next
> offset is N the current offset is N-1 (because N-1 may have been deleted).
> Thoughts or objections?
> 2. Currently during iteration over a ByteBufferMessageSet we throw an
> exception if there are zero messages in the set. This is used to detect
> fetches that are smaller than a single message size. I think this behavior is
> misplaced and should be moved up into the consumer.
> 3. In addition to adding a key in Message, I made two other changes: (1) I
> moved the CRC to the first field and made it cover the entire message
> contents (previously it only covered the payload), (2) I dropped support for
> Magic=0, effectively making the attributes field required, which simplifies
> the code (since we are breaking compatibility anyway). The second sketch
> below illustrates the resulting layout.
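
First sketch, for issue 1: the shape the iteration contract would take with
per-message offsets (hypothetical types, not the actual classes):

    // Each entry carries its own offset; the follow-on position is offset + 1.
    case class MessageAndOffset(message: Array[Byte], offset: Long) {
      def nextOffset: Long = offset + 1
    }

    // The offset to fetch from next is the last consumed offset + 1. This
    // stays valid even when offsets are not dense (some may be deleted).
    def nextFetchOffset(batch: Seq[MessageAndOffset], current: Long): Long =
      if (batch.isEmpty) current else batch.last.offset + 1
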
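Second sketch, for issue 3: a CRC-first message layout where the checksum
covers everything after itself (field widths here are assumptions for
illustration, not the final format):

    import java.nio.ByteBuffer
    import java.util.zip.CRC32

    object MessageSketch {
      // layout: 4 byte crc | 1 byte magic | 1 byte attributes |
      //         4 byte key length | key | 4 byte payload length | payload
      def build(key: Array[Byte], payload: Array[Byte]): ByteBuffer = {
        val size = 4 + 1 + 1 + 4 + key.length + 4 + payload.length
        val buffer = ByteBuffer.allocate(size)
        buffer.position(4)   // leave room for the leading CRC
        buffer.put(1.toByte) // magic (Magic=0 support dropped)
        buffer.put(0.toByte) // attributes (now always present)
        buffer.putInt(key.length)
        buffer.put(key)
        buffer.putInt(payload.length)
        buffer.put(payload)
        // CRC covers the entire message contents after the CRC field itself
        val crc = new CRC32
        crc.update(buffer.array, 4, size - 4)
        buffer.putInt(0, crc.getValue.toInt)
        buffer.rewind()
        buffer
      }
    }
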