Hi again Jun,

I have update the document to address your comments below, but including
the responses inline to make it easier for everyone to stay on top of the
conversation.



> 106. Compacted topics.
> 106.1. When all messages in a transaction are removed, we could remove the
> commit/abort marker for that transaction too. However, we have to be a bit
> careful. If the marker is removed too quickly, it's possible for a consumer
> to see a message in that transaction, but not to see the marker, and
> therefore will be stuck in that transaction forever. We have a similar
> issue when dealing with tombstones. The solution is to preserve the
> tombstone for at least a preconfigured amount of time after the cleaning
> has passed the tombstone. Then, as long as a consumer can finish reading to
> the cleaning point within the configured amount of time, it's guaranteed
> not to miss the tombstone after it has seen a non-tombstone message on the
> same key. I am wondering if we should do something similar here.
>

This is a good point. As we discussed offline, the solution for the removal
of control messages will be the same as the solution for problem of
tombstone removal documented in
https://issues.apache.org/jira/browse/KAFKA-4545.

106.2. "To address this problem, we propose to preserve the last epoch and
> sequence number written by each producer for a fixed amount of time as an
> empty message set. This is allowed by the new message format we are
> proposing in this document. The time to preserve the sequence number will
> be governed by the log retention settings. " Could you be a bit more
> specific on what retention time will be used since by default, there is no
> retention time for compacted (but not delete) topic?
>

We discussed this offline, and the consensus that it is reasonable to use
brokers global log.retention.* settings for these messages.


> 106.3 "As for control messages, if the broker does not have any
> corresponding transaction cached with the PID when encountering a control
> message, that message can be safely removed."
> Do controlled messages have keys? If not, do we need to relax the

constraint that messages in a compacted topic must have keys?
>

The key of a control messages is the control message type. As such, regular
compaction logic based on key will not apply to control messages. We will
have to update the log cleaner to ignore messages which have the control
message bit set.

Control messages can be removed at some point after the last messages of
the corresponding transaction are removed. As suggested in KAFKA-4545, we
can use the timestamp associated with the log segment to deduce the safe
expiration time for control messages in that segment.



> 112. Control message: Will control messages be used for timestamp indexing?
> If so, what timestamp will we use if the timestamp type is creation time?
>
>
Control messages will not be used for timestamp indexing. Each control
message will have the log append time for the timestamp, but these messages
will be ignored when building the timestamp index. Since control messages
are for system use only and will never be exposed to users, it doesn't make
sense to include them in the timestamp index.

Further, as you mentioned, when a topic uses creation time, it is
impossible to ensure that control messages will not skew the time based
index, since these messages are sent by the transaction coordinator which
has no notion of the application level message creation time.

Thanks,
Apurva

Reply via email to