Hi, Jason,

Thanks for the reply. Your responses sound good to me.

Jun

On Fri, Jan 27, 2017 at 4:42 PM, Jason Gustafson <ja...@confluent.io> wrote:

> A few more responses:
>
>
> > 101. Compatibility during upgrade: Suppose that the brokers are upgraded
> > to the new version, but the broker message format is still the old one.
> > If a new producer uses the transaction feature, should the producer get
> > an error in this case? A tricky case is when the leader broker is on the
> > new message format, but the follower broker is still on the old message
> > format. In this case, the transactional info will be lost in the follower
> > due to down-conversion. Should we fail the transactional requests when
> > the followers are still on the old message format?
>
>
> We've added some more details to the document about migration. Please take
> a look. Two points worth mentioning:
>
> 1. Replicas currently take the message format used by the leader. As long
> as users do the usual procedure of two rolling bounces, it should be safe
> to upgrade the message format.
>
> 2. There is no way to support idempotent or transactional features if we
> downgrade the message format in the produce request handler. We've modified
> the design document to only permit message downgrades if the producer has
> disabled idempotence. Otherwise, we will return an
> UNSUPPORTED_FOR_MESSAGE_FORMAT error.
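>
> To make the intended check in point 2 concrete, here is a rough Java sketch.
> The names (MessageFormat, PartitionBatchInfo, and so on) are illustrative
> stand-ins rather than the actual broker code:
>
> // Illustrative sketch only; the types below are hypothetical stand-ins.
> enum MessageFormat { V0, V1, V2 }
>
> class PartitionBatchInfo {
>     final boolean idempotent;     // batch carries a PID and sequence number
>     final boolean transactional;  // batch carries a transactional id
>     PartitionBatchInfo(boolean idempotent, boolean transactional) {
>         this.idempotent = idempotent;
>         this.transactional = transactional;
>     }
> }
>
> class ProduceFormatCheck {
>     // Returns null if the write may proceed, otherwise the error code to
>     // send back. Down-converting to an older on-disk format is only allowed
>     // when the producer is not relying on idempotence or transactions, since
>     // the PID, epoch and sequence fields would be lost in the conversion.
>     static String validate(MessageFormat logFormat, PartitionBatchInfo batch) {
>         boolean needsNewFormat = batch.idempotent || batch.transactional;
>         if (needsNewFormat && logFormat != MessageFormat.V2)
>             return "UNSUPPORTED_FOR_MESSAGE_FORMAT";
>         return null;  // safe to write (and down-convert if necessary)
>     }
> }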
>
> > 110. Transaction log:
> > 110.1 "Key => Version AppID Version" It seems that Version should really
> > be Type?
> > 110.2 "Value => Version Epoch Status ExpirationTime [Topic Partition]"
> > Should we store [Topic [Partition]] instead?
> > 110.3 To expire an AppId, do we need to insert a tombstone with the
> > expired AppID as the key to physically remove the existing AppID entries
> > in the transaction log?
>
>
> Fixed in the document. For 110.3, yes, we need to insert a tombstone after
> the AppID has expired. This will work in much the same way as the consumer
> coordinator expires offsets using a periodic task.
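>
> As a rough illustration of that periodic pass (the cache, entry class and
> append method below are hypothetical stand-ins, not the coordinator code):
>
> import java.util.Map;
> import java.util.concurrent.ConcurrentHashMap;
>
> class AppIdExpirationTask {
>     static class TxnEntry {
>         final long lastUpdateMs;
>         TxnEntry(long lastUpdateMs) { this.lastUpdateMs = lastUpdateMs; }
>     }
>
>     private final Map<String, TxnEntry> appIdCache = new ConcurrentHashMap<>();
>     private final long appIdExpirationMs;
>
>     AppIdExpirationTask(long appIdExpirationMs) {
>         this.appIdExpirationMs = appIdExpirationMs;
>     }
>
>     // Runs periodically, much like the consumer coordinator's offset expiration.
>     void run(long nowMs) {
>         appIdCache.forEach((appId, entry) -> {
>             if (nowMs - entry.lastUpdateMs > appIdExpirationMs) {
>                 appIdCache.remove(appId, entry);
>                 appendTombstone(appId);
>             }
>         });
>     }
>
>     // Stand-in for appending (key = AppID, value = null) to the transaction
>     // log so that compaction eventually removes all entries for that AppID.
>     private void appendTombstone(String appId) {
>         System.out.println("tombstone for expired AppID " + appId);
>     }
> }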
>
> > 116. ProducerRequest: The existing format doesn't have "MessageSetSize"
> > at the partition level.
>
>
> This was intentional, but it is easy to overlook. The idea is to modify the
> ProduceRequest so that only one message set is included for each partition.
> Since the message set contains its own length field, it seemed unnecessary
> to have a separate field. The justification for this change was to make the
> produce request atomic. With only a single message set for each partition,
> either it will be written successfully or not, so an error in the response
> will be unambiguous. We are uncertain whether there are legitimate use
> cases that require producing multiple smaller message sets per partition in
> the ProduceRequest, so we would love to hear feedback on this.
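>
> For illustration, the per-partition portion of the reworked request could be
> modelled roughly as follows (hypothetical names, not the real request classes):
>
> import java.nio.ByteBuffer;
>
> // One message set per partition: its length is self-describing, so no separate
> // MessageSetSize field is needed, and the write is all-or-nothing per partition.
> class ProducePartitionEntry {
>     final int partition;
>     final ByteBuffer records;  // exactly one message set
>
>     ProducePartitionEntry(int partition, ByteBuffer records) {
>         this.partition = partition;
>         this.records = records;
>     }
>
>     int messageSetSizeInBytes() {
>         return records.remaining();  // recoverable from the set itself
>     }
> }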
>
> Thanks,
> Jason
>
> On Fri, Jan 27, 2017 at 4:21 PM, Apurva Mehta <apu...@confluent.io> wrote:
>
> > Hi again Jun,
> >
> > I have updated the document to address your comments below, but I am also
> > including the responses inline to make it easier for everyone to stay on
> > top of the conversation.
> >
> >
> >
> > > 106. Compacted topics.
> > > 106.1. When all messages in a transaction are removed, we could remove
> > > the commit/abort marker for that transaction too. However, we have to be
> > > a bit careful. If the marker is removed too quickly, it's possible for a
> > > consumer to see a message in that transaction, but not to see the marker,
> > > and it will therefore be stuck in that transaction forever. We have a
> > > similar issue when dealing with tombstones. The solution is to preserve
> > > the tombstone for at least a preconfigured amount of time after the
> > > cleaning has passed the tombstone. Then, as long as a consumer can finish
> > > reading to the cleaning point within the configured amount of time, it's
> > > guaranteed not to miss the tombstone after it has seen a non-tombstone
> > > message on the same key. I am wondering if we should do something similar
> > > here.
> > >
> >
> > This is a good point. As we discussed offline, the solution for the
> > removal of control messages will be the same as the solution to the
> > problem of tombstone removal documented in
> > https://issues.apache.org/jira/browse/KAFKA-4545.
> >
> > > 106.2 "To address this problem, we propose to preserve the last epoch
> > > and sequence number written by each producer for a fixed amount of time
> > > as an empty message set. This is allowed by the new message format we
> > > are proposing in this document. The time to preserve the sequence number
> > > will be governed by the log retention settings." Could you be a bit more
> > > specific on what retention time will be used, since by default there is
> > > no retention time for a compacted (but not delete) topic?
> > >
> >
> > We discussed this offline, and the consensus was that it is reasonable to
> > use the broker's global log.retention.* settings for these messages.
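> >
> > As a small illustration of the rule (hypothetical helper; the retention
> > value would come from the broker's global log.retention.ms setting):
> >
> > // Sketch only: an empty message set preserving a producer's last epoch and
> > // sequence number becomes eligible for removal once the broker-wide
> > // retention time has elapsed since it was written.
> > class PidEntryRetention {
> >     static boolean canRemove(long entryTimestampMs, long nowMs, long retentionMs) {
> >         return nowMs - entryTimestampMs > retentionMs;
> >     }
> > }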
> >
> >
> > > 106.3 "As for control messages, if the broker does not have any
> > > corresponding transaction cached with the PID when encountering a
> > > control message, that message can be safely removed."
> > > Do control messages have keys? If not, do we need to relax the
> > > constraint that messages in a compacted topic must have keys?
> > >
> >
> > The key of a control message is the control message type. As such,
> > regular compaction logic based on key will not apply to control messages.
> > We will have to update the log cleaner to ignore messages which have the
> > control message bit set.
> >
> > Control messages can be removed at some point after the last messages of
> > the corresponding transaction are removed. As suggested in KAFKA-4545, we
> > can use the timestamp associated with the log segment to deduce the safe
> > expiration time for control messages in that segment.
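> >
> > Putting the two points above together, a rough sketch of the cleaner-side
> > check might look like this (illustrative names only, not the actual log
> > cleaner code):
> >
> > class ControlMessageCleaning {
> >     // Minimal view of a batch as the cleaner would see it in the new format.
> >     interface Batch {
> >         boolean isControl();     // control bit from the new message format
> >         long maxTimestampMs();   // largest timestamp in the batch's segment
> >     }
> >
> >     // Control batches are never compacted by key. They may only be discarded
> >     // once the transaction's data is gone and enough time has passed since
> >     // the segment was written (mirroring the delayed tombstone removal in
> >     // KAFKA-4545), so a consumer reading up to the cleaning point within
> >     // that window still sees the marker.
> >     static boolean canDiscard(Batch batch, boolean txnDataRemoved,
> >                               long nowMs, long removalDelayMs) {
> >         return batch.isControl()
> >                 && txnDataRemoved
> >                 && nowMs - batch.maxTimestampMs() >= removalDelayMs;
> >     }
> > }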
> >
> >
> >
> > > 112. Control message: Will control messages be used for timestamp
> > > indexing? If so, what timestamp will we use if the timestamp type is
> > > creation time?
> > >
> >
> > Control messages will not be used for timestamp indexing. Each control
> > message will have the log append time for the timestamp, but these
> > messages will be ignored when building the timestamp index. Since control
> > messages are for system use only and will never be exposed to users, it
> > doesn't make sense to include them in the timestamp index.
> >
> > Further, as you mentioned, when a topic uses creation time, it is
> > impossible to ensure that control messages will not skew the time-based
> > index, since these messages are sent by the transaction coordinator, which
> > has no notion of the application-level message creation time.
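> >
> > A rough sketch of how the index-building step could skip control entries
> > (illustrative types only, not the actual index code):
> >
> > import java.util.ArrayList;
> > import java.util.List;
> >
> > class TimestampIndexBuilder {
> >     // Simplified view: the real index stores (timestamp, offset) pairs.
> >     record Entry(long timestamp, long offset) { }
> >     record Batch(boolean isControl, long maxTimestamp, long lastOffset) { }
> >
> >     // Control batches carry the coordinator's log append time rather than an
> >     // application-level create time, so they are simply skipped here.
> >     static List<Entry> build(List<Batch> batches) {
> >         List<Entry> index = new ArrayList<>();
> >         for (Batch b : batches) {
> >             if (!b.isControl())
> >                 index.add(new Entry(b.maxTimestamp(), b.lastOffset()));
> >         }
> >         return index;
> >     }
> > }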
> >
> > Thanks,
> > Apurva
> >
>
