For (3) I don't think we can change the offset in the outer message from
what it is today, as it is relied upon by the search done in the log layer.
The reason it is the offset of the last message rather than the first is to
make the offset a least upper bound (i.e. the smallest offset >=
fetch_offset). This needs to work the same for both gaps due to compacted
topics and gaps due to compressed messages.

So imagine you had a compressed set with offsets {45, 46, 47, 48}. If you
assigned this compressed set the offset 45, a fetch for 46 would actually
skip ahead to 49 (the least upper bound).
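The least-upper-bound behavior described above can be sketched as follows (illustrative Python, not Kafka's actual lookup code; the function name is mine):

```python
# Sketch of why the wrapper offset must be the LAST inner offset: the log
# search returns the first wrapper message whose offset >= fetch_offset
# (a least upper bound). Names here are illustrative, not Kafka's code.

def find_wrapper(wrapper_offsets, fetch_offset):
    """Return the first wrapper offset that is >= fetch_offset, or None."""
    for off in wrapper_offsets:
        if off >= fetch_offset:
            return off
    return None

# A compressed set holding inner offsets {45, 46, 47, 48}, followed by 49.
# Assigning the set the LAST inner offset (48):
assert find_wrapper([44, 48, 49], 46) == 48  # fetch(46) returns the set

# Assigning the set the FIRST inner offset (45) instead:
assert find_wrapper([44, 45, 49], 46) == 49  # fetch(46) skips past 46-48
```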

-Jay

On Mon, Sep 21, 2015 at 5:17 PM, Jun Rao <j...@confluent.io> wrote:

> Jiangjie,
>
> Thanks for the writeup. A few comments below.
>
> 1. We will need to be a bit careful with fetch requests from the followers.
> Basically, as we are doing a rolling upgrade of the brokers, the follower
> can't start issuing V2 of the fetch request until the rest of the brokers
> are ready to process it. So, we probably need to make use of
> inter.broker.protocol.version to do the rolling upgrade. In step 1, we set
> inter.broker.protocol.version to 0.9 and do a round of rolling upgrade of
> the brokers. At this point, all brokers are capable of processing V2 of
> fetch requests, but no broker is using it yet. In step 2, we
> set inter.broker.protocol.version to 0.10 and do another round of rolling
> restart of the brokers. In this step, the upgraded brokers will start
> issuing V2 of the fetch request.
>
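The two-step procedure above might look like this in server.properties (version strings are illustrative; the exact values depend on the release):

```properties
# Step 1: rolling restart - all brokers can now *process* v2 fetch
# requests but still speak the old inter-broker protocol.
inter.broker.protocol.version=0.9

# Step 2 (after every broker runs the new code): second rolling restart -
# followers now start issuing v2 fetch requests.
inter.broker.protocol.version=0.10
```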
> 2. If we do #1, I am not sure if there is still a need for
> message.format.version since the broker can start writing messages in the
> new format after inter.broker.protocol.version is set to 0.10.
>
> 3. It wasn't clear from the wiki whether the base offset in the shallow
> message is the offset of the first or the last inner message. It's better
> to use the offset of the last inner message. This way, the followers don't
> have to decompress messages to figure out the next fetch offset.
>
> 4. I am not sure that I understand the following sentence in the wiki. It
> seems that the relative offsets in a compressed message don't have to be
> consecutive. If so, why do we need to update the relative offsets in the
> inner messages?
> "When the log cleaner compacts log segments, it needs to update the inner
> message's relative offset values."
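One way to picture points 3 and 4 together, under my reading of the proposed scheme (wrapper carries the offset of the last inner message; inner messages carry relative offsets; the formula and names below are assumptions, not Kafka code):

```python
# Wrapper offset = absolute offset of the LAST inner message.
# Inner messages carry relative offsets; the absolute offset is recovered
# as: absolute = wrapper_offset - max(relatives) + relative.
# (My reading of the scheme under discussion, not actual Kafka code.)

def absolute_offsets(wrapper_offset, relatives):
    base = wrapper_offset - max(relatives)
    return [base + r for r in relatives]

# A compressed set with inner offsets 45..48: relatives 0..3, wrapper = 48.
assert absolute_offsets(48, [0, 1, 2, 3]) == [45, 46, 47, 48]

# After compaction removes the message at offset 46 (relative 1), the
# remaining relatives {0, 2, 3} still yield the right absolute offsets
# without rewriting, since relatives need not be consecutive.
assert absolute_offsets(48, [0, 2, 3]) == [45, 47, 48]
```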
>
> Thanks,
>
> Jun
>
> On Thu, Sep 17, 2015 at 12:54 PM, Jiangjie Qin <j...@linkedin.com.invalid>
> wrote:
>
> > Hi folks,
> >
> > Thanks a lot for the feedback on KIP-31 - move to use relative offsets.
> > (Not including the timestamp and index discussion.)
> >
> > I updated the migration plan section as we discussed on the KIP hangout.
> > I think it was the only concern raised so far. Please let me know if
> > there are further comments about the KIP.
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> > On Mon, Sep 14, 2015 at 5:13 PM, Jiangjie Qin <j...@linkedin.com> wrote:
> >
> > > I just updated the KIP-33 to explain the indexing on CreateTime and
> > > LogAppendTime respectively. I also used some use cases to compare the
> > > two solutions.
> > > Although this is for KIP-33, it does give some insight into whether it
> > > makes sense to have a per-message LogAppendTime.
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-33+-+Add+a+time+based+log+index
> > >
> > > As a short summary of the conclusions we have already reached on
> > timestamp:
> > > 1. It is good to add a timestamp to the message.
> > > 2. LogAppendTime should be used for broker policy enforcement (Log
> > > retention / rolling)
> > > 3. It is useful to have a CreateTime in the message format, which is
> > > immutable after the producer sends the message.
> > >
> > > The following questions are still under discussion:
> > > 1. Should we also add LogAppendTime to the message format?
> > > 2. Which timestamp should we use to build the index?
> > >
> > > Let's talk about question 1 first, because question 2 is actually a
> > > follow-up question to question 1.
> > > Here are what I think:
> > > 1a. To enforce broker log policies, theoretically we don't need a
> > > per-message LogAppendTime. But if we don't include LogAppendTime in the
> > > message, we still need to implement a separate solution to pass log
> > > segment timestamps among brokers, which means further complication in
> > > replication.
> > > 1b. LogAppendTime has some advantages over CreateTime (KIP-33 has a
> > > detailed comparison).
> > > 1c. We have already exposed the offset, which is essentially an
> > > internal concept of a message in terms of position. Exposing
> > > LogAppendTime means we expose another internal concept of a message,
> > > in terms of time.
> > >
> > > Considering the above reasons, personally I think it is worth adding
> > > the LogAppendTime to each message.
> > >
> > > Any thoughts?
> > >
> > > Thanks,
> > >
> > > Jiangjie (Becket) Qin
> > >
> > > On Mon, Sep 14, 2015 at 11:44 AM, Jiangjie Qin <j...@linkedin.com>
> > wrote:
> > >
> > >> I was trying to send the last email before the KIP hangout, so I may
> > >> not have thought it through completely. By the way, the discussion is
> > >> actually more related to KIP-33, i.e. whether we should index on
> > >> CreateTime or LogAppendTime. (Although it seems all the discussion is
> > >> still in this mailing thread...)
> > >> The solution in the last email is for indexing on CreateTime. It is
> > >> essentially what Jay suggested, except we use a timestamp map instead
> > >> of a memory-mapped index file. Please ignore the proposal of using a
> > >> log compacted topic. The solution can be simplified to:
> > >>
> > >> Each broker keeps:
> > >> 1. A timestamp index map - Map[TopicPartitionSegment, Map[Timestamp,
> > >> Offset]]. The timestamp is on a minute boundary.
> > >> 2. A timestamp index file for each segment.
> > >> When a broker receives a message (as either leader or follower), it
> > >> checks whether the timestamp index map contains the timestamp for the
> > >> current segment. If the timestamp does not exist, the broker adds the
> > >> offset to the map and appends an entry to the timestamp index file;
> > >> i.e. we only use the index file as a persistent copy of the timestamp
> > >> index map.
> > >>
> > >> When a log segment is deleted, we need to:
> > >> 1. delete the TopicPartitionSegment key in the timestamp index map.
> > >> 2. delete the timestamp index file
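A toy sketch of the bookkeeping described above (structures and names are mine, not from the proposal or Kafka):

```python
from collections import defaultdict

# Toy model of the proposed per-minute timestamp index: for each
# (topic-partition, segment) we remember the first offset seen in each
# minute; only the first message in a minute adds an index entry.
# Structures and names are illustrative, not from Kafka.

timestamp_index = defaultdict(dict)  # (tp, segment) -> {minute_ts: offset}

def on_append(tp_segment, timestamp_ms, offset):
    """Called when a broker (leader or follower) appends a message."""
    minute = timestamp_ms // 60000 * 60000  # round down to minute boundary
    index = timestamp_index[tp_segment]
    if minute not in index:
        index[minute] = offset  # would also append to the on-disk index file

def on_segment_delete(tp_segment):
    timestamp_index.pop(tp_segment, None)  # index file deleted as well

seg = ("topic-0", 0)
on_append(seg, 61_000, 100)   # first message in minute 60000 -> indexed
on_append(seg, 95_000, 101)   # same minute -> no new entry
on_append(seg, 125_000, 102)  # new minute 120000 -> indexed
assert timestamp_index[seg] == {60000: 100, 120000: 102}
```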
> > >>
> > >> This solution assumes we only keep CreateTime in the message. There
> > >> are a few trade-offs in this solution:
> > >> 1. The granularity of search will be per minute.
> > >> 2. The entire timestamp index map has to be in memory all the time.
> > >> 3. We need to think about another way to honor log retention time and
> > >> time-based log rolling.
> > >> 4. We lose the benefits, mentioned earlier, of including LogAppendTime
> > >> in the message.
> > >>
> > >> I am not sure whether this solution is necessarily better than
> > >> indexing on LogAppendTime.
> > >>
> > >> I will update KIP-33 to explain the solutions to index on CreateTime
> > >> and LogAppendTime respectively, and add some more concrete use cases
> > >> as well.
> > >>
> > >> Thanks,
> > >>
> > >> Jiangjie (Becket) Qin
> > >>
> > >>
> > >> On Mon, Sep 14, 2015 at 9:40 AM, Jiangjie Qin <j...@linkedin.com>
> > wrote:
> > >>
> > >>> Hi Joel,
> > >>>
> > >>> Good point about rebuilding the index. I agree that having a
> > >>> per-message LogAppendTime might be necessary. About time adjustment,
> > >>> the solution sounds promising, but it might be better to make it a
> > >>> follow-up of the KIP because it seems a really rare use case.
> > >>>
> > >>> I have another thought on how to manage the out-of-order timestamps.
> > >>> Maybe we can do the following:
> > >>> Create a special log compacted topic, __timestamp_index, where the
> > >>> key would be (TopicPartition, TimeStamp_Rounded_To_Minute) and the
> > >>> value is the offset. In memory, we keep a map for each
> > >>> TopicPartition, where the value is (timestamp_rounded_to_minute ->
> > >>> smallest_offset_in_the_minute). This way we can search out-of-order
> > >>> messages and make sure no message is missed.
> > >>>
> > >>> Thoughts?
> > >>>
> > >>> Thanks,
> > >>>
> > >>> Jiangjie (Becket) Qin
> > >>>
> > >>> On Fri, Sep 11, 2015 at 12:46 PM, Joel Koshy <jjkosh...@gmail.com>
> > >>> wrote:
> > >>>
> > >>>> Jay had mentioned the scenario of mirror-maker bootstrap which would
> > >>>> effectively reset the logAppendTimestamps for the bootstrapped data.
> > >>>> If we don't include logAppendTimestamps in each message there is a
> > >>>> similar scenario when rebuilding indexes during recovery. So it
> seems
> > >>>> it may be worth adding that timestamp to messages. The drawback to
> > >>>> that is exposing a server-side concept in the protocol (although we
> > >>>> already do that with offsets). logAppendTimestamp really should be
> > >>>> decided by the broker so I think the first scenario may have to be
> > >>>> written off as a gotcha, but the second may be worth addressing (by
> > >>>> adding it to the message format).
> > >>>>
> > >>>> The other point that Jay raised which needs to be addressed in the
> > >>>> proposal (since we require monotonically increasing timestamps in
> > >>>> the index) is changing time on the server (I'm a little less
> > >>>> concerned about NTP clock skews than about a user explicitly
> > >>>> changing the server's time - i.e., big clock skews). We would at
> > >>>> least want to "set back" all the existing timestamps to guarantee
> > >>>> non-decreasing timestamps with future messages. I'm not sure at this
> > >>>> point how best to handle that, but we could perhaps have an
> > >>>> epoch/base-time (or time-correction) stored in the log directories
> > >>>> and base all log index timestamps off that base-time (or
> > >>>> correction). So if at any time you determine that time has changed
> > >>>> backwards, you can adjust that base-time without having to fix up
> > >>>> all the entries. Without knowing the exact diff between the previous
> > >>>> clock and the new clock we cannot adjust the times exactly, but we
> > >>>> can at least ensure increasing timestamps.
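A toy reading of the base-time idea above (my own sketch under stated assumptions, not a worked-out design):

```python
import time

# Sketch of the base-time correction idea: keep a persisted correction
# offset; when wall-clock time is observed to jump backwards, grow the
# correction so that logical index timestamps never decrease. This cannot
# recover the true elapsed time, but it keeps the index monotonic.
# (A toy reading of the idea, not a worked-out design.)

class CorrectedClock:
    def __init__(self, now_fn=time.time):
        self.now_fn = now_fn
        self.correction = 0.0            # would be persisted in the log dir
        self.last_logical = float("-inf")

    def now(self):
        logical = self.now_fn() + self.correction
        if logical < self.last_logical:
            # Clock moved backwards: bump the correction instead of
            # rewriting every existing index entry.
            self.correction += self.last_logical - logical
            logical = self.last_logical
        self.last_logical = logical
        return logical

# Simulate an operator setting the clock back by 1000 seconds.
wall = iter([5000.0, 6000.0, 5000.0, 5001.0])
clock = CorrectedClock(now_fn=lambda: next(wall))
ts = [clock.now() for _ in range(4)]
assert ts == [5000.0, 6000.0, 6000.0, 6001.0]  # never decreases
```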
> > >>>>
> > >>>> On Fri, Sep 11, 2015 at 10:52 AM, Jiangjie Qin
> > >>>> <j...@linkedin.com.invalid> wrote:
> > >>>> > Ewen and Jay,
> > >>>> >
> > >>>> > The way I see it, LogAppendTime is another form of "offset". It
> > >>>> > serves the following purposes:
> > >>>> > 1. Locate messages not only by position, but also by time. The
> > >>>> > difference from offset is that the timestamp is not unique across
> > >>>> > all messages.
> > >>>> > 2. Allow the broker to manage messages based on time, e.g.
> > >>>> > retention, rolling.
> > >>>> > 3. Provide convenience for users to search messages not only by
> > >>>> > offset, but also by timestamp.
> > >>>> >
> > >>>> > For purpose (2) we don't need a per-message server timestamp. We
> > >>>> > only need a per-log-segment server timestamp and to propagate it
> > >>>> > among brokers.
> > >>>> >
> > >>>> > For (1) and (3), we need per message timestamp. Then the question
> is
> > >>>> > whether we should use CreateTime or LogAppendTime?
> > >>>> >
> > >>>> > I completely agree that an application timestamp is very useful
> > >>>> > for many use cases. But it seems to me that having Kafka
> > >>>> > understand and maintain application timestamps is a bit too
> > >>>> > demanding. So I think there is value in passing on CreateTime for
> > >>>> > application convenience, but I am not sure it can replace
> > >>>> > LogAppendTime. Managing out-of-order CreateTimes is equivalent to
> > >>>> > allowing producers to send their own offsets and asking the broker
> > >>>> > to manage those offsets for them. It would be very hard to
> > >>>> > maintain and could create huge performance/functional issues
> > >>>> > because of the complicated logic.
> > >>>> >
> > >>>> > About whether we should expose LogAppendTime outside the broker:
> > >>>> > I agree that the server timestamp is internal to the broker, but
> > >>>> > isn't offset also an internal concept? Arguably it's not provided
> > >>>> > by the producer, so consumer application logic does not have to
> > >>>> > know the offset. But users need to know the offset because they
> > >>>> > need to know "where is the message" in the log. LogAppendTime
> > >>>> > provides the answer to "when was the message appended" to the log.
> > >>>> > So personally I think it is reasonable to expose the LogAppendTime
> > >>>> > to consumers.
> > >>>> >
> > >>>> > I can see some use cases for exposing the LogAppendTime, to name
> > >>>> > a few:
> > >>>> > 1. Let's say the broker has 7 days of log retention and some
> > >>>> > application wants to reprocess the data from the past 3 days.
> > >>>> > Users can simply provide the timestamp and start consuming.
> > >>>> > 2. Users can easily know the lag by time.
> > >>>> > 3. Cross-cluster failover. This is a more complicated use case
> > >>>> > with two goals: 1) do not lose messages; and 2) do not reconsume
> > >>>> > tons of messages. Only knowing an offset in cluster A won't help
> > >>>> > with finding the failover point in cluster B, because an offset in
> > >>>> > one cluster means nothing to another cluster. Timestamp, however,
> > >>>> > is a good cross-cluster reference in this case.
> > >>>> >
> > >>>> > Thanks,
> > >>>> >
> > >>>> > Jiangjie (Becket) Qin
> > >>>> >
> > >>>> > On Thu, Sep 10, 2015 at 9:28 PM, Ewen Cheslack-Postava <
> > >>>> e...@confluent.io>
> > >>>> > wrote:
> > >>>> >
> > >>>> >> Re: MM preserving timestamps: Yes, this was how I interpreted the
> > >>>> point in
> > >>>> >> the KIP and I only raised the issue because it restricts the
> > >>>> usefulness of
> > >>>> >> timestamps anytime MM is involved. I agree it's not a deal
> breaker,
> > >>>> but I
> > >>>> >> wanted to understand exact impact of the change. Some users seem
> to
> > >>>> want to
> > >>>> >> be able to seek by application-defined timestamps (despite the
> many
> > >>>> obvious
> > >>>> >> issues involved), and the proposal clearly would not support that
> > >>>> unless
> > >>>> >> the timestamps submitted with the produce requests were
> respected.
> > >>>> If we
> > >>>> >> ignore client submitted timestamps, then we probably want to try
> to
> > >>>> hide
> > >>>> >> the timestamps as much as possible in any public interface (e.g.
> > >>>> never
> > >>>> >> shows up in any public consumer APIs), but expose it just enough
> to
> > >>>> be
> > >>>> >> useful for operational purposes.
> > >>>> >>
> > >>>> >> Sorry if my devil's advocate position / attempt to map the design
> > >>>> space led
> > >>>> >> to some confusion!
> > >>>> >>
> > >>>> >> -Ewen
> > >>>> >>
> > >>>> >>
> > >>>> >> On Thu, Sep 10, 2015 at 5:48 PM, Jay Kreps <j...@confluent.io>
> > wrote:
> > >>>> >>
> > >>>> >> > Ah, I see. I think I misunderstood about MM: it was called out
> > >>>> >> > in the proposal, and I thought you were saying you'd retain the
> > >>>> >> > timestamp, but I think you're calling out that you're not. In
> > >>>> >> > that case you do have the opposite problem, right? When you add
> > >>>> >> > mirroring for a topic, all that data will have a timestamp of
> > >>>> >> > now and retention won't be right. Not a blocker but a bit of a
> > >>>> >> > gotcha.
> > >>>> >> >
> > >>>> >> > -Jay
> > >>>> >> >
> > >>>> >> >
> > >>>> >> >
> > >>>> >> > On Thu, Sep 10, 2015 at 5:40 PM, Joel Koshy <
> jjkosh...@gmail.com
> > >
> > >>>> wrote:
> > >>>> >> >
> > >>>> >> > > > Don't you see all the same issues you see with
> client-defined
> > >>>> >> > timestamp's
> > >>>> >> > > > if you let mm control the timestamp as you were proposing?
> > >>>> That means
> > >>>> >> > > time
> > >>>> >> > >
> > >>>> >> > > Actually I don't think that was in the proposal (or was it?).
> > >>>> i.e., I
> > >>>> >> > > think it was always supposed to be controlled by the broker
> > (and
> > >>>> not
> > >>>> >> > > MM).
> > >>>> >> > >
> > >>>> >> > > > Also, Joel, can you just confirm that you guys have talked
> > >>>> through
> > >>>> >> the
> > >>>> >> > > > whole timestamp thing with the Samza folks at LI? The
> reason
> > I
> > >>>> ask
> > >>>> >> > about
> > >>>> >> > > > this is that Samza and Kafka Streams (KIP-28) are both
> trying
> > >>>> to rely
> > >>>> >> > on
> > >>>> >> > >
> > >>>> >> > > We have not. This is a good point - we will follow-up.
> > >>>> >> > >
> > >>>> >> > > > WRT your idea of a FollowerFetchRequest, I had thought of a
> > >>>> >> > > > similar idea
> > >>>> >> > > > where we use the leader's timestamps to approximately set
> the
> > >>>> >> > follower's
> > >>>> >> > > > timestamps. I had thought of just adding a partition
> metadata
> > >>>> request
> > >>>> >> > > that
> > >>>> >> > > > would subsume the current offset/time lookup and could be
> > used
> > >>>> by the
> > >>>> >> > > > follower to try to approximately keep their timestamps
> > kosher.
> > >>>> It's a
> > >>>> >> > > > little hacky and doesn't help with MM but it is also maybe
> > less
> > >>>> >> > invasive
> > >>>> >> > > so
> > >>>> >> > > > that approach could be viable.
> > >>>> >> > >
> > >>>> >> > > That would also work, but perhaps responding with the actual
> > >>>> leader
> > >>>> >> > > offset-timestamp entries (corresponding to the fetched
> portion)
> > >>>> would
> > >>>> >> > > be exact and it should be small as well. Anyway, the main
> > >>>> >> > > motivation in this was to avoid leaking server-side timestamps
> > >>>> >> > > into the message format, if people think that is worth it; the
> > >>>> >> > > alternatives are implementation details. My original instinct
> > >>>> >> > > was that it also avoids a backwards-incompatible change (but it
> > >>>> >> > > does not, because we also have the relative offset change).
> > >>>> >> > >
> > >>>> >> > > Thanks,
> > >>>> >> > >
> > >>>> >> > > Joel
> > >>>> >> > >
> > >>>> >> > > >
> > >>>> >> > > >
> > >>>> >> > > >
> > >>>> >> > > > On Thu, Sep 10, 2015 at 3:36 PM, Joel Koshy <
> > >>>> jjkosh...@gmail.com>
> > >>>> >> > wrote:
> > >>>> >> > > >
> > >>>> >> > > >> I just wanted to comment on a few points made earlier in
> > this
> > >>>> >> thread:
> > >>>> >> > > >>
> > >>>> >> > > >> Concerns on clock skew: at least for the original
> proposal's
> > >>>> scope
> > >>>> >> > > >> (which was more for honoring retention broker-side) this
> > >>>> would only
> > >>>> >> be
> > >>>> >> > > >> an issue when spanning leader movements right? i.e.,
> leader
> > >>>> >> migration
> > >>>> >> > > >> latency has to be much less than clock skew for this to
> be a
> > >>>> real
> > >>>> >> > > >> issue wouldn’t it?
> > >>>> >> > > >>
> > >>>> >> > > >> Client timestamp vs broker timestamp: I’m not sure Kafka
> > >>>> (brokers)
> > >>>> >> are
> > >>>> >> > > >> the right place to reason about client-side timestamps
> > >>>> precisely due
> > >>>> >> > > >> to the nuances that have been discussed at length in this
> > >>>> thread. My
> > >>>> >> > > >> preference would have been for the timestamp (now called
> > >>>> >> > > >> LogAppendTimestamp) to have nothing to do with the
> > applications.
> > >>>> Ewen
> > >>>> >> > > >> raised a valid concern about leaking such
> > >>>> “private/server-side”
> > >>>> >> > > >> timestamps into the protocol spec. i.e., it is fine to
> have
> > >>>> the
> > >>>> >> > > >> CreateTime which is expressly client-provided and
> immutable
> > >>>> >> > > >> thereafter, but the LogAppendTime is also going to be part of
> the
> > >>>> protocol
> > >>>> >> > > >> and it would be good to avoid exposure (to client
> > developers)
> > >>>> if
> > >>>> >> > > >> possible. Ok, so here is a slightly different approach
> that
> > I
> > >>>> was
> > >>>> >> just
> > >>>> >> > > >> thinking about (and did not think too far so it may not
> > >>>> work): do
> > >>>> >> not
> > >>>> >> > > >> add the LogAppendTime to messages. Instead, build the
> > >>>> time-based
> > >>>> >> index
> > >>>> >> > > >> on the server side on message arrival time alone.
> Introduce
> > a
> > >>>> new
> > >>>> >> > > >> ReplicaFetchRequest/Response pair. ReplicaFetchResponses
> > will
> > >>>> also
> > >>>> >> > > >> include the slice of the time-based index for the follower
> > >>>> broker.
> > >>>> >> > > >> This way we can at least keep timestamps aligned across
> > >>>> brokers for
> > >>>> >> > > >> retention purposes. We do lose the append timestamp for
> > >>>> mirroring
> > >>>> >> > > >> pipelines (which appears to be the case in KIP-32 as
> well).
> > >>>> >> > > >>
> > >>>> >> > > >> Configurable index granularity: We can do this but I’m not
> > >>>> sure it
> > >>>> >> is
> > >>>> >> > > >> very useful and as Jay noted, a major change from the old
> > >>>> proposal
> > >>>> >> > > >> linked from the KIP is the sparse time-based index which
> we
> > >>>> felt was
> > >>>> >> > > >> essential to bound memory usage (and having timestamps on
> > >>>> each log
> > >>>> >> > > >> index entry was probably a big waste since in the common
> > case
> > >>>> >> several
> > >>>> >> > > >> messages span the same timestamp). BTW another benefit of
> > the
> > >>>> second
> > >>>> >> > > >> index is that it makes it easier to roll-back or throw
> away
> > if
> > >>>> >> > > >> necessary (vs. modifying the existing index format) -
> > >>>> although that
> > >>>> >> > > >> obviously does not help with rolling back the timestamp
> > >>>> change in
> > >>>> >> the
> > >>>> >> > > >> message format, but it is one less thing to worry about.
> > >>>> >> > > >>
> > >>>> >> > > >> Versioning: I’m not sure everyone is saying the same thing
> > >>>> wrt the
> > >>>> >> > > >> scope of this. There is the record format change, but I
> also
> > >>>> think
> > >>>> >> > > >> this ties into all of the API versioning that we already
> > have
> > >>>> in
> > >>>> >> > > >> Kafka. The current API versioning approach works fine for
> > >>>> >> > > >> upgrades/downgrades across official Kafka releases, but
> not
> > >>>> so well
> > >>>> >> > > >> between releases. (We almost got bitten by this at
> LinkedIn
> > >>>> with the
> > >>>> >> > > >> recent changes to various requests but were able to work
> > >>>> around
> > >>>> >> > > >> these.) We can clarify this in the follow-up KIP.
> > >>>> >> > > >>
> > >>>> >> > > >> Thanks,
> > >>>> >> > > >>
> > >>>> >> > > >> Joel
> > >>>> >> > > >>
> > >>>> >> > > >>
> > >>>> >> > > >> On Thu, Sep 10, 2015 at 3:00 PM, Jiangjie Qin
> > >>>> >> > <j...@linkedin.com.invalid
> > >>>> >> > > >
> > >>>> >> > > >> wrote:
> > >>>> >> > > >> > Hi Jay,
> > >>>> >> > > >> >
> > >>>> >> > > >> > I just changed the KIP title and updated the KIP page.
> > >>>> >> > > >> >
> > >>>> >> > > >> > And yes, we are working on a general version control
> > >>>> >> > > >> > proposal to make protocol migrations like this smoother.
> > >>>> >> > > >> > I will also create a KIP for that soon.
> > >>>> >> > > >> >
> > >>>> >> > > >> > Thanks,
> > >>>> >> > > >> >
> > >>>> >> > > >> > Jiangjie (Becket) Qin
> > >>>> >> > > >> >
> > >>>> >> > > >> >
> > >>>> >> > > >> > On Thu, Sep 10, 2015 at 2:21 PM, Jay Kreps <
> > >>>> j...@confluent.io>
> > >>>> >> > wrote:
> > >>>> >> > > >> >
> > >>>> >> > > >> >> Great, can we change the name to something related to
> the
> > >>>> >> > > >> change--"KIP-31:
> > >>>> >> > > >> >> Move to relative offsets in compressed message sets".
> > >>>> >> > > >> >>
> > >>>> >> > > >> >> Also you had mentioned before you were going to expand
> on
> > >>>> the
> > >>>> >> > > mechanics
> > >>>> >> > > >> of
> > >>>> >> > > >> >> handling these log format changes, right?
> > >>>> >> > > >> >>
> > >>>> >> > > >> >> -Jay
> > >>>> >> > > >> >>
> > >>>> >> > > >> >> On Thu, Sep 10, 2015 at 12:42 PM, Jiangjie Qin
> > >>>> >> > > >> <j...@linkedin.com.invalid>
> > >>>> >> > > >> >> wrote:
> > >>>> >> > > >> >>
> > >>>> >> > > >> >> > Neha and Jay,
> > >>>> >> > > >> >> >
> > >>>> >> > > >> >> > Thanks a lot for the feedback. Good point about
> > >>>> >> > > >> >> > splitting the discussion. I have split the proposal
> > >>>> >> > > >> >> > into three KIPs, and it does make each discussion
> > >>>> >> > > >> >> > clearer:
> > >>>> >> > > >> >> > KIP-31 - Message format change (Use relative offset)
> > >>>> >> > > >> >> > KIP-32 - Add CreateTime and LogAppendTime to Kafka
> > >>>> message
> > >>>> >> > > >> >> > KIP-33 - Build a time-based log index
> > >>>> >> > > >> >> >
> > >>>> >> > > >> >> > KIP-33 can be a follow-up KIP for KIP-32, so we can
> > >>>> >> > > >> >> > discuss KIP-31 and KIP-32 first for now. I will create
> > >>>> >> > > >> >> > a separate discussion thread for KIP-32 and reply to
> > >>>> >> > > >> >> > the concerns you raised regarding the timestamp.
> > >>>> >> > > >> >> >
> > >>>> >> > > >> >> > So far it looks like there is no objection to KIP-31.
> > >>>> >> > > >> >> > Since I removed a few parts from the previous KIP and
> > >>>> >> > > >> >> > left only the relative offset proposal, it would be
> > >>>> >> > > >> >> > great if people could take another look to see if
> > >>>> >> > > >> >> > there are any concerns.
> > >>>> >> > > >> >> >
> > >>>> >> > > >> >> > Thanks,
> > >>>> >> > > >> >> >
> > >>>> >> > > >> >> > Jiangjie (Becket) Qin
> > >>>> >> > > >> >> >
> > >>>> >> > > >> >> >
> > >>>> >> > > >> >> > On Tue, Sep 8, 2015 at 1:28 PM, Neha Narkhede <
> > >>>> >> n...@confluent.io
> > >>>> >> > >
> > >>>> >> > > >> wrote:
> > >>>> >> > > >> >> >
> > >>>> >> > > >> >> > > Becket,
> > >>>> >> > > >> >> > >
> > >>>> >> > > >> >> > > Nice write-up. Few thoughts -
> > >>>> >> > > >> >> > >
> > >>>> >> > > >> >> > > I'd split up the discussion for simplicity. Note that
> > >>>> >> > > >> >> > > you can always group several of these in one patch to
> > >>>> >> > > >> >> > > reduce the protocol changes people have to deal with.
> > >>>> >> > > >> >> > > This is just a suggestion, but I think the following
> > >>>> >> > > >> >> > > split might make it easier to tackle the changes being
> > >>>> >> > > >> >> > > proposed -
> > >>>> >> > > >> >> > >
> > >>>> >> > > >> >> > >    - Relative offsets
> > >>>> >> > > >> >> > >    - Introducing the concept of time
> > >>>> >> > > >> >> > >    - Time-based indexing (separate the usage of the
> > >>>> timestamp
> > >>>> >> > > field
> > >>>> >> > > >> >> from
> > >>>> >> > > >> >> > >    how/whether we want to include a timestamp in
> the
> > >>>> message)
> > >>>> >> > > >> >> > >
> > >>>> >> > > >> >> > > I'm a +1 on relative offsets, we should've done it
> > >>>> back when
> > >>>> >> we
> > >>>> >> > > >> >> > introduced
> > >>>> >> > > >> >> > > it. Other than reducing the CPU overhead, this will
> > >>>> also
> > >>>> >> reduce
> > >>>> >> > > the
> > >>>> >> > > >> >> > garbage
> > >>>> >> > > >> >> > > collection overhead on the brokers.
> > >>>> >> > > >> >> > >
> > >>>> >> > > >> >> > > On the timestamp field, I generally agree that we
> > >>>> should add
> > >>>> >> a
> > >>>> >> > > >> >> timestamp
> > >>>> >> > > >> >> > to
> > >>>> >> > > >> >> > > a Kafka message but I'm not quite sold on how this
> > KIP
> > >>>> >> suggests
> > >>>> >> > > the
> > >>>> >> > > >> >> > > timestamp be set. Will avoid repeating the
> downsides
> > >>>> of a
> > >>>> >> > broker
> > >>>> >> > > >> side
> > >>>> >> > > >> >> > > timestamp mentioned previously in this thread. I
> > think
> > >>>> the
> > >>>> >> > topic
> > >>>> >> > > of
> > >>>> >> > > >> >> > > including a timestamp in a Kafka message requires a
> > >>>> lot more
> > >>>> >> > > thought
> > >>>> >> > > >> >> and
> > >>>> >> > > >> >> > > details than what's in this KIP. I'd suggest we
> make
> > >>>> it a
> > >>>> >> > > separate
> > >>>> >> > > >> KIP
> > >>>> >> > > >> >> > that
> > >>>> >> > > >> >> > > includes a list of all the different use cases for
> > the
> > >>>> >> > timestamp
> > >>>> >> > > >> >> (beyond
> > >>>> >> > > >> >> > > log retention) including stream processing and
> > discuss
> > >>>> >> > tradeoffs
> > >>>> >> > > of
> > >>>> >> > > >> >> > > including client and broker side timestamps.
> > >>>> >> > > >> >> > >
> > >>>> >> > > >> >> > > Agree with the benefit of time-based indexing, but
> > >>>> haven't
> > >>>> >> had
> > >>>> >> > a
> > >>>> >> > > >> chance
> > >>>> >> > > >> >> > to
> > >>>> >> > > >> >> > > dive into the design details yet.
> > >>>> >> > > >> >> > >
> > >>>> >> > > >> >> > > Thanks,
> > >>>> >> > > >> >> > > Neha
> > >>>> >> > > >> >> > >
> > >>>> >> > > >> >> > > On Tue, Sep 8, 2015 at 10:57 AM, Jay Kreps <
> > >>>> j...@confluent.io
> > >>>> >> >
> > >>>> >> > > >> wrote:
> > >>>> >> > > >> >> > >
> > >>>> >> > > >> >> > > > Hey Beckett,
> > >>>> >> > > >> >> > > >
> > >>>> >> > > >> >> > > > I was proposing splitting up the KIP just for
> > >>>> simplicity of
> > >>>> >> > > >> >> discussion.
> > >>>> >> > > >> >> > > You
> > >>>> >> > > >> >> > > > can still implement them in one patch. I think
> > >>>> otherwise it
> > >>>> >> > > will
> > >>>> >> > > >> be
> > >>>> >> > > >> >> > hard
> > >>>> >> > > >> >> > > to
> > >>>> >> > > >> >> > > > discuss/vote on them since if you like the offset
> > >>>> proposal
> > >>>> >> > but
> > >>>> >> > > not
> > >>>> >> > > >> >> the
> > >>>> >> > > >> >> > > time
> > >>>> >> > > >> >> > > > proposal what do you do?
> > >>>> >> > > >> >> > > >
> > >>>> >> > > >> >> > > > Introducing a second notion of time into Kafka is
> > >>>> >> > > >> >> > > > a pretty massive philosophical change, so it kind
> > >>>> >> > > >> >> > > > of warrants its own KIP; I think it isn't just
> > >>>> >> > > >> >> > > > "Change message format".
> > >>>> >> > > >> >> > > >
> > >>>> >> > > >> >> > > > WRT time I think one thing to clarify in the
> > >>>> proposal is
> > >>>> >> how
> > >>>> >> > MM
> > >>>> >> > > >> will
> > >>>> >> > > >> >> > have
> > >>>> >> > > >> >> > > > access to set the timestamp? Presumably this will
> > be
> > >>>> a new
> > >>>> >> > > field
> > >>>> >> > > >> in
> > >>>> >> > > >> >> > > > ProducerRecord, right? If so then any user can
> set
> > >>>> the
> > >>>> >> > > timestamp,
> > >>>> >> > > >> >> > right?
> > >>>> >> > > >> >> > > > I'm not sure you answered the questions around
> how
> > >>>> this
> > >>>> >> will
> > >>>> >> > > work
> > >>>> >> > > >> for
> > >>>> >> > > >> >> > MM
> > >>>> >> > > >> >> > > > since when MM retains timestamps from multiple
> > >>>> partitions
> > >>>> >> > they
> > >>>> >> > > >> will
> > >>>> >> > > >> >> > then
> > >>>> >> > > >> >> > > be
> > >>>> >> > > >> >> > > > out of order and in the past (so the
> > >>>> >> > max(lastAppendedTimestamp,
> > >>>> >> > > >> >> > > > currentTimeMillis) override you proposed will not
> > >>>> work,
> > >>>> >> > > right?).
> > >>>> >> > > >> If
> > >>>> >> > > >> >> we
> > >>>> >> > > >> >> > > > don't do this then when you set up mirroring the
> > >>>> data will
> > >>>> >> > all
> > >>>> >> > > be
> > >>>> >> > > >> new
> > >>>> >> > > >> >> > and
> > >>>> >> > > >> >> > > > you have the same retention problem you
> described.
> > >>>> Maybe I
> > >>>> >> > > missed
> > >>>> >> > > >> >> > > > something...?
> > >>>> >> > > >> >> > > >
> > >>>> >> > > >> >> > > > My main motivation is that given that both Samza
> > and
> > >>>> Kafka
> > >>>> >> > > streams
> > >>>> >> > > >> >> are
> > >>>> >> > > >> >> > > > doing work that implies a mandatory
> client-defined
> > >>>> notion
> > >>>> >> of
> > >>>> >> > > >> time, I
> > >>>> >> > > >> >> > > really
> > >>>> >> > > >> >> > > > think introducing a different mandatory notion of
> > >>>> time in
> > >>>> >> > > Kafka is
> > >>>> >> > > >> >> > going
> > >>>> >> > > >> >> > > to
> > >>>> >> > > >> >> > > > be quite odd. We should think hard about how
> > >>>> client-defined
> > >>>> >> > > time
> > >>>> >> > > >> >> could
> > >>>> >> > > >> >> > > > work. I'm not sure if it can, but I'm also not
> sure
> > >>>> that it
> > >>>> >> > > can't.
> > >>>> >> > > >> >> > Having
> > >>>> >> > > >> >> > > > both will be odd. Did you chat about this with
> > >>>> Yi/Kartik on
> > >>>> >> > the
> > >>>> >> > > >> Samza
> > >>>> >> > > >> >> > > side?
> > >>>> >> > > >> >> > > >
> > >>>> >> > > >> >> > > > When you are saying it won't work you are
> assuming
> > >>>> some
> > >>>> >> > > particular
> > >>>> >> > > >> >> > > > implementation? Maybe that the index is a
> > >>>> monotonically
> > >>>> >> > > increasing
> > >>>> >> > > >> >> set
> > >>>> >> > > >> >> > of
> > >>>> >> > > >> >> > > > pointers to the least record with a timestamp
> > larger
> > >>>> than
> > >>>> >> the
> > >>>> >> > > >> index
> > >>>> >> > > >> >> > time?
> > >>>> >> > > >> >> > > > In other words a search for time X gives the
> > largest
> > >>>> offset
> > >>>> >> > at
> > >>>> >> > > >> which
> > >>>> >> > > >> >> > all
> > >>>> >> > > >> >> > > > records are <= X?
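[Ed.: the index semantics Jay describes can be sketched concretely. This is an illustrative Python sketch, not Kafka code; it assumes the index holds (timestamp, offset) pairs with monotonically increasing timestamps, which is exactly the guarantee in question, and `offset_for_time` is a made-up name.]

```python
import bisect

def offset_for_time(index, x):
    """index: (timestamp, offset) pairs sorted by offset, with timestamps
    monotonically increasing. Return the largest indexed offset at which
    all records have timestamp <= x, or None if every record is newer."""
    timestamps = [t for t, _ in index]
    i = bisect.bisect_right(timestamps, x)  # first entry with timestamp > x
    if i == 0:
        return None  # every indexed record is newer than x
    return index[i - 1][1]

index = [(100, 0), (200, 35), (300, 70)]
offset_for_time(index, 250)  # -> 35
```

Without the monotonicity guarantee the binary search is invalid, which is why who sets the timestamp matters for this query.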
> > >>>> >> > > >> >> > > >
> > >>>> >> > > >> >> > > > For retention, I agree with the problem you point out, but I think
> > >>>> >> > > >> >> > > > what you are saying in that case is that you want a size limit too.
> > >>>> >> > > >> >> > > > If you use system time you actually hit the same problem: say you do
> > >>>> >> > > >> >> > > > a full dump of a DB table with a setting of 7 days retention, your
> > >>>> >> > > >> >> > > > retention will actually not get enforced for the first 7 days because
> > >>>> >> > > >> >> > > > the data is "new to Kafka".
> > >>>> >> > > >> >> > > >
> > >>>> >> > > >> >> > > > -Jay
> > >>>> >> > > >> >> > > >
> > >>>> >> > > >> >> > > > On Mon, Sep 7, 2015 at 10:44 AM, Jiangjie Qin
> > >>>> >> > > >> >> > > > <j...@linkedin.com.invalid> wrote:
> > >>>> >> > > >> >> > > >
> > >>>> >> > > >> >> > > > > Jay,
> > >>>> >> > > >> >> > > > >
> > >>>> >> > > >> >> > > > > Thanks for the comments. Yes, there are actually three proposals as
> > >>>> >> > > >> >> > > > > you pointed out.
> > >>>> >> > > >> >> > > > >
> > >>>> >> > > >> >> > > > > We will have a separate proposal for (1) - version control
> > >>>> >> > > >> >> > > > > mechanism. We actually thought about whether we want to separate 2
> > >>>> >> > > >> >> > > > > and 3 internally before creating the KIP. The reason we put 2 and 3
> > >>>> >> > > >> >> > > > > together is that it saves us another across-the-board wire protocol
> > >>>> >> > > >> >> > > > > change. Like you said, we have to migrate all the clients in all
> > >>>> >> > > >> >> > > > > languages. To some extent, the effort spent on upgrading the
> > >>>> >> > > >> >> > > > > clients can be even bigger than implementing the new feature
> > >>>> >> > > >> >> > > > > itself. So it is attractive if we can do 2 and 3 together instead
> > >>>> >> > > >> >> > > > > of separately. Maybe after (1) is done it will be easier to do
> > >>>> >> > > >> >> > > > > protocol migration. But if we are able to come to an agreement on
> > >>>> >> > > >> >> > > > > the timestamp solution, I would prefer to have it together with
> > >>>> >> > > >> >> > > > > relative offset in the interest of avoiding another wire protocol
> > >>>> >> > > >> >> > > > > change (the process to migrate to relative offset is exactly the
> > >>>> >> > > >> >> > > > > same as migrating to messages with timestamps).
> > >>>> >> > > >> >> > > > >
> > >>>> >> > > >> >> > > > > In terms of timestamps, I completely agree that having a client
> > >>>> >> > > >> >> > > > > timestamp is more useful if we can make sure the timestamp is good.
> > >>>> >> > > >> >> > > > > But in reality that can be a really big *IF*. I think the problem
> > >>>> >> > > >> >> > > > > is exactly as Ewen mentioned: if we let the client set the
> > >>>> >> > > >> >> > > > > timestamp, it would be very hard for the broker to utilize it. If
> > >>>> >> > > >> >> > > > > the broker applies the retention policy based on the client
> > >>>> >> > > >> >> > > > > timestamp, one misbehaving producer can potentially completely mess
> > >>>> >> > > >> >> > > > > up the retention policy on the broker. Although people don't care
> > >>>> >> > > >> >> > > > > about the server-side timestamp, people do care a lot when
> > >>>> >> > > >> >> > > > > timestamps break. Searching by timestamp is a really important use
> > >>>> >> > > >> >> > > > > case even though it is not used as often as searching by offset. It
> > >>>> >> > > >> >> > > > > has significant direct impact on RTO when there is a cross-cluster
> > >>>> >> > > >> >> > > > > failover as Todd mentioned.
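[Ed.: the failure mode Becket describes can be seen with a toy model. This is illustrative only, not Kafka's retention code; it assumes a policy that deletes a segment once its newest client-supplied timestamp falls outside the retention window, so a single future-dated record pins an otherwise ancient segment forever.]

```python
def segment_expired(record_timestamps_ms, retention_ms, now_ms):
    # A segment is only eligible for deletion once its newest record
    # falls outside the retention window.
    return now_ms - max(record_timestamps_ms) > retention_ms

NOW = 1_700_000_000_000
WEEK = 7 * 24 * 3600 * 1000

old_segment = [NOW - 10 * WEEK, NOW - 10 * WEEK + 1]
poisoned = old_segment + [NOW + 100 * WEEK]  # one bogus future timestamp

segment_expired(old_segment, WEEK, NOW)  # True: ten weeks old, gets deleted
segment_expired(poisoned, WEEK, NOW)     # False: never expires
```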
> > >>>> >> > > >> >> > > > >
> > >>>> >> > > >> >> > > > > The trick of using max(lastAppendedTimestamp, currentTimeMillis)
> > >>>> >> > > >> >> > > > > is to guarantee monotonic increase of the timestamp. Many
> > >>>> >> > > >> >> > > > > commercial systems actually do something similar to this to handle
> > >>>> >> > > >> >> > > > > time skew. About changing the time, I am not sure people use NTP
> > >>>> >> > > >> >> > > > > the way they would use a watch, just setting it forward/backward by
> > >>>> >> > > >> >> > > > > an hour or so. The time adjustment I used to do is typically
> > >>>> >> > > >> >> > > > > something like a minute per week. So each second might run a few
> > >>>> >> > > >> >> > > > > microseconds slower/faster, but the clock is not broken completely,
> > >>>> >> > > >> >> > > > > so all the time-based transactions are unaffected. The one-minute
> > >>>> >> > > >> >> > > > > change is spread over a week, not applied instantly.
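[Ed.: the max(lastAppendedTimestamp, currentTimeMillis) trick, as a small sketch. The class and names are illustrative, not Kafka code; the point is that even if the wall clock steps backwards, assigned timestamps never do.]

```python
class MonotonicTimestamper:
    def __init__(self, clock):
        self._clock = clock  # callable returning current wall-clock millis
        self._last = 0       # last timestamp handed out

    def next_timestamp(self):
        # Never go backwards, even if the clock does.
        self._last = max(self._last, self._clock())
        return self._last

# A wall clock that jumps back an hour mid-stream.
readings = iter([1000, 2000, 2000 - 3_600_000, 3000])
stamper = MonotonicTimestamper(lambda: next(readings))
stamps = [stamper.next_timestamp() for _ in range(4)]  # [1000, 2000, 2000, 3000]
```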
> > >>>> >> > > >> >> > > > >
> > >>>> >> > > >> >> > > > > Personally, I think having a client-side timestamp will be useful
> > >>>> >> > > >> >> > > > > if we don't need to put the broker and data integrity at risk. If
> > >>>> >> > > >> >> > > > > we have to choose one of them but not both, I would prefer the
> > >>>> >> > > >> >> > > > > server-side timestamp, because for the client-side timestamp there
> > >>>> >> > > >> >> > > > > is always a plan B, which is putting the timestamp into the
> > >>>> >> > > >> >> > > > > payload.
> > >>>> >> > > >> >> > > > >
> > >>>> >> > > >> >> > > > > Another reason I am reluctant to use the client-side timestamp is
> > >>>> >> > > >> >> > > > > that it is always dangerous to mix the control plane with the data
> > >>>> >> > > >> >> > > > > plane. IP did this and it has caused so many breaches that people
> > >>>> >> > > >> >> > > > > are migrating to something like MPLS. An example in Kafka is that
> > >>>> >> > > >> >> > > > > any client can construct a LeaderAndIsrRequest /
> > >>>> >> > > >> >> > > > > UpdateMetadataRequest / ControlledShutdownRequest (you name it) and
> > >>>> >> > > >> >> > > > > send it to the broker to mess up the entire cluster; also, as we
> > >>>> >> > > >> >> > > > > have already noticed, a busy cluster can respond quite slowly to
> > >>>> >> > > >> >> > > > > controller messages. So it would really be nice if we can avoid
> > >>>> >> > > >> >> > > > > giving clients the power to control log retention.
> > >>>> >> > > >> >> > > > >
> > >>>> >> > > >> >> > > > > Thanks,
> > >>>> >> > > >> >> > > > >
> > >>>> >> > > >> >> > > > > Jiangjie (Becket) Qin
> > >>>> >> > > >> >> > > > >
> > >>>> >> > > >> >> > > > > On Sun, Sep 6, 2015 at 9:54 PM, Todd Palino <tpal...@gmail.com>
> > >>>> >> > > >> >> > > > > wrote:
> > >>>> >> > > >> >> > > > >
> > >>>> >> > > >> >> > > > > > So, with regards to why you want to search by timestamp, the
> > >>>> >> > > >> >> > > > > > biggest problem I've seen is with consumers who want to reset
> > >>>> >> > > >> >> > > > > > their offsets to a specific point in time, whether it is to
> > >>>> >> > > >> >> > > > > > replay a certain amount of messages, or to rewind to before some
> > >>>> >> > > >> >> > > > > > problem state existed. This happens more often than anyone would
> > >>>> >> > > >> >> > > > > > like.
> > >>>> >> > > >> >> > > > > >
> > >>>> >> > > >> >> > > > > > To handle this now we need to constantly export the broker's
> > >>>> >> > > >> >> > > > > > offset for every partition to a time-series database and then
> > >>>> >> > > >> >> > > > > > use external processes to query this. I know we're not the only
> > >>>> >> > > >> >> > > > > > ones doing this. The way the broker handles requests for offsets
> > >>>> >> > > >> >> > > > > > by timestamp is a little obtuse (explain it to anyone without
> > >>>> >> > > >> >> > > > > > intimate knowledge of the internal workings of the broker -
> > >>>> >> > > >> >> > > > > > every time I do I see this). In addition, as Becket pointed out,
> > >>>> >> > > >> >> > > > > > it causes problems specifically with retention of messages by
> > >>>> >> > > >> >> > > > > > time when you move partitions around.
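[Ed.: for context on the "obtuse" behavior: the pre-KIP offsets-by-timestamp request worked roughly at log-segment granularity, matching against segment file modification times rather than per-message times. A simplified sketch under that assumption, not the actual broker code:]

```python
def offsets_before(segments, target_ms):
    # segments: (base_offset, file_mtime_ms) pairs, oldest first.
    # Answer with the base offsets of segments last modified before the
    # target time, newest first -- segment granularity, not per message.
    return sorted((base for base, mtime in segments if mtime < target_ms),
                  reverse=True)

segments = [(0, 1000), (500, 2000), (900, 3000)]
offsets_before(segments, 2500)  # -> [500, 0]
```

Because the answer moves in whole-segment steps and mtimes reset when partitions are reassigned, both of Todd's complaints fall out of this model.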
> > >>>> >> > > >> >> > > > > >
> > >>>> >> > > >> >> > > > > > I'm deliberately avoiding the discussion of what timestamp to
> > >>>> >> > > >> >> > > > > > use. I can see the argument either way, though I tend to lean
> > >>>> >> > > >> >> > > > > > towards the idea that the broker timestamp is the only viable
> > >>>> >> > > >> >> > > > > > source of truth in this situation.
> > >>>> >> > > >> >> > > > > >
> > >>>> >> > > >> >> > > > > > -Todd
> > >>>> >> > > >> >> > > > > >
> > >>>> >> > > >> >> > > > > > On Sun, Sep 6, 2015 at 7:08 PM, Ewen Cheslack-Postava
> > >>>> >> > > >> >> > > > > > <e...@confluent.io> wrote:
> > >>>> >> > > >> >> > > > > >
> > >>>> >> > > >> >> > > > > > > On Sun, Sep 6, 2015 at 4:57 PM, Jay Kreps <j...@confluent.io>
> > >>>> >> > > >> >> > > > > > > wrote:
> > >>>> >> > > >> >> > > > > > >
> > >>>> >> > > >> >> > > > > > > > 2. Nobody cares what time it is on the server.
> > >>>> >> > > >> >> > > > > > >
> > >>>> >> > > >> >> > > > > > > This is a good way of summarizing the issue I was trying to get
> > >>>> >> > > >> >> > > > > > > at, from an app's perspective. Of the 3 stated goals of the
> > >>>> >> > > >> >> > > > > > > KIP, #2 (log retention) is reasonably handled by a server-side
> > >>>> >> > > >> >> > > > > > > timestamp. I really just care that a message is there long
> > >>>> >> > > >> >> > > > > > > enough that I have a chance to process it. #3 (searching by
> > >>>> >> > > >> >> > > > > > > timestamp) only seems useful if we can guarantee the
> > >>>> >> > > >> >> > > > > > > server-side timestamp is close enough to the original
> > >>>> >> > > >> >> > > > > > > client-side timestamp, and any mirror maker step seems to break
> > >>>> >> > > >> >> > > > > > > that (even ignoring any issues with broker availability).
> > >>>> >> > > >> >> > > > > > >
> > >>>> >> > > >> >> > > > > > > I'm also wondering whether optimizing for search-by-timestamp
> > >>>> >> > > >> >> > > > > > > on the broker is really something we want to do given that
> > >>>> >> > > >> >> > > > > > > messages aren't really guaranteed to be ordered by
> > >>>> >> > > >> >> > > > > > > application-level timestamps on the broker. Is part of the need
> > >>>> >> > > >> >> > > > > > > for this just due to the current consumer APIs being difficult
> > >>>> >> > > >> >> > > > > > > to work with? For example, could you implement this pretty
> > >>>> >> > > >> >> > > > > > > easily client side just the way you would broker-side? I'd
> > >>>> >> > > >> >> > > > > > > imagine a couple of random seeks + reads during very rare
> > >>>> >> > > >> >> > > > > > > occasions (i.e. when the app starts up) wouldn't be a problem
> > >>>> >> > > >> >> > > > > > > performance-wise. Or is it also that you need the broker to
> > >>>> >> > > >> >> > > > > > > enforce things like monotonically increasing timestamps, since
> > >>>> >> > > >> >> > > > > > > you can't do the query properly and efficiently without that
> > >>>> >> > > >> >> > > > > > > guarantee, and therefore what applications are actually looking
> > >>>> >> > > >> >> > > > > > > for *is* broker-side timestamps?
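[Ed.: the client-side approach Ewen suggests could look like a plain binary search over offsets, where `fetch_ts(offset)` is a hypothetical helper that seeks to one offset and reads that record's timestamp. As he notes, it only works if timestamps are monotonic within the partition.]

```python
def first_offset_at_or_after(fetch_ts, lo, hi, target_ms):
    """Return the first offset in [lo, hi) whose timestamp >= target_ms,
    or hi if none. Each probe costs one random seek + read."""
    while lo < hi:
        mid = (lo + hi) // 2
        if fetch_ts(mid) < target_ms:
            lo = mid + 1
        else:
            hi = mid
    return lo

log_ts = [10, 20, 30, 40]  # timestamp of the record at each offset
first_offset_at_or_after(lambda o: log_ts[o], 0, len(log_ts), 25)  # -> 2
```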
> > >>>> >> > > >> >> > > > > > >
> > >>>> >> > > >> >> > > > > > > -Ewen
> > >>>> >> > > >> >> > > > > > >
> > >>>> >> > > >> >> > > > > > >
> > >>>> >> > > >> >> > > > > > >
> > >>>> >> > > >> >> > > > > > > > Consider cases where data is being copied from a database or
> > >>>> >> > > >> >> > > > > > > > from log files. In steady-state the server time is very close
> > >>>> >> > > >> >> > > > > > > > to the client time if their clocks are sync'd (see 1) but
> > >>>> >> > > >> >> > > > > > > > there will be times of large divergence when the copying
> > >>>> >> > > >> >> > > > > > > > process is stopped or falls behind. When this occurs it is
> > >>>> >> > > >> >> > > > > > > > clear that the time the data arrived on the server is
> > >>>> >> > > >> >> > > > > > > > irrelevant, it is the source timestamp that matters. This is
> > >>>> >> > > >> >> > > > > > > > the problem you are trying to fix by retaining the mm
> > >>>> >> > > >> >> > > > > > > > timestamp but really the client should always set the time
> > >>>> >> > > >> >> > > > > > > > with the use of server-side time as a fallback. It would be
> > >>>> >> > > >> >> > > > > > > > worth talking to the Samza folks and reading through this
> > >>>> >> > > >> >> > > > > > > > blog post (
> > >>>> >> > > >> >> > > > > > > > http://radar.oreilly.com/2015/08/the-world-beyond-batch-streaming-101.html
> > >>>> >> > > >> >> > > > > > > > ) on this subject since we went through similar learnings on
> > >>>> >> > > >> >> > > > > > > > the stream processing side.
> > >>>> >> > > >> >> > > > > > > >
> > >>>> >> > > >> >> > > > > > > > I think the implication of these two is that we need a
> > >>>> >> > > >> >> > > > > > > > proposal that handles potentially very out-of-order
> > >>>> >> > > >> >> > > > > > > > timestamps in some kind of sane-ish way (buggy clients will
> > >>>> >> > > >> >> > > > > > > > set something totally wrong as the time).
> > >>>> >> > > >> >> > > > > > > >
> > >>>> >> > > >> >> > > > > > > > -Jay
> > >>>> >> > > >> >> > > > > > > >
> > >>>> >> > > >> >> > > > > > > > On Sun, Sep 6, 2015 at 4:22 PM, Jay Kreps <j...@confluent.io>
> > >>>> >> > > >> >> > > > > > > > wrote:
> > >>>> >> > > >> >> > > > > > > >
> > >>>> >> > > >> >> > > > > > > > > The magic byte is used to version the message format so
> > >>>> >> > > >> >> > > > > > > > > we'll need to make sure that check is in place--I actually
> > >>>> >> > > >> >> > > > > > > > > don't see it in the current consumer code, which I think is
> > >>>> >> > > >> >> > > > > > > > > a bug we should fix for the next release (filed
> > >>>> >> > > >> >> > > > > > > > > KAFKA-2523). The purpose of that field is so there is a
> > >>>> >> > > >> >> > > > > > > > > clear check on the format rather than the scrambled
> > >>>> >> > > >> >> > > > > > > > > scenarios Becket describes.
> > >>>> >> > > >> >> > > > > > > > >
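[Ed.: the check Jay wants can be sketched as follows. In the 0.8.x on-disk message layout the magic byte sits right after the 4-byte CRC; a consumer should reject versions it does not understand instead of misparsing the bytes. Sketch only, not the actual client code.]

```python
SUPPORTED_MAGIC = {0, 1}

def check_magic(message_bytes):
    magic = message_bytes[4]  # CRC occupies bytes 0-3, magic is byte 4
    if magic not in SUPPORTED_MAGIC:
        raise ValueError("unsupported message format version %d" % magic)
    return magic

check_magic(bytes([0, 0, 0, 0, 1]))  # -> 1
```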
> > >>>> >> > > >> >> > > > > > > > > Also, Becket, I don't think just fixing the Java client is
> > >>>> >> > > >> >> > > > > > > > > sufficient, as that would break other clients--i.e. if
> > >>>> >> > > >> >> > > > > > > > > anyone writes v1 messages, even by accident, any
> > >>>> >> > > >> >> > > > > > > > > non-v1-capable consumer will break. I think we probably
> > >>>> >> > > >> >> > > > > > > > > need a way to have the server ensure a particular message
> > >>>> >> > > >> >> > > > > > > > > format either at read or write time.
> > >>>> >> > > >> >> > > > > > > > >
> > >>>> >> > > >> >> > > > > > > > > -Jay
> > >>>> >> > > >> >> > > > > > > > >
> > >>>> >> > > >> >> > > > > > > > > On Thu, Sep 3, 2015 at 3:47 PM, Jiangjie Qin
> > >>>> >> > > >> >> > > > > > > > > <j...@linkedin.com.invalid> wrote:
> > >>>> >> > > >> >> > > > > > > > >
> > >>>> >> > > >> >> > > > > > > > >> Hi Guozhang,
> > >>>> >> > > >> >> > > > > > > > >>
> > >>>> >> > > >> >> > > > > > > > >> I checked the code again. Actually the CRC check probably
> > >>>> >> > > >> >> > > > > > > > >> won't fail. The newly added timestamp field might be
> > >>>> >> > > >> >> > > > > > > > >> treated as the keyLength instead, so we are likely to
> > >>>> >> > > >> >> > > > > > > > >> receive an IllegalArgumentException when we try to read
> > >>>> >> > > >> >> > > > > > > > >> the key. I'll update the KIP.
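[Ed.: what Becket describes can be reproduced with a toy parse. An old-format reader expects a 4-byte key length right after CRC(4), magic(1), and attributes(1); if V1 inserts an 8-byte timestamp there, the old reader consumes the timestamp's high 32 bits as the key length and then fails reading the key. Illustrative layout sketch, not client code.]

```python
import struct

def old_read_key_length(payload):
    # Old-format reader: key length is the 4 bytes at offset 6
    # (after CRC(4) + magic(1) + attributes(1)).
    return struct.unpack_from(">i", payload, 6)[0]

# A V1-style message with an 8-byte timestamp where the old reader
# expects the key length (crc and attributes zeroed for brevity).
v1 = struct.pack(">IBBq", 0, 1, 0, 1_441_324_800_000)
old_read_key_length(v1)  # high 32 bits of the timestamp, not a real key length
```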
> > >>>> >> > > >> >> > > > > > > > >>
> > >>>> >> > > >> >> > > > > > > > >> Thanks,
> > >>>> >> > > >> >> > > > > > > > >>
> > >>>> >> > > >> >> > > > > > > > >> Jiangjie (Becket) Qin
> > >>>> >> > > >> >> > > > > > > > >>
> > >>>> >> > > >> >> > > > > > > > >> On Thu, Sep 3, 2015 at 12:48 PM, Jiangjie Qin
> > >>>> >> > > >> >> > > > > > > > >> <j...@linkedin.com> wrote:
> > >>>> >> > > >> >> > > > > > > > >>
> > >>>> >> > > >> >> > > > > > > > >> > Hi, Guozhang,
> > >>>> >> > > >> >> > > > > > > > >> >
> > >>>> >> > > >> >> > > > > > > > >> > Thanks for reading the KIP. By "old consumer", I meant
> > >>>> >> > > >> >> > > > > > > > >> > the ZookeeperConsumerConnector in trunk now, i.e.
> > >>>> >> > > >> >> > > > > > > > >> > without this bug fixed. If we fix the
> > >>>> >> > > >> >> > > > > > > > >> > ZookeeperConsumerConnector then it will throw an
> > >>>> >> > > >> >> > > > > > > > >> > exception complaining about the unsupported version
> > >>>> >> > > >> >> > > > > > > > >> > when it sees message format V1. What I was trying to
> > >>>> >> > > >> >> > > > > > > > >> > say is that if we have some ZookeeperConsumerConnector
> > >>>> >> > > >> >> > > > > > > > >> > running without the fix, the consumer will complain
> > >>>> >> > > >> >> > > > > > > > >> > about a CRC mismatch instead of the unsupported
> > >>>> >> > > >> >> > > > > > > > >> > version.
> > >>>> >> > > >> >> > > > > > > > >> >
> > >>>> >> > > >> >> > > > > > > > >> > Thanks,
> > >>>> >> > > >> >> > > > > > > > >> >
> > >>>> >> > > >> >> > > > > > > > >> > Jiangjie (Becket) Qin
> > >>>> >> > > >> >> > > > > > > > >> >
> > >>>> >> > > >> >> > > > > > > > >> > On Thu, Sep 3, 2015 at 12:15 PM, Guozhang Wang
> > >>>> >> > > >> >> > > > > > > > >> > <wangg...@gmail.com> wrote:
> > >>>> >> > > >> >> > > > > > > > >> >
> > >>>> >> > > >> >> > > > > > > > >> >> Thanks for the write-up Jiangjie.
> > >>>> >> > > >> >> > > > > > > > >> >>
> > >>>> >> > > >> >> > > > > > > > >> >> One comment about the migration plan: "For old
> > >>>> >> > > >> >> > > > > > > > >> >> consumers, if they see the new protocol the CRC check
> > >>>> >> > > >> >> > > > > > > > >> >> will fail."
> > >>>> >> > > >> >> > > > > > > > >> >>
> > >>>> >> > > >> >> > > > > > > > >> >> Do you mean this bug in the old consumer cannot be
> > >>>> >> > > >> >> > > > > > > > >> >> fixed in a backward-compatible way?
> > >>>> >> > > >> >> > > > > > > > >> >>
> > >>>> >> > > >> >> > > > > > > > >> >> Guozhang
> > >>>> >> > > >> >> > > > > > > > >> >>
> > >>>> >> > > >> >> > > > > > > > >> >> On Thu, Sep 3, 2015 at 8:35 AM, Jiangjie Qin
> > >>>> >> > > >> >> > > > > > > > >> >> <j...@linkedin.com.invalid> wrote:
> > >>>> >> > > >> >> > > > > > > > >> >>
> > >>>> >> > > >> >> > > > > > > > >> >> > Hi,
> > >>>> >> > > >> >> > > > > > > > >> >> >
> > >>>> >> > > >> >> > > > > > > > >> >> > We just created KIP-31 to propose a message format
> > >>>> >> > > >> >> > > > > > > > >> >> > change in Kafka.
> > >>>> >> > > >> >> > > > > > > > >> >> >
> > >>>> >> > > >> >> > > > > > > > >> >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-31+-+Message+format+change+proposal
> > >>>> >> > > >> >> > > > > > > > >> >> >
> > >>>> >> > > >> >> > > > > > > > >> >> > As a summary, the motivations are:
> > >>>> >> > > >> >> > > > > > > > >> >> > 1. Avoid server side message re-compression
> > >>>> >> > > >> >> > > > > > > > >> >> > 2. Honor time-based log roll and retention
> > >>>> >> > > >> >> > > > > > > > >> >> > 3. Enable offset search by timestamp at a finer
> > >>>> >> > > >> >> > > > > > > > >> >> > granularity.
> > >>>> >> > > >> >> > > > > > > > >> >> >
> > >>>> >> > > >> >> > > > > > > > >> >> > Feedback and comments are welcome!
> > >>>> >> > > >> >> > > > > > > > >> >> >
> > >>>> >> > > >> >> > > > > > > > >> >> > Thanks,
> > >>>> >> > > >> >> > > > > > > > >> >> >
> > >>>> >> > > >> >> > > > > > > > >> >> > Jiangjie (Becket) Qin
> > >>>> >> > > >> >> > > > > > > > >> >>
> > >>>> >> > > >> >> > > > > > > > >> >> --
> > >>>> >> > > >> >> > > > > > > > >> >> -- Guozhang
> > >>>> >> > > >> >> > > > > > > > >> >>
> > >>>> >> > > >> >> > > > > > > --
> > >>>> >> > > >> >> > > > > > > Thanks,
> > >>>> >> > > >> >> > > > > > > Ewen
> > >>>> >> > > >> >> > > --
> > >>>> >> > > >> >> > > Thanks,
> > >>>> >> > > >> >> > > Neha
> > >>>> >> --
> > >>>> >> Thanks,
> > >>>> >> Ewen