Hi, Jiangjie,

11. Rebuilding all missing time indexes will make the upgrade process
longer since the log segments pre 0.10.0 don't have the time index. Could
we only rebuild the missing indexes after the last flush offset? For other
segments missing the time index, we just assume lastTimestamp to be -1?

Thanks,

Jun

On Wed, Apr 13, 2016 at 9:55 AM, Becket Qin <becket....@gmail.com> wrote:

> Hi Jun and Guozhang,
>
> I have updated the KIP wiki to incorporate your comments. Please let me
> know if you prefer starting another discussion thread for further
> discussion.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Mon, Apr 11, 2016 at 12:21 AM, Becket Qin <becket....@gmail.com> wrote:
>
> > Hi Guozhang and Jun,
> >
> > Thanks for the comments. Please see the responses below.
> >
> > Regarding to Guozhang's question #1 and Jun's question #12. I was
> > inserting the time index and offset index entry together mostly for
> > simplicity as Guozhang mentioned. The purpose of using index interval
> bytes
> > for time index was to control the density of the time index, which is the
> > same purpose as offset index. It seems reasonable to make them aligned.
> We
> > can track separately the physical position when we insert the last time
> > index entry(my original code did that), but when I wrote the code I feel
> it
> > seems unnecessary. Another minor benefit is that searching by timestamp
> > could be potentially faster if we align the time index and offset index.
> > It is possible that we only have either a corrupted time index or an
> > offset index, but not both. Although we can choose to only rebuild the
> one
> > which is corrupted, given that we have to scan the entire log segment
> > anyway, rebuilding both of them seems not much overhead. So the current
> > patch I have is rebuilding both of them together.
> >
> > 10. Yes, it should only happen after a hard failure. The last time index
> > entry of a normally closed segment has already points to the LEO, so
> there
> > is no scan during start up.
> >
> > 11. On broker startup, if a time index does not exist, an empty one will
> > be created first. If message format version is 0.9.0, we will append a
> time
> > index entry of (last modification time -> base offset of next segment) to
> > the time index of each inactive segment. So no actual rebuild will happen
> > during upgrade. However, if message format version is 0.10.0, we will
> > rebuild the time index if it does not exist. (I actually had a question
> > about the how we are loading the log segments, we can discuss it in the
> PR)
> >
> > I will update the wiki to clarify the question raised in the comments and
> > submit a PR by tomorrow. I am currently cleaning up the documentation.
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> >
> >
> > On Sun, Apr 10, 2016 at 9:25 PM, Jun Rao <j...@confluent.io> wrote:
> >
> >> Hi, Jiangjie,
> >>
> >> Thanks for the update. Looks good to me overall. Just a few minor
> comments
> >> below.
> >>
> >> 10. On broker startup, it's not clear to me why we need to scan the log
> >> segment to retrieve the largest timestamp since the time index always
> has
> >> an entry for the largest timestamp. Is that only for restarting after a
> >> hard failure?
> >>
> >> 11. On broker startup, if a log segment misses the time index, do we
> >> always
> >> rebuild it? This can happen when the broker is upgraded.
> >>
> >> 12. Related to Guozhang's question #1. It seems it's simpler to add time
> >> index entries independent of the offset index since at index entry may
> not
> >> be added to the offset and the time index at the same time. Also, this
> >> allows time index to be rebuilt independently if needed.
> >>
> >> Thanks,
> >>
> >> Jun
> >>
> >>
> >> On Wed, Apr 6, 2016 at 5:44 PM, Becket Qin <becket....@gmail.com>
> wrote:
> >>
> >> > Hi all,
> >> >
> >> > I updated KIP-33 based on the initial implementation. Per discussion
> on
> >> > yesterday's KIP hangout, I would like to initiate the new vote thread
> >> for
> >> > KIP-33.
> >> >
> >> > The KIP wiki:
> >> >
> >> >
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-33+-+Add+a+time+based+log+index
> >> >
> >> > Here is a brief summary of the KIP:
> >> > 1. We propose to add a time index for each log segment.
> >> > 2. The time indices are going to be used of log retention, log rolling
> >> and
> >> > message search by timestamp.
> >> >
> >> > There was an old voting thread which has some discussions on this KIP.
> >> The
> >> > mail thread link is following:
> >> >
> >> >
> >>
> http://mail-archives.apache.org/mod_mbox/kafka-dev/201602.mbox/%3ccabtagwgoebukyapfpchmycjk2tepq3ngtuwnhtr2tjvsnc8...@mail.gmail.com%3E
> >> >
> >> > I have the following WIP patch for reference. It needs a few more unit
> >> > tests and documentation. Other than that it should run fine.
> >> >
> >> >
> >>
> https://github.com/becketqin/kafka/commit/712357a3fbf1423e05f9eed7d2fed5b6fe6c37b7
> >> >
> >> > Thanks,
> >> >
> >> > Jiangjie (Becket) Qin
> >> >
> >>
> >
> >
>

Reply via email to