Re: [VOTE] KIP-33 - Add a time based log index to Kafka

Guozhang Wang Thu, 25 Feb 2016 15:30:59 -0800

Jiangjie,

I was originally only thinking about the "time.index.size.max.bytes" config
in addition to the "offset.index.size.max.bytes". The latter's default size
is 10MB, and for a memory mapped file we allocate that much memory up front,
which could put pressure on RAM if we double it.
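
To put rough numbers on that concern, here is a back-of-the-envelope sketch.
The partition count is a made-up figure, and the sketch assumes that only the
active segment of each partition holds a full-size preallocated index file;
that is an assumption for illustration, not something stated in this thread.

```python
# Rough estimate of preallocated index bytes on one broker.
# ASSUMPTIONS (not from the thread): the partition count, and that each
# partition has exactly one active segment with full-size preallocated indices.

DEFAULT_INDEX_MAX_BYTES = 10485760  # the 10MB default quoted in this thread


def preallocated_index_bytes(active_segments, index_max_bytes, indices_per_segment):
    """Total bytes preallocated for index files across all active segments."""
    return active_segments * index_max_bytes * indices_per_segment


partitions = 2000  # hypothetical number of partitions hosted by one broker
offset_only = preallocated_index_bytes(partitions, DEFAULT_INDEX_MAX_BYTES, 1)
with_time_index = preallocated_index_bytes(partitions, DEFAULT_INDEX_MAX_BYTES, 2)

print(f"offset index only:   {offset_only / 2**30:.1f} GiB")
print(f"offset + time index: {with_time_index / 2**30:.1f} GiB")
```

With the same default for both indices, the second figure is simply twice the
first, which is the doubling referred to above.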


Guozhang

On Wed, Feb 24, 2016 at 4:56 PM, Becket Qin <[email protected]> wrote:

> Hi Guozhang,
>
> I thought about this again and it seems we still need the
> time.index.interval.ms configuration to avoid unnecessarily frequent time
> index insertion.
>
> I just updated the wiki to add index.interval.bytes as an additional
> constraint for time index entry insertion. Another slight change is that
> as long as a message timestamp shows that time.index.interval.ms has passed
> since the timestamp of the last time index entry, we will insert another
> timestamp index entry. Previously we always inserted time index entries at
> time.index.interval.ms bucket boundaries.
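
A minimal sketch of the insertion rule described above, under one reading of
it: an entry is added only when both the time interval and the bytes interval
have been reached since the last entry. The class name, method name, and the
choice to always index the first message are hypothetical; the wiki is the
authority on the actual behaviour.

```python
class TimeIndexSketch:
    """Toy time index: decides when an appended message gets an index entry."""

    def __init__(self, interval_ms, interval_bytes):
        self.interval_ms = interval_ms          # stands in for time.index.interval.ms
        self.interval_bytes = interval_bytes    # stands in for index.interval.bytes
        self.entries = []                       # list of (timestamp_ms, offset)
        self.bytes_since_last_entry = 0

    def maybe_append(self, timestamp_ms, offset, message_bytes):
        """Record one appended message; return True if an entry was added."""
        self.bytes_since_last_entry += message_bytes
        if self.entries:
            last_ts, _ = self.entries[-1]
            enough_time = timestamp_ms - last_ts >= self.interval_ms
            enough_bytes = self.bytes_since_last_entry >= self.interval_bytes
            # ASSUMPTION: both constraints must hold before inserting.
            if not (enough_time and enough_bytes):
                return False
        # ASSUMPTION: the very first message always gets an entry.
        self.entries.append((timestamp_ms, offset))
        self.bytes_since_last_entry = 0
        return True


index = TimeIndexSketch(interval_ms=60_000, interval_bytes=4096)
results = [
    index.maybe_append(0, 0, 1000),       # first message
    index.maybe_append(30_000, 1, 5000),  # enough bytes, not enough time
    index.maybe_append(61_000, 2, 100),   # both thresholds reached
    index.maybe_append(130_000, 3, 100),  # enough time, not enough bytes
]
print(results, index.entries)
```

Note that the entry is keyed on the message's own timestamp rather than on a
bucket boundary, which matches the change from the previous behaviour.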
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Wed, Feb 24, 2016 at 2:40 PM, Becket Qin <[email protected]> wrote:
>
> > Thanks for the comment Guozhang,
> >
> > I just changed the configuration name to "time.index.interval.ms".
> >
> > It seems the real question here is how big the offset indices will be.
> > Theoretically we can have one time index entry for each message in a log
> > segment. For example, if there is one event per minute appended, we might
> > have to have a time index entry for each message until the segment size is
> > reached. In that case, the number of index entries in the time index would
> > be (segment size / avg message size). So the time index file size can
> > potentially be big.
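
The worst case above can be put into numbers with a short sketch. The segment
size, average message size, and the 12-byte entry size (8-byte timestamp plus
4-byte relative offset) are all assumptions chosen for illustration.

```python
# Worst-case time index size when every message gets its own index entry.
# ASSUMPTION: 12 bytes per entry (8-byte timestamp + 4-byte relative offset).
TIME_INDEX_ENTRY_BYTES = 12


def worst_case_time_index_bytes(segment_bytes, avg_message_bytes,
                                entry_bytes=TIME_INDEX_ENTRY_BYTES):
    """entries = segment size / avg message size, one index entry per message."""
    entries = segment_bytes // avg_message_bytes
    return entries * entry_bytes


segment_bytes = 1024 ** 3   # hypothetical 1 GiB log segment
avg_message_bytes = 1024    # hypothetical average message size
worst_case = worst_case_time_index_bytes(segment_bytes, avg_message_bytes)
print(f"worst-case time index: {worst_case / 2**20:.0f} MiB")
```

Under these assumed numbers the index would already exceed a 10MB cap, and
smaller messages make it proportionally larger.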
> >
> > I am wondering if we can simply reuse the "index.interval.bytes"
> > configuration instead of having a separate time index interval ms, i.e.
> > instead of inserting a new entry based on a time interval, we still insert
> > it based on a bytes interval. This does not affect the granularity because
> > we can search from the nearest index entry to find the message with the
> > correct timestamp. The good thing is that this guarantees there will not be
> > huge time indices. We also save the new configuration.
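
The "search from the nearest index entry" idea can be sketched as a binary
search over the sparse index followed by a short linear scan of the log. The
function name and data layout are hypothetical, and the sketch assumes
timestamps never decrease within a segment, which is a simplification.

```python
import bisect


def find_offset_for_timestamp(index_entries, log_timestamps, target_ts):
    """Return the first offset whose timestamp is >= target_ts, or None.

    index_entries:  sorted list of (timestamp_ms, offset), a sparse index.
    log_timestamps: log_timestamps[offset] is that message's timestamp.
    ASSUMPTION: timestamps are non-decreasing in offset order.
    """
    index_ts = [ts for ts, _ in index_entries]
    # Last index entry with a timestamp strictly below the target.
    pos = bisect.bisect_left(index_ts, target_ts) - 1
    start = index_entries[pos][1] if pos >= 0 else 0
    # Linear scan from the nearest entry; bounded by the index spacing.
    for offset in range(start, len(log_timestamps)):
        if log_timestamps[offset] >= target_ts:
            return offset
    return None


log_timestamps = [100, 150, 200, 250, 300, 350]
index_entries = [(100, 0), (250, 3)]  # sparse: only two of six messages indexed
found = [find_offset_for_timestamp(index_entries, log_timestamps, t)
         for t in (50, 120, 250, 300, 400)]
print(found)
```

A sparser index only lengthens the final scan; it does not change which
message is found, which is why the granularity is unaffected.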
> >
> > What do you think?
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> > On Wed, Feb 24, 2016 at 1:00 PM, Guozhang Wang <[email protected]> wrote:
> >
> >> Thanks Jiangjie, a few comments on the wiki:
> >>
> >> 1. Rename config "time.index.interval" to "time.index.interval.ms" to be
> >> consistent. Also, do we need a "time.index.size.max.bytes" as well?
> >>
> >> 2. Will the memory mapped index file for timestamp have the same default
> >> initial / max size (10485760) as the offset index?
> >>
> >> Otherwise LGTM.
> >>
> >> Guozhang
> >>
> >> On Tue, Feb 23, 2016 at 5:05 PM, Becket Qin <[email protected]> wrote:
> >>
> >> > Bump.
> >> >
> >> > Per Jun's comments during the KIP hangout, I have updated the wiki with
> >> > the upgrade plan for KIP-33.
> >> >
> >> > Let's vote!
> >> >
> >> > Thanks,
> >> >
> >> > Jiangjie (Becket) Qin
> >> >
> >> > On Wed, Feb 3, 2016 at 10:32 AM, Becket Qin <[email protected]> wrote:
> >> >
> >> > > Hi all,
> >> > >
> >> > > I would like to initiate the vote for KIP-33.
> >> > >
> >> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-33+-+Add+a+time+based+log+index
> >> > >
> >> > > A good amount of the KIP has been touched during the discussion on
> >> > > KIP-32. So I also put the link to KIP-32 here for reference.
> >> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-32+-+Add+timestamps+to+Kafka+message
> >> > >
> >> > > Thanks,
> >> > >
> >> > > Jiangjie (Becket) Qin
> >> > >
> >> >
> >>
> >>
> >>
> >> --
> >> -- Guozhang
> >>
> >
> >
>



-- 
-- Guozhang
