Hi Guozhang,

The size of the memory-mapped index file was a concern for us as well.
That is why we are suggesting minute-level time indexing instead of
second-level. Here are a few thoughts on the extra memory cost of the time
index.

1. Currently all the index files are loaded as memory-mapped files. Note
that only the index of the active segment has the default size of 10MB.
Typically the indices of the old segments are much smaller than 10MB. So if
we use the same initial size for time index files, the total amount of
memory won't be doubled, but the memory cost of active segments will be
doubled. (However, the 10MB value itself seems problematic; see the
reasoning further down.)

2. It is likely that the time index is much smaller than the offset index,
because users would adjust time.index.interval.ms depending on the topic
volume, i.e., for a low-volume topic time.index.interval.ms would be much
longer, so that we can avoid inserting one time index entry for each
message in the extreme case.

3. To further guard against unnecessarily frequent insertion of time index
entries, we also use index.interval.bytes as a restriction on time index
entry insertion, so that even for a newly created topic with the default
time.index.interval.ms we don't need to worry about overly aggressive time
index entry insertion.
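
The combined rule from (2) and (3) can be sketched roughly as below. This
is only an illustration, not broker code: the class and method names are
hypothetical, the 60000 ms interval is an assumed example of minute-level
indexing, and it assumes both conditions must hold before an entry is
inserted.

```python
from dataclasses import dataclass


@dataclass
class TimeIndexPolicy:
    """Hypothetical helper (not Kafka code): decides when to add a time index entry."""
    time_index_interval_ms: int = 60_000  # assumed example: minute-level indexing
    index_interval_bytes: int = 4096      # the 4K value used in this thread
    last_entry_timestamp_ms: int = -1     # -1 means no entry has been written yet
    bytes_since_last_entry: int = 0

    def should_insert(self, timestamp_ms: int, message_size_bytes: int) -> bool:
        """Return True if appending this message should add a time index entry."""
        self.bytes_since_last_entry += message_size_bytes
        # Byte restriction: enough log data has accumulated since the last entry.
        enough_bytes = self.bytes_since_last_entry >= self.index_interval_bytes
        # Time restriction: the interval has passed since the last entry's timestamp.
        enough_time = (
            self.last_entry_timestamp_ms < 0
            or timestamp_ms - self.last_entry_timestamp_ms >= self.time_index_interval_ms
        )
        if enough_bytes and enough_time:
            self.last_entry_timestamp_ms = timestamp_ms
            self.bytes_since_last_entry = 0
            return True
        return False
```

With this shape, a low-volume topic that appends one small message per
minute does not get one entry per message, because the byte restriction
has not been met yet.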

Considering the above, the overall memory cost of the time index should be
much smaller than that of the offset index. However, as you pointed out,
(1) might still be an issue. I am actually not sure why we always allocate
10MB for the index file. This itself looks like a problem, given that we
have a pretty good way to know the upper bound of the memory taken by an
offset index.

Theoretically, the offset index file will have at most (log.segment.bytes /
index.interval.bytes) entries. In our default configuration,
log.segment.bytes=1GB and index.interval.bytes=4K. This means we only need
(1GB / 4K) * 8 bytes = 2MB. Allocating 10MB is really a big waste of memory.
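
For reference, the arithmetic above can be checked with a few lines of
Python. The function name is made up for this sketch, and the 8-byte entry
size is simply the figure used in the calculation above.

```python
ENTRY_SIZE_BYTES = 8  # per-entry size used in the calculation above


def max_offset_index_bytes(segment_bytes: int, index_interval_bytes: int) -> int:
    """Upper bound on index size: at most one entry per index_interval_bytes of log."""
    max_entries = segment_bytes // index_interval_bytes
    return max_entries * ENTRY_SIZE_BYTES


one_gb = 1024 ** 3
four_kb = 4 * 1024
print(max_offset_index_bytes(one_gb, four_kb))  # 2097152 bytes, i.e. 2MB
```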

I suggest we do the following:
1. When creating the log index file, we always allocate memory using the
above calculation.
2. If the memory calculated in (1) is greater than segment.index.bytes, we
use segment.index.bytes instead. Otherwise we simply use the result of (1).
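
A minimal sketch of the two steps, assuming the 8-byte entry size from the
calculation above. The function and parameter names are hypothetical and
only mirror the config names used in this thread.

```python
def initial_index_file_bytes(
    segment_bytes: int,
    index_interval_bytes: int,
    segment_index_bytes: int,
    entry_size_bytes: int = 8,
) -> int:
    """Pick the initial index file size, capped by segment.index.bytes."""
    # Step 1: upper bound from the segment size and the index interval.
    estimated = (segment_bytes // index_interval_bytes) * entry_size_bytes
    # Step 2: never allocate more than the configured maximum.
    return min(estimated, segment_index_bytes)


ten_mb = 10 * 1024 * 1024
# Defaults from this thread: 1GB segment, 4K interval, 10MB cap.
print(initial_index_file_bytes(1024 ** 3, 4096, ten_mb))  # 2097152
# A much smaller interval pushes the estimate over the cap, so the cap wins.
print(initial_index_file_bytes(1024 ** 3, 512, ten_mb))   # 10485760
```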

If we do this, I believe the memory used for index files will probably be
smaller than it is today, even with the time index added. I will create a
separate ticket for the index file initial size.

Thanks,

Jiangjie (Becket) Qin

On Thu, Feb 25, 2016 at 3:30 PM, Guozhang Wang <wangg...@gmail.com> wrote:

> Jiangjie,
>
> I was originally only thinking about the "time.index.size.max.bytes" config
> in addition to the "offset.index.size.max.bytes". Since the latter's
> default size is 10MB, and for a memory-mapped file we will allocate that
> much memory at the start, doubling it could put pressure on RAM.
>
> Guozhang
>
> On Wed, Feb 24, 2016 at 4:56 PM, Becket Qin <becket....@gmail.com> wrote:
>
> > Hi Guozhang,
> >
> > I thought about this again and it seems we still need the
> > time.index.interval.ms configuration to avoid unnecessarily frequent time
> > index insertion.
> >
> > I just updated the wiki to add index.interval.bytes as an additional
> > constraint for time index entry insertion. Another slight change made was
> > that as long as a message timestamp shows time.index.interval.ms has
> > passed since the timestamp of the last time index entry, we will insert
> > another timestamp index entry. Previously we always inserted time index
> > entries at time.index.interval.ms bucket boundaries.
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> > On Wed, Feb 24, 2016 at 2:40 PM, Becket Qin <becket....@gmail.com> wrote:
> >
> > > Thanks for the comment Guozhang,
> > >
> > > I just changed the configuration name to "time.index.interval.ms".
> > >
> > > It seems the real question here is how big the offset indices will be.
> > > Theoretically we can have one time index entry for each message in a
> > > log segment. For example, if there is one event per minute appended, we
> > > might have to have a time index entry for each message until the
> > > segment size is reached. In that case, the number of index entries in
> > > the time index would be (segment size / avg message size). So the time
> > > index file size can potentially be big.
> > >
> > > I am wondering if we can simply reuse the "index.interval.bytes"
> > > configuration instead of having a separate time index interval ms, i.e.,
> > > instead of inserting a new entry based on time interval, we still
> > > insert it based on bytes interval. This does not affect the granularity
> > > because we can search from the nearest index entry to find the message
> > > with the correct timestamp. The good thing is that this guarantees there
> > > will not be huge time indices. We also save the new configuration.
> > >
> > > What do you think?
> > >
> > > Thanks,
> > >
> > > Jiangjie (Becket) Qin
> > >
> > > On Wed, Feb 24, 2016 at 1:00 PM, Guozhang Wang <wangg...@gmail.com> wrote:
> > >
> > >> Thanks Jiangjie, a few comments on the wiki:
> > >>
> > >> 1. Config name "time.index.interval" to "time.index.interval.ms" to be
> > >> consistent. Also do we need a "time.index.size.max.bytes" as well?
> > >>
> > >> 2. Will the memory mapped index file for timestamp have the same
> > >> default initial / max size (10485760) as the offset index?
> > >>
> > >> Otherwise LGTM.
> > >>
> > >> Guozhang
> > >>
> > >> On Tue, Feb 23, 2016 at 5:05 PM, Becket Qin <becket....@gmail.com> wrote:
> > >>
> > >> > Bump.
> > >> >
> > >> > Per Jun's comments during the KIP hangout, I have updated the wiki
> > >> > with the upgrade plan for KIP-33.
> > >> >
> > >> > Let's vote!
> > >> >
> > >> > Thanks,
> > >> >
> > >> > Jiangjie (Becket) Qin
> > >> >
> > >> > On Wed, Feb 3, 2016 at 10:32 AM, Becket Qin <becket....@gmail.com> wrote:
> > >> >
> > >> > > Hi all,
> > >> > >
> > >> > > I would like to initiate the vote for KIP-33.
> > >> > >
> > >> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-33+-+Add+a+time+based+log+index
> > >> > >
> > >> > > A good amount of the KIP has been touched during the discussion on
> > >> > > KIP-32. So I also put the link to KIP-32 here for reference.
> > >> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-32+-+Add+timestamps+to+Kafka+message
> > >> > >
> > >> > > Thanks,
> > >> > >
> > >> > > Jiangjie (Becket) Qin
> > >> > >
> > >> >
> > >>
> > >>
> > >>
> > >> --
> > >> -- Guozhang
> > >>
> > >
> > >
> >
>
>
>
> --
> -- Guozhang
>
