Hi, Divij,

Thanks for the KIP. Sorry for the late reply.

The motivation of the KIP is to improve the efficiency of size based
retention. I am not sure the proposed changes are enough. For example, if
the size exceeds the retention size, we need to determine the subset of
segments to delete to bring the size within the retention limit. Do we need
to call RemoteLogMetadataManager.listRemoteLogSegments() to determine that?
Also, what about time-based retention? To make that efficient, do we need
to make some additional interface changes?

An alternative approach is for the RLMM implementor to make sure
that RemoteLogMetadataManager.listRemoteLogSegments() is fast (e.g., with
local caching). This way, we could keep the interface simple. Have we
considered that?

Thanks,

Jun

On Wed, Sep 28, 2022 at 6:28 AM Divij Vaidya <divijvaidy...@gmail.com>
wrote:

> Hey folks
>
> Does anyone else have any thoughts on this before I propose this for a
> vote?
>
> --
> Divij Vaidya
>
>
>
> On Mon, Sep 5, 2022 at 12:57 PM Satish Duggana <satish.dugg...@gmail.com>
> wrote:
>
> > Thanks for the KIP Divij!
> >
> > This is a nice improvement to avoid recalculation of size. Customized
> RLMMs
> > can implement the best possible approach by caching or maintaining the
> size
> > in an efficient way. But this is not a big concern for the default topic
> > based RLMM as mentioned in the KIP.
> >
> > ~Satish.
> >
> > On Wed, 13 Jul 2022 at 18:48, Divij Vaidya <divijvaidy...@gmail.com>
> > wrote:
> >
> > > Thank you for your review Luke.
> > >
> > > > Reg: is that would the new `RemoteLogSizeBytes` metric be a
> performance
> > > overhead? Although we move the calculation to a seperate API, we still
> > > can't assume users will implement a light-weight method, right?
> > >
> > > This metric would be logged using the information that is already being
> > > calculated for handling remote retention logic, hence, no additional
> work
> > > is required to calculate this metric. More specifically, whenever
> > > RemoteLogManager calls getRemoteLogSize API, this metric would be
> > captured.
> > > This API call is made every time RemoteLogManager wants to handle
> expired
> > > remote log segments (which should be periodic). Does that address your
> > > concern?
> > >
> > > Divij Vaidya
> > >
> > >
> > >
> > > On Tue, Jul 12, 2022 at 11:01 AM Luke Chen <show...@gmail.com> wrote:
> > >
> > > > Hi Divij,
> > > >
> > > > Thanks for the KIP!
> > > >
> > > > I think it makes sense to delegate the responsibility of calculation
> to
> > > the
> > > > specific RemoteLogMetadataManager implementation.
> > > > But one thing I'm not quite sure, is that would the new
> > > > `RemoteLogSizeBytes` metric be a performance overhead?
> > > > Although we move the calculation to a seperate API, we still can't
> > assume
> > > > users will implement a light-weight method, right?
> > > >
> > > > Thank you.
> > > > Luke
> > > >
> > > > On Fri, Jul 1, 2022 at 5:47 PM Divij Vaidya <divijvaidy...@gmail.com
> >
> > > > wrote:
> > > >
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-852%3A+Optimize+calculation+of+size+for+log+in+remote+tier
> > > > >
> > > > >
> > > > > Hey folks
> > > > >
> > > > > Please take a look at this KIP which proposes an extension to
> > KIP-405.
> > > > This
> > > > > is my first KIP with Apache Kafka community so any feedback would
> be
> > > > highly
> > > > > appreciated.
> > > > >
> > > > > Cheers!
> > > > >
> > > > > --
> > > > > Divij Vaidya
> > > > > Sr. Software Engineer
> > > > > Amazon
> > > > >
> > > >
> > >
> >
>

Reply via email to