Re: [DISCUSS] KIP-852 Optimize calculation of size for log in remote tier

Luke Chen Wed, 09 Nov 2022 21:16:42 -0800

Hi Divij,

One more question about the metric:
I think the metric will be updated when
(1) each time we run the log retention check (that is,
log.retention.check.interval.ms)
(2) When user explicitly call getRemoteLogSize


Is that correct?
Maybe we should add a note in metric description, otherwise, when user got,
let's say 0 of RemoteLogSizeBytes, will be surprised.

Otherwise, LGTM

Thank you for the KIP
Luke

On Thu, Nov 10, 2022 at 2:55 AM Jun Rao <j...@confluent.io.invalid> wrote:

> Hi, Divij,
>
> Thanks for the explanation.
>
> 1. Hmm, the default implementation of RLMM does local caching, right?
> Currently, we also cache all segment metadata in the brokers without
> KIP-405. Do you see a need to change that?
>
> 2,3,4: Yes, your explanation makes sense. However,
> currently, RemoteLogMetadataManager.listRemoteLogSegments() doesn't specify
> a particular order of the iterator. Do you intend to change that?
>
> Thanks,
>
> Jun
>
> On Tue, Nov 8, 2022 at 3:31 AM Divij Vaidya <divijvaidy...@gmail.com>
> wrote:
>
> > Hey Jun
> >
> > Thank you for your comments.
> >
> > *1. "RLMM implementor could ensure that listRemoteLogSegments() is fast"*
> > This would be ideal but pragmatically, it is difficult to ensure that
> > listRemoteLogSegments() is fast. This is because of the possibility of a
> > large number of segments (much larger than what Kafka currently handles
> > with local storage today) would make it infeasible to adopt strategies
> such
> > as local caching to improve the performance of listRemoteLogSegments.
> Apart
> > from caching (which won't work due to size limitations) I can't think of
> > other strategies which may eliminate the need for IO
> > operations proportional to the number of total segments. Please advise if
> > you have something in mind.
> >
> > 2.  "*If the size exceeds the retention size, we need to determine the
> > subset of segments to delete to bring the size within the retention
> limit.
> > Do we need to call RemoteLogMetadataManager.listRemoteLogSegments() to
> > determine that?"*
> > Yes, we need to call listRemoteLogSegments() to determine which segments
> > should be deleted. But there is a difference with the use case we are
> > trying to optimize with this KIP. To determine the subset of segments
> which
> > would be deleted, we only read metadata for segments which would be
> deleted
> > via the listRemoteLogSegments(). But to determine the totalLogSize, which
> > is required every time retention logic based on size executes, we read
> > metadata of *all* the segments in remote storage. Hence, the number of
> > results returned by *RemoteLogMetadataManager.listRemoteLogSegments() *is
> > different when we are calculating totalLogSize vs. when we are
> determining
> > the subset of segments to delete.
> >
> > 3.
> > *"Also, what about time-based retention? To make that efficient, do we
> need
> > to make some additional interface changes?"*No. Note that time complexity
> > to determine the segments for retention is different for time based vs.
> > size based. For time based, the time complexity is a function of the
> number
> > of segments which are "eligible for deletion" (since we only read
> metadata
> > for segments which would be deleted) whereas in size based retention, the
> > time complexity is a function of "all segments" available in remote
> storage
> > (metadata of all segments needs to be read to calculate the total size).
> As
> > you may observe, this KIP will bring the time complexity for both time
> > based retention & size based retention to the same function.
> >
> > 4. Also, please note that this new API introduced in this KIP also
> enables
> > us to provide a metric for total size of data stored in remote storage.
> > Without the API, calculation of this metric will become very expensive
> with
> > *listRemoteLogSegments().*
> > I understand that your motivation here is to avoid polluting the
> interface
> > with optimization specific APIs and I will agree with that goal. But I
> > believe that this new API proposed in the KIP brings in significant
> > improvement and there is no other work around available to achieve the
> same
> > performance.
> >
> > Regards,
> > Divij Vaidya
> >
> >
> >
> > On Tue, Nov 8, 2022 at 12:12 AM Jun Rao <j...@confluent.io.invalid>
> wrote:
> >
> > > Hi, Divij,
> > >
> > > Thanks for the KIP. Sorry for the late reply.
> > >
> > > The motivation of the KIP is to improve the efficiency of size based
> > > retention. I am not sure the proposed changes are enough. For example,
> if
> > > the size exceeds the retention size, we need to determine the subset of
> > > segments to delete to bring the size within the retention limit. Do we
> > need
> > > to call RemoteLogMetadataManager.listRemoteLogSegments() to determine
> > that?
> > > Also, what about time-based retention? To make that efficient, do we
> need
> > > to make some additional interface changes?
> > >
> > > An alternative approach is for the RLMM implementor to make sure
> > > that RemoteLogMetadataManager.listRemoteLogSegments() is fast (e.g.,
> with
> > > local caching). This way, we could keep the interface simple. Have we
> > > considered that?
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Wed, Sep 28, 2022 at 6:28 AM Divij Vaidya <divijvaidy...@gmail.com>
> > > wrote:
> > >
> > > > Hey folks
> > > >
> > > > Does anyone else have any thoughts on this before I propose this for
> a
> > > > vote?
> > > >
> > > > --
> > > > Divij Vaidya
> > > >
> > > >
> > > >
> > > > On Mon, Sep 5, 2022 at 12:57 PM Satish Duggana <
> > satish.dugg...@gmail.com
> > > >
> > > > wrote:
> > > >
> > > > > Thanks for the KIP Divij!
> > > > >
> > > > > This is a nice improvement to avoid recalculation of size.
> Customized
> > > > RLMMs
> > > > > can implement the best possible approach by caching or maintaining
> > the
> > > > size
> > > > > in an efficient way. But this is not a big concern for the default
> > > topic
> > > > > based RLMM as mentioned in the KIP.
> > > > >
> > > > > ~Satish.
> > > > >
> > > > > On Wed, 13 Jul 2022 at 18:48, Divij Vaidya <
> divijvaidy...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Thank you for your review Luke.
> > > > > >
> > > > > > > Reg: is that would the new `RemoteLogSizeBytes` metric be a
> > > > performance
> > > > > > overhead? Although we move the calculation to a seperate API, we
> > > still
> > > > > > can't assume users will implement a light-weight method, right?
> > > > > >
> > > > > > This metric would be logged using the information that is already
> > > being
> > > > > > calculated for handling remote retention logic, hence, no
> > additional
> > > > work
> > > > > > is required to calculate this metric. More specifically, whenever
> > > > > > RemoteLogManager calls getRemoteLogSize API, this metric would be
> > > > > captured.
> > > > > > This API call is made every time RemoteLogManager wants to handle
> > > > expired
> > > > > > remote log segments (which should be periodic). Does that address
> > > your
> > > > > > concern?
> > > > > >
> > > > > > Divij Vaidya
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Tue, Jul 12, 2022 at 11:01 AM Luke Chen <show...@gmail.com>
> > > wrote:
> > > > > >
> > > > > > > Hi Divij,
> > > > > > >
> > > > > > > Thanks for the KIP!
> > > > > > >
> > > > > > > I think it makes sense to delegate the responsibility of
> > > calculation
> > > > to
> > > > > > the
> > > > > > > specific RemoteLogMetadataManager implementation.
> > > > > > > But one thing I'm not quite sure, is that would the new
> > > > > > > `RemoteLogSizeBytes` metric be a performance overhead?
> > > > > > > Although we move the calculation to a seperate API, we still
> > can't
> > > > > assume
> > > > > > > users will implement a light-weight method, right?
> > > > > > >
> > > > > > > Thank you.
> > > > > > > Luke
> > > > > > >
> > > > > > > On Fri, Jul 1, 2022 at 5:47 PM Divij Vaidya <
> > > divijvaidy...@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-852%3A+Optimize+calculation+of+size+for+log+in+remote+tier
> > > > > > > >
> > > > > > > >
> > > > > > > > Hey folks
> > > > > > > >
> > > > > > > > Please take a look at this KIP which proposes an extension to
> > > > > KIP-405.
> > > > > > > This
> > > > > > > > is my first KIP with Apache Kafka community so any feedback
> > would
> > > > be
> > > > > > > highly
> > > > > > > > appreciated.
> > > > > > > >
> > > > > > > > Cheers!
> > > > > > > >
> > > > > > > > --
> > > > > > > > Divij Vaidya
> > > > > > > > Sr. Software Engineer
> > > > > > > > Amazon
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-852 Optimize calculation of size for log in remote tier

Reply via email to