Re: [DISCUSS] KIP-852 Optimize calculation of size for log in remote tier

Satish Duggana Fri, 30 Jun 2023 18:58:54 -0700

Hi Divij,
I am fine with having an API to compute the size as I mentioned in my
earlier reply in this mail thread. But I have the below comment for
the motivation for this KIP.


As you discussed offline, the main issue here is listing calls for
remote log segment metadata is slower because of the storage used for
RLMM. These can be avoided with this new API.

Please add this in the motivation section as it is one of the main
motivations for the KIP.

Thanks,
Satish.

On Sat, 1 Jul 2023 at 01:43, Jun Rao <[email protected]> wrote:
>
> Hi, Divij,
>
> Sorry for the late reply.
>
> Given your explanation, the new API sounds reasonable to me. Is that enough
> to build the external metadata layer for the remote segments or do you need
> some additional API changes?
>
> Thanks,
>
> Jun
>
> On Fri, Jun 9, 2023 at 7:08 AM Divij Vaidya <[email protected]> wrote:
>
> > Thank you for looking into this Kamal.
> >
> > You are right in saying that a cold start (i.e. leadership failover or
> > broker startup) does not impact the broker startup duration. But it does
> > have the following impact:
> > 1. It leads to a burst of full-scan requests to RLMM in case multiple
> > leadership failovers occur at the same time. Even if the RLMM
> > implementation has the capability to serve the total size from an index
> > (and hence handle this burst), we wouldn't be able to use it since the
> > current API necessarily calls for a full scan.
> > 2. The archival (copying of data to tiered storage) process will have a
> > delayed start. The delayed start of archival could lead to local build up
> > of data which may lead to disk full.
> >
> > The disadvantage of adding this new API is that every provider will have to
> > implement it, agreed. But I believe that this tradeoff is worthwhile since
> > the default implementation could be the same as you mentioned, i.e. keeping
> > cumulative in-memory count.
> >
> > --
> > Divij Vaidya
> >
> >
> >
> > On Sun, Jun 4, 2023 at 5:48 PM Kamal Chandraprakash <
> > [email protected]> wrote:
> >
> > > Hi Divij,
> > >
> > > Thanks for the KIP! Sorry for the late reply.
> > >
> > > Can you explain the rejected alternative-3?
> > > Store the cumulative size of remote tier log in-memory at
> > RemoteLogManager
> > > "*Cons*: Every time a broker starts-up, it will scan through all the
> > > segments in the remote tier to initialise the in-memory value. This would
> > > increase the broker start-up time."
> > >
> > > Keeping the source of truth to determine the remote-log-size in the
> > leader
> > > would be consistent across different implementations of the plugin. The
> > > concern posted in the KIP is that we are calculating the remote-log-size
> > on
> > > each iteration of the cleaner thread (say 5 mins). If we calculate only
> > > once during broker startup or during the leadership reassignment, do we
> > > still need the cache?
> > >
> > > The broker startup-time won't be affected by the remote log manager
> > > initialisation. The broker continue to start accepting the new
> > > produce/fetch requests, while the RLM thread in the background can
> > > determine the remote-log-size once and start copying/deleting the
> > segments.
> > >
> > > Thanks,
> > > Kamal
> > >
> > > On Thu, Jun 1, 2023 at 2:08 PM Divij Vaidya <[email protected]>
> > > wrote:
> > >
> > > > Satish / Jun
> > > >
> > > > Do you have any thoughts on this?
> > > >
> > > > --
> > > > Divij Vaidya
> > > >
> > > >
> > > >
> > > > On Tue, Feb 14, 2023 at 4:15 PM Divij Vaidya <[email protected]>
> > > > wrote:
> > > >
> > > > > Hey Jun
> > > > >
> > > > > It has been a while since this KIP got some attention. While we wait
> > > for
> > > > > Satish to chime in here, perhaps I can answer your question.
> > > > >
> > > > > > Could you explain how you exposed the log size in your KIP-405
> > > > > implementation?
> > > > >
> > > > > The APIs available in RLMM as per KIP405
> > > > > are, addRemoteLogSegmentMetadata(), updateRemoteLogSegmentMetadata(),
> > > > remoteLogSegmentMetadata(), highestOffsetForEpoch(),
> > > > putRemotePartitionDeleteMetadata(), listRemoteLogSegments(),
> > > > onPartitionLeadershipChanges()
> > > > > and onStopPartitions(). None of these APIs allow us to expose the log
> > > > size,
> > > > > hence, the only option that remains is to list all segments using
> > > > > listRemoteLogSegments() and aggregate them every time we require to
> > > > > calculate the size. Based on our prior discussion, this requires
> > > reading
> > > > > all segment metadata which won't work for non-local RLMM
> > > implementations.
> > > > > Satish's implementation also performs a full scan and calculates the
> > > > > aggregate. see:
> > > > >
> > > >
> > >
> > https://github.com/satishd/kafka/blob/2.8.x-tiered-storage/core/src/main/scala/kafka/log/remote/RemoteLogManager.scala#L619
> > > > >
> > > > >
> > > > > Does this answer your question?
> > > > >
> > > > > --
> > > > > Divij Vaidya
> > > > >
> > > > >
> > > > >
> > > > > On Tue, Dec 20, 2022 at 8:40 PM Jun Rao <[email protected]>
> > > > wrote:
> > > > >
> > > > >> Hi, Divij,
> > > > >>
> > > > >> Thanks for the explanation.
> > > > >>
> > > > >> Good question.
> > > > >>
> > > > >> Hi, Satish,
> > > > >>
> > > > >> Could you explain how you exposed the log size in your KIP-405
> > > > >> implementation?
> > > > >>
> > > > >> Thanks,
> > > > >>
> > > > >> Jun
> > > > >>
> > > > >> On Tue, Dec 20, 2022 at 4:59 AM Divij Vaidya <
> > [email protected]
> > > >
> > > > >> wrote:
> > > > >>
> > > > >> > Hey Jun
> > > > >> >
> > > > >> > Yes, it is possible to maintain the log size in the cache (see
> > > > rejected
> > > > >> > alternative#3 in the KIP) but I did not understand how it is
> > > possible
> > > > to
> > > > >> > retrieve it without the new API. The log size could be calculated
> > on
> > > > >> > startup by scanning through the segments (though I would disagree
> > > that
> > > > >> this
> > > > >> > is the right approach since scanning itself takes order of minutes
> > > and
> > > > >> > hence delay the start of archive process), and incrementally
> > > > maintained
> > > > >> > afterwards, even then, we would need an API in
> > > > RemoteLogMetadataManager
> > > > >> so
> > > > >> > that RLM could fetch the cached size!
> > > > >> >
> > > > >> > If we wish to cache the size without adding a new API, then we
> > need
> > > to
> > > > >> > cache the size in RLM itself (instead of RLMM implementation) and
> > > > >> > incrementally manage it. The downside of longer archive time at
> > > > startup
> > > > >> > (due to initial scale) still remains valid in this situation.
> > > > >> >
> > > > >> > --
> > > > >> > Divij Vaidya
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> > On Fri, Dec 16, 2022 at 12:43 AM Jun Rao <[email protected]
> > >
> > > > >> wrote:
> > > > >> >
> > > > >> > > Hi, Divij,
> > > > >> > >
> > > > >> > > Thanks for the explanation.
> > > > >> > >
> > > > >> > > If there is in-memory cache, could we maintain the log size in
> > the
> > > > >> cache
> > > > >> > > with the existing API? For example, a replica could make a
> > > > >> > > listRemoteLogSegments(TopicIdPartition topicIdPartition) call on
> > > > >> startup
> > > > >> > to
> > > > >> > > get the remote segment size before the current leaderEpoch. The
> > > > leader
> > > > >> > > could then maintain the size incrementally afterwards. On leader
> > > > >> change,
> > > > >> > > other replicas can make a listRemoteLogSegments(TopicIdPartition
> > > > >> > > topicIdPartition, int leaderEpoch) call to get the size of newly
> > > > >> > generated
> > > > >> > > segments.
> > > > >> > >
> > > > >> > > Thanks,
> > > > >> > >
> > > > >> > > Jun
> > > > >> > >
> > > > >> > >
> > > > >> > > On Wed, Dec 14, 2022 at 3:27 AM Divij Vaidya <
> > > > [email protected]
> > > > >> >
> > > > >> > > wrote:
> > > > >> > >
> > > > >> > > > > Is the new method enough for doing size-based retention?
> > > > >> > > >
> > > > >> > > > Yes. You are right in assuming that this API only provides the
> > > > >> Remote
> > > > >> > > > storage size (for current epoch chain). We would use this API
> > > for
> > > > >> size
> > > > >> > > > based retention along with a value of localOnlyLogSegmentSize
> > > > which
> > > > >> is
> > > > >> > > > computed as Log.sizeInBytes(logSegments.filter(_.baseOffset >
> > > > >> > > > highestOffsetWithRemoteIndex)). Hence, (total_log_size =
> > > > >> > > > remoteLogSizeBytes + log.localOnlyLogSegmentSize). I have
> > > updated
> > > > >> the
> > > > >> > KIP
> > > > >> > > > with this information. You can also check an example
> > > > implementation
> > > > >> at
> > > > >> > > >
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > >
> > >
> > https://github.com/satishd/kafka/blob/2.8.x-tiered-storage/core/src/main/scala/kafka/log/Log.scala#L2077
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > > Do you imagine all accesses to remote metadata will be
> > across
> > > > the
> > > > >> > > network
> > > > >> > > > or will there be some local in-memory cache?
> > > > >> > > >
> > > > >> > > > I would expect a disk-less implementation to maintain a finite
> > > > >> > in-memory
> > > > >> > > > cache for segment metadata to optimize the number of network
> > > calls
> > > > >> made
> > > > >> > > to
> > > > >> > > > fetch the data. In future, we can think about bringing this
> > > finite
> > > > >> size
> > > > >> > > > cache into RLM itself but that's probably a conversation for a
> > > > >> > different
> > > > >> > > > KIP. There are many other things we would like to do to
> > optimize
> > > > the
> > > > >> > > Tiered
> > > > >> > > > storage interface such as introducing a circular buffer /
> > > > streaming
> > > > >> > > > interface from RSM (so that we don't have to wait to fetch the
> > > > >> entire
> > > > >> > > > segment before starting to send records to the consumer),
> > > caching
> > > > >> the
> > > > >> > > > segments fetched from RSM locally (I would assume all RSM
> > plugin
> > > > >> > > > implementations to do this, might as well add it to RLM) etc.
> > > > >> > > >
> > > > >> > > > --
> > > > >> > > > Divij Vaidya
> > > > >> > > >
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > On Mon, Dec 12, 2022 at 7:35 PM Jun Rao
> > > <[email protected]
> > > > >
> > > > >> > > wrote:
> > > > >> > > >
> > > > >> > > > > Hi, Divij,
> > > > >> > > > >
> > > > >> > > > > Thanks for the reply.
> > > > >> > > > >
> > > > >> > > > > Is the new method enough for doing size-based retention? It
> > > > gives
> > > > >> the
> > > > >> > > > total
> > > > >> > > > > size of the remote segments, but it seems that we still
> > don't
> > > > know
> > > > >> > the
> > > > >> > > > > exact total size for a log since there could be overlapping
> > > > >> segments
> > > > >> > > > > between the remote and the local segments.
> > > > >> > > > >
> > > > >> > > > > You mentioned a disk-less implementation. Do you imagine all
> > > > >> accesses
> > > > >> > > to
> > > > >> > > > > remote metadata will be across the network or will there be
> > > some
> > > > >> > local
> > > > >> > > > > in-memory cache?
> > > > >> > > > >
> > > > >> > > > > Thanks,
> > > > >> > > > >
> > > > >> > > > > Jun
> > > > >> > > > >
> > > > >> > > > >
> > > > >> > > > >
> > > > >> > > > > On Wed, Dec 7, 2022 at 3:10 AM Divij Vaidya <
> > > > >> [email protected]
> > > > >> > >
> > > > >> > > > > wrote:
> > > > >> > > > >
> > > > >> > > > > > The method is needed for RLMM implementations which fetch
> > > the
> > > > >> > > > information
> > > > >> > > > > > over the network and not for the disk based
> > implementations
> > > > >> (such
> > > > >> > as
> > > > >> > > > the
> > > > >> > > > > > default topic based RLMM).
> > > > >> > > > > >
> > > > >> > > > > > I would argue that adding this API makes the interface
> > more
> > > > >> generic
> > > > >> > > > than
> > > > >> > > > > > what it is today. This is because, with the current APIs
> > an
> > > > >> > > implementor
> > > > >> > > > > is
> > > > >> > > > > > restricted to use disk based RLMM solutions only (i.e. the
> > > > >> default
> > > > >> > > > > > solution) whereas if we add this new API, we unblock usage
> > > of
> > > > >> > network
> > > > >> > > > > based
> > > > >> > > > > > RLMM implementations such as databases.
> > > > >> > > > > >
> > > > >> > > > > >
> > > > >> > > > > >
> > > > >> > > > > > On Wed 30. Nov 2022 at 20:40, Jun Rao
> > > > <[email protected]
> > > > >> >
> > > > >> > > > wrote:
> > > > >> > > > > >
> > > > >> > > > > > > Hi, Divij,
> > > > >> > > > > > >
> > > > >> > > > > > > Thanks for the reply.
> > > > >> > > > > > >
> > > > >> > > > > > > Point#2. My high level question is that is the new
> > method
> > > > >> needed
> > > > >> > > for
> > > > >> > > > > > every
> > > > >> > > > > > > implementation of remote storage or just for a specific
> > > > >> > > > implementation.
> > > > >> > > > > > The
> > > > >> > > > > > > issues that you pointed out exist for the default
> > > > >> implementation
> > > > >> > of
> > > > >> > > > > RLMM
> > > > >> > > > > > as
> > > > >> > > > > > > well and so far, the default implementation hasn't
> > found a
> > > > >> need
> > > > >> > > for a
> > > > >> > > > > > > similar new method. For public interface, ideally we
> > want
> > > to
> > > > >> make
> > > > >> > > it
> > > > >> > > > > more
> > > > >> > > > > > > general.
> > > > >> > > > > > >
> > > > >> > > > > > > Thanks,
> > > > >> > > > > > >
> > > > >> > > > > > > Jun
> > > > >> > > > > > >
> > > > >> > > > > > > On Mon, Nov 21, 2022 at 7:11 AM Divij Vaidya <
> > > > >> > > > [email protected]>
> > > > >> > > > > > > wrote:
> > > > >> > > > > > >
> > > > >> > > > > > > > Thank you Jun and Alex for your comments.
> > > > >> > > > > > > >
> > > > >> > > > > > > > Point#1: You are right Jun. As Alex mentioned, the
> > > > "derived
> > > > >> > > > metadata"
> > > > >> > > > > > can
> > > > >> > > > > > > > increase the size of cached metadata by a factor of 10
> > > but
> > > > >> it
> > > > >> > > > should
> > > > >> > > > > be
> > > > >> > > > > > > ok
> > > > >> > > > > > > > to cache just the actual metadata. My point about size
> > > > >> being a
> > > > >> > > > > > limitation
> > > > >> > > > > > > > for using cache is not valid anymore.
> > > > >> > > > > > > >
> > > > >> > > > > > > > Point#2: For a new replica, it would still have to
> > fetch
> > > > the
> > > > >> > > > metadata
> > > > >> > > > > > > over
> > > > >> > > > > > > > the network to initiate the warm up of the cache and
> > > > hence,
> > > > >> > > > increase
> > > > >> > > > > > the
> > > > >> > > > > > > > start time of the archival process. Please also note
> > the
> > > > >> > > > > repercussions
> > > > >> > > > > > of
> > > > >> > > > > > > > the warm up scan that Alex mentioned in this thread as
> > > > part
> > > > >> of
> > > > >> > > > > #102.2.
> > > > >> > > > > > > >
> > > > >> > > > > > > > 100#: Agreed Alex. Thanks for clarifying that. My
> > point
> > > > >> about
> > > > >> > > size
> > > > >> > > > > > being
> > > > >> > > > > > > a
> > > > >> > > > > > > > limitation for using cache is not valid anymore.
> > > > >> > > > > > > >
> > > > >> > > > > > > > 101#: Alex, if I understand correctly, you are
> > > suggesting
> > > > to
> > > > >> > > cache
> > > > >> > > > > the
> > > > >> > > > > > > > total size at the leader and update it on archival.
> > This
> > > > >> > wouldn't
> > > > >> > > > > work
> > > > >> > > > > > > for
> > > > >> > > > > > > > cases when the leader restarts where we would have to
> > > > make a
> > > > >> > full
> > > > >> > > > > scan
> > > > >> > > > > > > > to update the total size entry on startup. We expect
> > > users
> > > > >> to
> > > > >> > > store
> > > > >> > > > > > data
> > > > >> > > > > > > > over longer duration in remote storage which increases
> > > the
> > > > >> > > > likelihood
> > > > >> > > > > > of
> > > > >> > > > > > > > leader restarts / failovers.
> > > > >> > > > > > > >
> > > > >> > > > > > > > 102#.1: I don't think that the current design
> > > accommodates
> > > > >> the
> > > > >> > > fact
> > > > >> > > > > > that
> > > > >> > > > > > > > data corruption could happen at the RLMM plugin (we
> > > don't
> > > > >> have
> > > > >> > > > > checksum
> > > > >> > > > > > > as
> > > > >> > > > > > > > a field in metadata as part of KIP405). If data
> > > corruption
> > > > >> > > occurs,
> > > > >> > > > w/
> > > > >> > > > > > or
> > > > >> > > > > > > > w/o the cache, it would be a different problem to
> > > solve. I
> > > > >> > would
> > > > >> > > > like
> > > > >> > > > > > to
> > > > >> > > > > > > > keep this outside the scope of this KIP.
> > > > >> > > > > > > >
> > > > >> > > > > > > > 102#.2: Agree. This remains as the main concern for
> > > using
> > > > >> the
> > > > >> > > cache
> > > > >> > > > > to
> > > > >> > > > > > > > fetch total size.
> > > > >> > > > > > > >
> > > > >> > > > > > > > Regards,
> > > > >> > > > > > > > Divij Vaidya
> > > > >> > > > > > > >
> > > > >> > > > > > > >
> > > > >> > > > > > > >
> > > > >> > > > > > > > On Fri, Nov 18, 2022 at 12:59 PM Alexandre Dupriez <
> > > > >> > > > > > > > [email protected]> wrote:
> > > > >> > > > > > > >
> > > > >> > > > > > > > > Hi Divij,
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > Thanks for the KIP. Please find some comments based
> > on
> > > > >> what I
> > > > >> > > > read
> > > > >> > > > > on
> > > > >> > > > > > > > > this thread so far - apologies for the repeats and
> > the
> > > > >> late
> > > > >> > > > reply.
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > If I understand correctly, one of the main elements
> > of
> > > > >> > > discussion
> > > > >> > > > > is
> > > > >> > > > > > > > > about caching in Kafka versus delegation of
> > providing
> > > > the
> > > > >> > > remote
> > > > >> > > > > size
> > > > >> > > > > > > > > of a topic-partition to the plugin.
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > A few comments:
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > 100. The size of the “derived metadata” which is
> > > managed
> > > > >> by
> > > > >> > the
> > > > >> > > > > > plugin
> > > > >> > > > > > > > > to represent an rlmMetadata can indeed be close to 1
> > > kB
> > > > on
> > > > >> > > > average
> > > > >> > > > > > > > > depending on its own internal structure, e.g. the
> > > > >> redundancy
> > > > >> > it
> > > > >> > > > > > > > > enforces (unfortunately resulting to duplication),
> > > > >> additional
> > > > >> > > > > > > > > information such as checksums and primary and
> > > secondary
> > > > >> > > indexable
> > > > >> > > > > > > > > keys. But indeed, the rlmMetadata is itself a
> > lighter
> > > > data
> > > > >> > > > > structure
> > > > >> > > > > > > > > by a factor of 10. And indeed, instead of caching
> > the
> > > > >> > “derived
> > > > >> > > > > > > > > metadata”, only the rlmMetadata could be, which
> > should
> > > > >> > address
> > > > >> > > > the
> > > > >> > > > > > > > > concern regarding the memory occupancy of the cache.
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > 101. I am not sure I fully understand why we would
> > > need
> > > > to
> > > > >> > > cache
> > > > >> > > > > the
> > > > >> > > > > > > > > list of rlmMetadata to retain the remote size of a
> > > > >> > > > topic-partition.
> > > > >> > > > > > > > > Since the leader of a topic-partition is, in
> > > > >> non-degenerated
> > > > >> > > > cases,
> > > > >> > > > > > > > > the only actor which can mutate the remote part of
> > the
> > > > >> > > > > > > > > topic-partition, hence its size, it could in theory
> > > only
> > > > >> > cache
> > > > >> > > > the
> > > > >> > > > > > > > > size of the remote log once it has calculated it? In
> > > > which
> > > > >> > case
> > > > >> > > > > there
> > > > >> > > > > > > > > would not be any problem regarding the size of the
> > > > caching
> > > > >> > > > > strategy.
> > > > >> > > > > > > > > Did I miss something there?
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > 102. There may be a few challenges to consider with
> > > > >> caching:
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > 102.1) As mentioned above, the caching strategy
> > > assumes
> > > > no
> > > > >> > > > mutation
> > > > >> > > > > > > > > outside the lifetime of a leader. While this is true
> > > in
> > > > >> the
> > > > >> > > > normal
> > > > >> > > > > > > > > course of operation, there could be accidental
> > > mutation
> > > > >> > outside
> > > > >> > > > of
> > > > >> > > > > > the
> > > > >> > > > > > > > > leader and a loss of consistency between the cached
> > > > state
> > > > >> and
> > > > >> > > the
> > > > >> > > > > > > > > actual remote representation of the log. E.g.
> > > > split-brain
> > > > >> > > > > scenarios,
> > > > >> > > > > > > > > bugs in the plugins, bugs in external systems with
> > > > >> mutating
> > > > >> > > > access
> > > > >> > > > > on
> > > > >> > > > > > > > > the derived metadata. In the worst case, a drift
> > > between
> > > > >> the
> > > > >> > > > cached
> > > > >> > > > > > > > > size and the actual size could lead to over-deleting
> > > > >> remote
> > > > >> > > data
> > > > >> > > > > > which
> > > > >> > > > > > > > > is a durability risk.
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > The alternative you propose, by making the plugin
> > the
> > > > >> source
> > > > >> > of
> > > > >> > > > > truth
> > > > >> > > > > > > > > w.r.t. to the size of the remote log, can make it
> > > easier
> > > > >> to
> > > > >> > > avoid
> > > > >> > > > > > > > > inconsistencies between plugin-managed metadata and
> > > the
> > > > >> > remote
> > > > >> > > > log
> > > > >> > > > > > > > > from the perspective of Kafka. On the other hand,
> > > plugin
> > > > >> > > vendors
> > > > >> > > > > > would
> > > > >> > > > > > > > > have to implement it with the expected efficiency to
> > > > have
> > > > >> it
> > > > >> > > > yield
> > > > >> > > > > > > > > benefits.
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > 102.2) As you mentioned, the caching strategy in
> > Kafka
> > > > >> would
> > > > >> > > > still
> > > > >> > > > > > > > > require one iteration over the list of rlmMetadata
> > > when
> > > > >> the
> > > > >> > > > > > leadership
> > > > >> > > > > > > > > of a topic-partition is assigned to a broker, while
> > > the
> > > > >> > plugin
> > > > >> > > > can
> > > > >> > > > > > > > > offer alternative constant-time approaches. This
> > > > >> calculation
> > > > >> > > > cannot
> > > > >> > > > > > be
> > > > >> > > > > > > > > put on the LeaderAndIsr path and would be performed
> > in
> > > > the
> > > > >> > > > > > background.
> > > > >> > > > > > > > > In case of bulk leadership migration, listing the
> > > > >> rlmMetadata
> > > > >> > > > could
> > > > >> > > > > > a)
> > > > >> > > > > > > > > result in request bursts to any backend system the
> > > > plugin
> > > > >> may
> > > > >> > > use
> > > > >> > > > > > > > > [which shouldn’t be a problem for high-throughput
> > data
> > > > >> stores
> > > > >> > > but
> > > > >> > > > > > > > > could have cost implications] b) increase
> > utilisation
> > > > >> > timespan
> > > > >> > > of
> > > > >> > > > > the
> > > > >> > > > > > > > > RLM threads for these calculations potentially
> > leading
> > > > to
> > > > >> > > > transient
> > > > >> > > > > > > > > starvation of tasks queued for, typically,
> > offloading
> > > > >> > > operations
> > > > >> > > > c)
> > > > >> > > > > > > > > could have a non-marginal CPU footprint on hardware
> > > with
> > > > >> > strict
> > > > >> > > > > > > > > resource constraints. All these elements could have
> > an
> > > > >> impact
> > > > >> > > to
> > > > >> > > > > some
> > > > >> > > > > > > > > degree depending on the operational environment.
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > From a design perspective, one question is where we
> > > want
> > > > >> the
> > > > >> > > > source
> > > > >> > > > > > of
> > > > >> > > > > > > > > truth w.r.t. remote log size to be during the
> > lifetime
> > > > of
> > > > >> a
> > > > >> > > > leader.
> > > > >> > > > > > > > > The responsibility of maintaining a consistent
> > > > >> representation
> > > > >> > > of
> > > > >> > > > > the
> > > > >> > > > > > > > > remote log is shared by Kafka and the plugin. Which
> > > > >> system is
> > > > >> > > > best
> > > > >> > > > > > > > > placed to maintain such a state while providing the
> > > > >> highest
> > > > >> > > > > > > > > consistency guarantees is something both Kafka and
> > > > plugin
> > > > >> > > > designers
> > > > >> > > > > > > > > could help understand better.
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > Many thanks,
> > > > >> > > > > > > > > Alexandre
> > > > >> > > > > > > > >
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > Le jeu. 17 nov. 2022 à 19:27, Jun Rao
> > > > >> > <[email protected]
> > > > >> > > >
> > > > >> > > > a
> > > > >> > > > > > > > écrit :
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > Hi, Divij,
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > Thanks for the reply.
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > Point #1. Is the average remote segment metadata
> > > > really
> > > > >> > 1KB?
> > > > >> > > > > What's
> > > > >> > > > > > > > > listed
> > > > >> > > > > > > > > > in the public interface is probably well below 100
> > > > >> bytes.
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > Point #2. I guess you are assuming that each
> > broker
> > > > only
> > > > >> > > caches
> > > > >> > > > > the
> > > > >> > > > > > > > > remote
> > > > >> > > > > > > > > > segment metadata in memory. An alternative
> > approach
> > > is
> > > > >> to
> > > > >> > > cache
> > > > >> > > > > > them
> > > > >> > > > > > > in
> > > > >> > > > > > > > > > both memory and local disk. That way, on broker
> > > > restart,
> > > > >> > you
> > > > >> > > > just
> > > > >> > > > > > > need
> > > > >> > > > > > > > to
> > > > >> > > > > > > > > > fetch the new remote segments' metadata using the
> > > > >> > > > > > > > > > listRemoteLogSegments(TopicIdPartition
> > > > topicIdPartition,
> > > > >> > int
> > > > >> > > > > > > > leaderEpoch)
> > > > >> > > > > > > > > > api. Will that work?
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > Point #3. Thanks for the explanation and it sounds
> > > > good.
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > Thanks,
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > Jun
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > On Thu, Nov 17, 2022 at 7:31 AM Divij Vaidya <
> > > > >> > > > > > > [email protected]>
> > > > >> > > > > > > > > > wrote:
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > > Hi Jun
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > There are three points that I would like to
> > > present
> > > > >> here:
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > 1. We would require a large cache size to
> > > > efficiently
> > > > >> > cache
> > > > >> > > > all
> > > > >> > > > > > > > segment
> > > > >> > > > > > > > > > > metadata.
> > > > >> > > > > > > > > > > 2. Linear scan of all metadata at broker startup
> > > to
> > > > >> > > populate
> > > > >> > > > > the
> > > > >> > > > > > > > cache
> > > > >> > > > > > > > > will
> > > > >> > > > > > > > > > > be slow and will impact the archival process.
> > > > >> > > > > > > > > > > 3. There is no other use case where a full scan
> > of
> > > > >> > segment
> > > > >> > > > > > metadata
> > > > >> > > > > > > > is
> > > > >> > > > > > > > > > > required.
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > Let's start by quantifying 1. Here's my estimate
> > > for
> > > > >> the
> > > > >> > > size
> > > > >> > > > > of
> > > > >> > > > > > > the
> > > > >> > > > > > > > > cache.
> > > > >> > > > > > > > > > > Average size of segment metadata = 1KB. This
> > could
> > > > be
> > > > >> > more
> > > > >> > > if
> > > > >> > > > > we
> > > > >> > > > > > > have
> > > > >> > > > > > > > > > > frequent leader failover with a large number of
> > > > leader
> > > > >> > > epochs
> > > > >> > > > > > being
> > > > >> > > > > > > > > stored
> > > > >> > > > > > > > > > > per segment.
> > > > >> > > > > > > > > > > Segment size = 100MB. Users will prefer to
> > reduce
> > > > the
> > > > >> > > segment
> > > > >> > > > > > size
> > > > >> > > > > > > > > from the
> > > > >> > > > > > > > > > > default value of 1GB to ensure timely archival
> > of
> > > > data
> > > > >> > > since
> > > > >> > > > > data
> > > > >> > > > > > > > from
> > > > >> > > > > > > > > > > active segment is not archived.
> > > > >> > > > > > > > > > > Cache size = num segments * avg. segment
> > metadata
> > > > >> size =
> > > > >> > > > > > > > > (100TB/100MB)*1KB
> > > > >> > > > > > > > > > > = 1GB.
> > > > >> > > > > > > > > > > While 1GB for cache may not sound like a large
> > > > number
> > > > >> for
> > > > >> > > > > larger
> > > > >> > > > > > > > > machines,
> > > > >> > > > > > > > > > > it does eat into the memory as an additional
> > cache
> > > > and
> > > > >> > > makes
> > > > >> > > > > use
> > > > >> > > > > > > > cases
> > > > >> > > > > > > > > with
> > > > >> > > > > > > > > > > large data retention with low throughout
> > expensive
> > > > >> (where
> > > > >> > > > such
> > > > >> > > > > > use
> > > > >> > > > > > > > case
> > > > >> > > > > > > > > > > would could use smaller machines).
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > About point#2:
> > > > >> > > > > > > > > > > Even if we say that all segment metadata can fit
> > > > into
> > > > >> the
> > > > >> > > > > cache,
> > > > >> > > > > > we
> > > > >> > > > > > > > > will
> > > > >> > > > > > > > > > > need to populate the cache on broker startup. It
> > > > would
> > > > >> > not
> > > > >> > > be
> > > > >> > > > > in
> > > > >> > > > > > > the
> > > > >> > > > > > > > > > > critical patch of broker startup and hence won't
> > > > >> impact
> > > > >> > the
> > > > >> > > > > > startup
> > > > >> > > > > > > > > time.
> > > > >> > > > > > > > > > > But it will impact the time when we could start
> > > the
> > > > >> > > archival
> > > > >> > > > > > > process
> > > > >> > > > > > > > > since
> > > > >> > > > > > > > > > > the RLM thread pool will be blocked on the first
> > > > call
> > > > >> to
> > > > >> > > > > > > > > > > listRemoteLogSegments(). To scan metadata for
> > 1MM
> > > > >> > segments
> > > > >> > > > > > > (computed
> > > > >> > > > > > > > > above)
> > > > >> > > > > > > > > > > and transfer 1GB data over the network from a
> > RLMM
> > > > >> such
> > > > >> > as
> > > > >> > > a
> > > > >> > > > > > remote
> > > > >> > > > > > > > > > > database would be in the order of minutes
> > > (depending
> > > > >> on
> > > > >> > how
> > > > >> > > > > > > efficient
> > > > >> > > > > > > > > the
> > > > >> > > > > > > > > > > scan is with the RLMM implementation).
> > Although, I
> > > > >> would
> > > > >> > > > > concede
> > > > >> > > > > > > that
> > > > >> > > > > > > > > > > having RLM threads blocked for a few minutes is
> > > > >> perhaps
> > > > >> > OK
> > > > >> > > > but
> > > > >> > > > > if
> > > > >> > > > > > > we
> > > > >> > > > > > > > > > > introduce the new API proposed in the KIP, we
> > > would
> > > > >> have
> > > > >> > a
> > > > >> > > > > > > > > > > deterministic startup time for RLM. Adding the
> > API
> > > > >> comes
> > > > >> > > at a
> > > > >> > > > > low
> > > > >> > > > > > > > cost
> > > > >> > > > > > > > > and
> > > > >> > > > > > > > > > > I believe the trade off is worth it.
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > About point#3:
> > > > >> > > > > > > > > > > We can use
> > listRemoteLogSegments(TopicIdPartition
> > > > >> > > > > > topicIdPartition,
> > > > >> > > > > > > > int
> > > > >> > > > > > > > > > > leaderEpoch) to calculate the segments eligible
> > > for
> > > > >> > > deletion
> > > > >> > > > > > (based
> > > > >> > > > > > > > on
> > > > >> > > > > > > > > size
> > > > >> > > > > > > > > > > retention) where leader epoch(s) belong to the
> > > > current
> > > > >> > > leader
> > > > >> > > > > > epoch
> > > > >> > > > > > > > > chain.
> > > > >> > > > > > > > > > > I understand that it may lead to segments
> > > belonging
> > > > to
> > > > >> > > other
> > > > >> > > > > > epoch
> > > > >> > > > > > > > > lineage
> > > > >> > > > > > > > > > > not getting deleted and would require a separate
> > > > >> > mechanism
> > > > >> > > to
> > > > >> > > > > > > delete
> > > > >> > > > > > > > > them.
> > > > >> > > > > > > > > > > The separate mechanism would anyways be required
> > > to
> > > > >> > delete
> > > > >> > > > > these
> > > > >> > > > > > > > > "leaked"
> > > > >> > > > > > > > > > > segments as there are other cases which could
> > lead
> > > > to
> > > > >> > leaks
> > > > >> > > > > such
> > > > >> > > > > > as
> > > > >> > > > > > > > > network
> > > > >> > > > > > > > > > > problems with RSM mid way writing through.
> > segment
> > > > >> etc.
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > Thank you for the replies so far. They have made
> > > me
> > > > >> > > re-think
> > > > >> > > > my
> > > > >> > > > > > > > > assumptions
> > > > >> > > > > > > > > > > and this dialogue has been very constructive for
> > > me.
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > Regards,
> > > > >> > > > > > > > > > > Divij Vaidya
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > On Thu, Nov 10, 2022 at 10:49 PM Jun Rao
> > > > >> > > > > > <[email protected]
> > > > >> > > > > > > >
> > > > >> > > > > > > > > wrote:
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > > Hi, Divij,
> > > > >> > > > > > > > > > > >
> > > > >> > > > > > > > > > > > Thanks for the reply.
> > > > >> > > > > > > > > > > >
> > > > >> > > > > > > > > > > > It's true that the data in Kafka could be kept
> > > > >> longer
> > > > >> > > with
> > > > >> > > > > > > KIP-405.
> > > > >> > > > > > > > > How
> > > > >> > > > > > > > > > > > much data do you envision to have per broker?
> > > For
> > > > >> 100TB
> > > > >> > > > data
> > > > >> > > > > > per
> > > > >> > > > > > > > > broker,
> > > > >> > > > > > > > > > > > with 1GB segment and segment metadata of 100
> > > > bytes,
> > > > >> it
> > > > >> > > > > requires
> > > > >> > > > > > > > > > > > 100TB/1GB*100 = 10MB, which should fit in
> > > memory.
> > > > >> > > > > > > > > > > >
> > > > >> > > > > > > > > > > > RemoteLogMetadataManager has two
> > > > >> > listRemoteLogSegments()
> > > > >> > > > > > methods.
> > > > >> > > > > > > > > The one
> > > > >> > > > > > > > > > > > you listed
> > > listRemoteLogSegments(TopicIdPartition
> > > > >> > > > > > > topicIdPartition,
> > > > >> > > > > > > > > int
> > > > >> > > > > > > > > > > > leaderEpoch) does return data in offset order.
> > > > >> However,
> > > > >> > > the
> > > > >> > > > > > other
> > > > >> > > > > > > > > > > > one listRemoteLogSegments(TopicIdPartition
> > > > >> > > > topicIdPartition)
> > > > >> > > > > > > > doesn't
> > > > >> > > > > > > > > > > > specify the return order. I assume that you
> > need
> > > > the
> > > > >> > > latter
> > > > >> > > > > to
> > > > >> > > > > > > > > calculate
> > > > >> > > > > > > > > > > > the segment size?
> > > > >> > > > > > > > > > > >
> > > > >> > > > > > > > > > > > Thanks,
> > > > >> > > > > > > > > > > >
> > > > >> > > > > > > > > > > > Jun
> > > > >> > > > > > > > > > > >
> > > > >> > > > > > > > > > > > On Thu, Nov 10, 2022 at 10:25 AM Divij Vaidya
> > <
> > > > >> > > > > > > > > [email protected]>
> > > > >> > > > > > > > > > > > wrote:
> > > > >> > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > *Jun,*
> > > > >> > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > *"the default implementation of RLMM does
> > > local
> > > > >> > > caching,
> > > > >> > > > > > > right?"*
> > > > >> > > > > > > > > > > > > Yes, Jun. The default implementation of RLMM
> > > > does
> > > > >> > > indeed
> > > > >> > > > > > cache
> > > > >> > > > > > > > the
> > > > >> > > > > > > > > > > > segment
> > > > >> > > > > > > > > > > > > metadata today, hence, it won't work for use
> > > > cases
> > > > >> > when
> > > > >> > > > the
> > > > >> > > > > > > > number
> > > > >> > > > > > > > > of
> > > > >> > > > > > > > > > > > > segments in remote storage is large enough
> > to
> > > > >> exceed
> > > > >> > > the
> > > > >> > > > > size
> > > > >> > > > > > > of
> > > > >> > > > > > > > > cache.
> > > > >> > > > > > > > > > > > As
> > > > >> > > > > > > > > > > > > part of this KIP, I will implement the new
> > > > >> proposed
> > > > >> > API
> > > > >> > > > in
> > > > >> > > > > > the
> > > > >> > > > > > > > > default
> > > > >> > > > > > > > > > > > > implementation of RLMM but the underlying
> > > > >> > > implementation
> > > > >> > > > > will
> > > > >> > > > > > > > > still be
> > > > >> > > > > > > > > > > a
> > > > >> > > > > > > > > > > > > scan. I will pick up optimizing that in a
> > > > separate
> > > > >> > PR.
> > > > >> > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > *"we also cache all segment metadata in the
> > > > >> brokers
> > > > >> > > > without
> > > > >> > > > > > > > > KIP-405. Do
> > > > >> > > > > > > > > > > > you
> > > > >> > > > > > > > > > > > > see a need to change that?"*
> > > > >> > > > > > > > > > > > > Please correct me if I am wrong here but we
> > > > cache
> > > > >> > > > metadata
> > > > >> > > > > > for
> > > > >> > > > > > > > > segments
> > > > >> > > > > > > > > > > > > "residing in local storage". The size of the
> > > > >> current
> > > > >> > > > cache
> > > > >> > > > > > > works
> > > > >> > > > > > > > > fine
> > > > >> > > > > > > > > > > for
> > > > >> > > > > > > > > > > > > the scale of the number of segments that we
> > > > >> expect to
> > > > >> > > > store
> > > > >> > > > > > in
> > > > >> > > > > > > > > local
> > > > >> > > > > > > > > > > > > storage. After KIP-405, that cache will
> > > continue
> > > > >> to
> > > > >> > > store
> > > > >> > > > > > > > metadata
> > > > >> > > > > > > > > for
> > > > >> > > > > > > > > > > > > segments which are residing in local storage
> > > and
> > > > >> > hence,
> > > > >> > > > we
> > > > >> > > > > > > don't
> > > > >> > > > > > > > > need
> > > > >> > > > > > > > > > > to
> > > > >> > > > > > > > > > > > > change that. For segments which have been
> > > > >> offloaded
> > > > >> > to
> > > > >> > > > > remote
> > > > >> > > > > > > > > storage,
> > > > >> > > > > > > > > > > it
> > > > >> > > > > > > > > > > > > would rely on RLMM. Note that the scale of
> > > data
> > > > >> > stored
> > > > >> > > in
> > > > >> > > > > > RLMM
> > > > >> > > > > > > is
> > > > >> > > > > > > > > > > > different
> > > > >> > > > > > > > > > > > > from local cache because the number of
> > > segments
> > > > is
> > > > >> > > > expected
> > > > >> > > > > > to
> > > > >> > > > > > > be
> > > > >> > > > > > > > > much
> > > > >> > > > > > > > > > > > > larger than what current implementation
> > stores
> > > > in
> > > > >> > local
> > > > >> > > > > > > storage.
> > > > >> > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > 2,3,4:
> > > > >> > RemoteLogMetadataManager.listRemoteLogSegments()
> > > > >> > > > > does
> > > > >> > > > > > > > > specify
> > > > >> > > > > > > > > > > the
> > > > >> > > > > > > > > > > > > order i.e. it returns the segments sorted by
> > > > first
> > > > >> > > offset
> > > > >> > > > > in
> > > > >> > > > > > > > > ascending
> > > > >> > > > > > > > > > > > > order. I am copying the API docs for KIP-405
> > > > here
> > > > >> for
> > > > >> > > > your
> > > > >> > > > > > > > > reference
> > > > >> > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > *Returns iterator of remote log segment
> > > > metadata,
> > > > >> > > sorted
> > > > >> > > > by
> > > > >> > > > > > > > {@link
> > > > >> > > > > > > > > > > > > RemoteLogSegmentMetadata#startOffset()}
> > > > >> inascending
> > > > >> > > order
> > > > >> > > > > > which
> > > > >> > > > > > > > > > > contains
> > > > >> > > > > > > > > > > > > the given leader epoch. This is used by
> > remote
> > > > log
> > > > >> > > > > retention
> > > > >> > > > > > > > > management
> > > > >> > > > > > > > > > > > > subsystemto fetch the segment metadata for a
> > > > given
> > > > >> > > leader
> > > > >> > > > > > > > > epoch.@param
> > > > >> > > > > > > > > > > > > topicIdPartition topic partition@param
> > > > >> leaderEpoch
> > > > >> > > > > > leader
> > > > >> > > > > > > > > > > > > epoch@return
> > > > >> > > > > > > > > > > > > Iterator of remote segments, sorted by start
> > > > >> offset
> > > > >> > in
> > > > >> > > > > > > ascending
> > > > >> > > > > > > > > > > order. *
> > > > >> > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > *Luke,*
> > > > >> > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > 5. Note that we are trying to optimize the
> > > > >> efficiency
> > > > >> > > of
> > > > >> > > > > size
> > > > >> > > > > > > > based
> > > > >> > > > > > > > > > > > > retention for remote storage. KIP-405 does
> > not
> > > > >> > > introduce
> > > > >> > > > a
> > > > >> > > > > > new
> > > > >> > > > > > > > > config
> > > > >> > > > > > > > > > > for
> > > > >> > > > > > > > > > > > > periodically checking remote similar to
> > > > >> > > > > > > > > > > log.retention.check.interval.ms
> > > > >> > > > > > > > > > > > > which is applicable for remote storage.
> > Hence,
> > > > the
> > > > >> > > metric
> > > > >> > > > > > will
> > > > >> > > > > > > be
> > > > >> > > > > > > > > > > updated
> > > > >> > > > > > > > > > > > > at the time of invoking log retention check
> > > for
> > > > >> > remote
> > > > >> > > > tier
> > > > >> > > > > > > which
> > > > >> > > > > > > > > is
> > > > >> > > > > > > > > > > > > pending implementation today. We can perhaps
> > > > come
> > > > >> > back
> > > > >> > > > and
> > > > >> > > > > > > update
> > > > >> > > > > > > > > the
> > > > >> > > > > > > > > > > > > metric description after the implementation
> > of
> > > > log
> > > > >> > > > > retention
> > > > >> > > > > > > > check
> > > > >> > > > > > > > > in
> > > > >> > > > > > > > > > > > > RemoteLogManager.
> > > > >> > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > --
> > > > >> > > > > > > > > > > > > Divij Vaidya
> > > > >> > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > On Thu, Nov 10, 2022 at 6:16 AM Luke Chen <
> > > > >> > > > > [email protected]
> > > > >> > > > > > >
> > > > >> > > > > > > > > wrote:
> > > > >> > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > Hi Divij,
> > > > >> > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > One more question about the metric:
> > > > >> > > > > > > > > > > > > > I think the metric will be updated when
> > > > >> > > > > > > > > > > > > > (1) each time we run the log retention
> > check
> > > > >> (that
> > > > >> > > is,
> > > > >> > > > > > > > > > > > > > log.retention.check.interval.ms)
> > > > >> > > > > > > > > > > > > > (2) When user explicitly call
> > > getRemoteLogSize
> > > > >> > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > Is that correct?
> > > > >> > > > > > > > > > > > > > Maybe we should add a note in metric
> > > > >> description,
> > > > >> > > > > > otherwise,
> > > > >> > > > > > > > when
> > > > >> > > > > > > > > > > user
> > > > >> > > > > > > > > > > > > got,
> > > > >> > > > > > > > > > > > > > let's say 0 of RemoteLogSizeBytes, will be
> > > > >> > surprised.
> > > > >> > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > Otherwise, LGTM
> > > > >> > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > Thank you for the KIP
> > > > >> > > > > > > > > > > > > > Luke
> > > > >> > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > On Thu, Nov 10, 2022 at 2:55 AM Jun Rao
> > > > >> > > > > > > > <[email protected]
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > > > > wrote:
> > > > >> > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > Hi, Divij,
> > > > >> > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > Thanks for the explanation.
> > > > >> > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > 1. Hmm, the default implementation of
> > RLMM
> > > > >> does
> > > > >> > > local
> > > > >> > > > > > > > caching,
> > > > >> > > > > > > > > > > right?
> > > > >> > > > > > > > > > > > > > > Currently, we also cache all segment
> > > > metadata
> > > > >> in
> > > > >> > > the
> > > > >> > > > > > > brokers
> > > > >> > > > > > > > > > > without
> > > > >> > > > > > > > > > > > > > > KIP-405. Do you see a need to change
> > that?
> > > > >> > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > 2,3,4: Yes, your explanation makes
> > sense.
> > > > >> > However,
> > > > >> > > > > > > > > > > > > > > currently,
> > > > >> > > > > > RemoteLogMetadataManager.listRemoteLogSegments()
> > > > >> > > > > > > > > doesn't
> > > > >> > > > > > > > > > > > > > specify
> > > > >> > > > > > > > > > > > > > > a particular order of the iterator. Do
> > you
> > > > >> intend
> > > > >> > > to
> > > > >> > > > > > change
> > > > >> > > > > > > > > that?
> > > > >> > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > Thanks,
> > > > >> > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > Jun
> > > > >> > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > On Tue, Nov 8, 2022 at 3:31 AM Divij
> > > Vaidya
> > > > <
> > > > >> > > > > > > > > > > [email protected]
> > > > >> > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > wrote:
> > > > >> > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > Hey Jun
> > > > >> > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > Thank you for your comments.
> > > > >> > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > *1. "RLMM implementor could ensure
> > that
> > > > >> > > > > > > > > listRemoteLogSegments()
> > > > >> > > > > > > > > > > is
> > > > >> > > > > > > > > > > > > > fast"*
> > > > >> > > > > > > > > > > > > > > > This would be ideal but pragmatically,
> > > it
> > > > is
> > > > >> > > > > difficult
> > > > >> > > > > > to
> > > > >> > > > > > > > > ensure
> > > > >> > > > > > > > > > > > that
> > > > >> > > > > > > > > > > > > > > > listRemoteLogSegments() is fast. This
> > is
> > > > >> > because
> > > > >> > > of
> > > > >> > > > > the
> > > > >> > > > > > > > > > > possibility
> > > > >> > > > > > > > > > > > > of
> > > > >> > > > > > > > > > > > > > a
> > > > >> > > > > > > > > > > > > > > > large number of segments (much larger
> > > than
> > > > >> what
> > > > >> > > > Kafka
> > > > >> > > > > > > > > currently
> > > > >> > > > > > > > > > > > > handles
> > > > >> > > > > > > > > > > > > > > > with local storage today) would make
> > it
> > > > >> > > infeasible
> > > > >> > > > to
> > > > >> > > > > > > adopt
> > > > >> > > > > > > > > > > > > strategies
> > > > >> > > > > > > > > > > > > > > such
> > > > >> > > > > > > > > > > > > > > > as local caching to improve the
> > > > performance
> > > > >> of
> > > > >> > > > > > > > > > > > listRemoteLogSegments.
> > > > >> > > > > > > > > > > > > > > Apart
> > > > >> > > > > > > > > > > > > > > > from caching (which won't work due to
> > > size
> > > > >> > > > > > limitations) I
> > > > >> > > > > > > > > can't
> > > > >> > > > > > > > > > > > think
> > > > >> > > > > > > > > > > > > > of
> > > > >> > > > > > > > > > > > > > > > other strategies which may eliminate
> > the
> > > > >> need
> > > > >> > for
> > > > >> > > > IO
> > > > >> > > > > > > > > > > > > > > > operations proportional to the number
> > of
> > > > >> total
> > > > >> > > > > > segments.
> > > > >> > > > > > > > > Please
> > > > >> > > > > > > > > > > > > advise
> > > > >> > > > > > > > > > > > > > if
> > > > >> > > > > > > > > > > > > > > > you have something in mind.
> > > > >> > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > 2.  "*If the size exceeds the
> > retention
> > > > >> size,
> > > > >> > we
> > > > >> > > > need
> > > > >> > > > > > to
> > > > >> > > > > > > > > > > determine
> > > > >> > > > > > > > > > > > > the
> > > > >> > > > > > > > > > > > > > > > subset of segments to delete to bring
> > > the
> > > > >> size
> > > > >> > > > within
> > > > >> > > > > > the
> > > > >> > > > > > > > > > > retention
> > > > >> > > > > > > > > > > > > > > limit.
> > > > >> > > > > > > > > > > > > > > > Do we need to call
> > > > >> > > > > > > > > > > RemoteLogMetadataManager.listRemoteLogSegments()
> > > > >> > > > > > > > > > > > > to
> > > > >> > > > > > > > > > > > > > > > determine that?"*
> > > > >> > > > > > > > > > > > > > > > Yes, we need to call
> > > > >> listRemoteLogSegments() to
> > > > >> > > > > > determine
> > > > >> > > > > > > > > which
> > > > >> > > > > > > > > > > > > > segments
> > > > >> > > > > > > > > > > > > > > > should be deleted. But there is a
> > > > difference
> > > > >> > with
> > > > >> > > > the
> > > > >> > > > > > use
> > > > >> > > > > > > > > case we
> > > > >> > > > > > > > > > > > are
> > > > >> > > > > > > > > > > > > > > > trying to optimize with this KIP. To
> > > > >> determine
> > > > >> > > the
> > > > >> > > > > > subset
> > > > >> > > > > > > > of
> > > > >> > > > > > > > > > > > segments
> > > > >> > > > > > > > > > > > > > > which
> > > > >> > > > > > > > > > > > > > > > would be deleted, we only read
> > metadata
> > > > for
> > > > >> > > > segments
> > > > >> > > > > > > which
> > > > >> > > > > > > > > would
> > > > >> > > > > > > > > > > be
> > > > >> > > > > > > > > > > > > > > deleted
> > > > >> > > > > > > > > > > > > > > > via the listRemoteLogSegments(). But
> > to
> > > > >> > determine
> > > > >> > > > the
> > > > >> > > > > > > > > > > totalLogSize,
> > > > >> > > > > > > > > > > > > > which
> > > > >> > > > > > > > > > > > > > > > is required every time retention logic
> > > > >> based on
> > > > >> > > > size
> > > > >> > > > > > > > > executes, we
> > > > >> > > > > > > > > > > > > read
> > > > >> > > > > > > > > > > > > > > > metadata of *all* the segments in
> > remote
> > > > >> > storage.
> > > > >> > > > > > Hence,
> > > > >> > > > > > > > the
> > > > >> > > > > > > > > > > number
> > > > >> > > > > > > > > > > > > of
> > > > >> > > > > > > > > > > > > > > > results returned by
> > > > >> > > > > > > > > > > >
> > > *RemoteLogMetadataManager.listRemoteLogSegments()
> > > > >> > > > > > > > > > > > > > *is
> > > > >> > > > > > > > > > > > > > > > different when we are calculating
> > > > >> totalLogSize
> > > > >> > > vs.
> > > > >> > > > > when
> > > > >> > > > > > > we
> > > > >> > > > > > > > > are
> > > > >> > > > > > > > > > > > > > > determining
> > > > >> > > > > > > > > > > > > > > > the subset of segments to delete.
> > > > >> > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > 3.
> > > > >> > > > > > > > > > > > > > > > *"Also, what about time-based
> > retention?
> > > > To
> > > > >> > make
> > > > >> > > > that
> > > > >> > > > > > > > > efficient,
> > > > >> > > > > > > > > > > do
> > > > >> > > > > > > > > > > > > we
> > > > >> > > > > > > > > > > > > > > need
> > > > >> > > > > > > > > > > > > > > > to make some additional interface
> > > > >> changes?"*No.
> > > > >> > > > Note
> > > > >> > > > > > that
> > > > >> > > > > > > > > time
> > > > >> > > > > > > > > > > > > > complexity
> > > > >> > > > > > > > > > > > > > > > to determine the segments for
> > retention
> > > is
> > > > >> > > > different
> > > > >> > > > > > for
> > > > >> > > > > > > > time
> > > > >> > > > > > > > > > > based
> > > > >> > > > > > > > > > > > > vs.
> > > > >> > > > > > > > > > > > > > > > size based. For time based, the time
> > > > >> complexity
> > > > >> > > is
> > > > >> > > > a
> > > > >> > > > > > > > > function of
> > > > >> > > > > > > > > > > > the
> > > > >> > > > > > > > > > > > > > > number
> > > > >> > > > > > > > > > > > > > > > of segments which are "eligible for
> > > > >> deletion"
> > > > >> > > > (since
> > > > >> > > > > we
> > > > >> > > > > > > > only
> > > > >> > > > > > > > > read
> > > > >> > > > > > > > > > > > > > > metadata
> > > > >> > > > > > > > > > > > > > > > for segments which would be deleted)
> > > > >> whereas in
> > > > >> > > > size
> > > > >> > > > > > > based
> > > > >> > > > > > > > > > > > retention,
> > > > >> > > > > > > > > > > > > > the
> > > > >> > > > > > > > > > > > > > > > time complexity is a function of "all
> > > > >> segments"
> > > > >> > > > > > available
> > > > >> > > > > > > > in
> > > > >> > > > > > > > > > > remote
> > > > >> > > > > > > > > > > > > > > storage
> > > > >> > > > > > > > > > > > > > > > (metadata of all segments needs to be
> > > read
> > > > >> to
> > > > >> > > > > calculate
> > > > >> > > > > > > the
> > > > >> > > > > > > > > total
> > > > >> > > > > > > > > > > > > > size).
> > > > >> > > > > > > > > > > > > > > As
> > > > >> > > > > > > > > > > > > > > > you may observe, this KIP will bring
> > the
> > > > >> time
> > > > >> > > > > > complexity
> > > > >> > > > > > > > for
> > > > >> > > > > > > > > both
> > > > >> > > > > > > > > > > > > time
> > > > >> > > > > > > > > > > > > > > > based retention & size based retention
> > > to
> > > > >> the
> > > > >> > > same
> > > > >> > > > > > > > function.
> > > > >> > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > 4. Also, please note that this new API
> > > > >> > introduced
> > > > >> > > > in
> > > > >> > > > > > this
> > > > >> > > > > > > > KIP
> > > > >> > > > > > > > > > > also
> > > > >> > > > > > > > > > > > > > > enables
> > > > >> > > > > > > > > > > > > > > > us to provide a metric for total size
> > of
> > > > >> data
> > > > >> > > > stored
> > > > >> > > > > in
> > > > >> > > > > > > > > remote
> > > > >> > > > > > > > > > > > > storage.
> > > > >> > > > > > > > > > > > > > > > Without the API, calculation of this
> > > > metric
> > > > >> > will
> > > > >> > > > > become
> > > > >> > > > > > > > very
> > > > >> > > > > > > > > > > > > expensive
> > > > >> > > > > > > > > > > > > > > with
> > > > >> > > > > > > > > > > > > > > > *listRemoteLogSegments().*
> > > > >> > > > > > > > > > > > > > > > I understand that your motivation here
> > > is
> > > > to
> > > > >> > > avoid
> > > > >> > > > > > > > polluting
> > > > >> > > > > > > > > the
> > > > >> > > > > > > > > > > > > > > interface
> > > > >> > > > > > > > > > > > > > > > with optimization specific APIs and I
> > > will
> > > > >> > agree
> > > > >> > > > with
> > > > >> > > > > > > that
> > > > >> > > > > > > > > goal.
> > > > >> > > > > > > > > > > > But
> > > > >> > > > > > > > > > > > > I
> > > > >> > > > > > > > > > > > > > > > believe that this new API proposed in
> > > the
> > > > >> KIP
> > > > >> > > > brings
> > > > >> > > > > in
> > > > >> > > > > > > > > > > significant
> > > > >> > > > > > > > > > > > > > > > improvement and there is no other work
> > > > >> around
> > > > >> > > > > available
> > > > >> > > > > > > to
> > > > >> > > > > > > > > > > achieve
> > > > >> > > > > > > > > > > > > the
> > > > >> > > > > > > > > > > > > > > same
> > > > >> > > > > > > > > > > > > > > > performance.
> > > > >> > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > Regards,
> > > > >> > > > > > > > > > > > > > > > Divij Vaidya
> > > > >> > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > On Tue, Nov 8, 2022 at 12:12 AM Jun
> > Rao
> > > > >> > > > > > > > > <[email protected]
> > > > >> > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > wrote:
> > > > >> > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > Hi, Divij,
> > > > >> > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > Thanks for the KIP. Sorry for the
> > late
> > > > >> reply.
> > > > >> > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > The motivation of the KIP is to
> > > improve
> > > > >> the
> > > > >> > > > > > efficiency
> > > > >> > > > > > > of
> > > > >> > > > > > > > > size
> > > > >> > > > > > > > > > > > > based
> > > > >> > > > > > > > > > > > > > > > > retention. I am not sure the
> > proposed
> > > > >> changes
> > > > >> > > are
> > > > >> > > > > > > enough.
> > > > >> > > > > > > > > For
> > > > >> > > > > > > > > > > > > > example,
> > > > >> > > > > > > > > > > > > > > if
> > > > >> > > > > > > > > > > > > > > > > the size exceeds the retention size,
> > > we
> > > > >> need
> > > > >> > to
> > > > >> > > > > > > determine
> > > > >> > > > > > > > > the
> > > > >> > > > > > > > > > > > > subset
> > > > >> > > > > > > > > > > > > > of
> > > > >> > > > > > > > > > > > > > > > > segments to delete to bring the size
> > > > >> within
> > > > >> > the
> > > > >> > > > > > > retention
> > > > >> > > > > > > > > > > limit.
> > > > >> > > > > > > > > > > > Do
> > > > >> > > > > > > > > > > > > > we
> > > > >> > > > > > > > > > > > > > > > need
> > > > >> > > > > > > > > > > > > > > > > to call
> > > > >> > > > > > > RemoteLogMetadataManager.listRemoteLogSegments()
> > > > >> > > > > > > > to
> > > > >> > > > > > > > > > > > > determine
> > > > >> > > > > > > > > > > > > > > > that?
> > > > >> > > > > > > > > > > > > > > > > Also, what about time-based
> > retention?
> > > > To
> > > > >> > make
> > > > >> > > > that
> > > > >> > > > > > > > > efficient,
> > > > >> > > > > > > > > > > do
> > > > >> > > > > > > > > > > > > we
> > > > >> > > > > > > > > > > > > > > need
> > > > >> > > > > > > > > > > > > > > > > to make some additional interface
> > > > changes?
> > > > >> > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > An alternative approach is for the
> > > RLMM
> > > > >> > > > implementor
> > > > >> > > > > > to
> > > > >> > > > > > > > make
> > > > >> > > > > > > > > > > sure
> > > > >> > > > > > > > > > > > > > > > > that
> > > > >> > > > > RemoteLogMetadataManager.listRemoteLogSegments()
> > > > >> > > > > > > is
> > > > >> > > > > > > > > fast
> > > > >> > > > > > > > > > > > > (e.g.,
> > > > >> > > > > > > > > > > > > > > with
> > > > >> > > > > > > > > > > > > > > > > local caching). This way, we could
> > > keep
> > > > >> the
> > > > >> > > > > interface
> > > > >> > > > > > > > > simple.
> > > > >> > > > > > > > > > > > Have
> > > > >> > > > > > > > > > > > > we
> > > > >> > > > > > > > > > > > > > > > > considered that?
> > > > >> > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > Thanks,
> > > > >> > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > Jun
> > > > >> > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > On Wed, Sep 28, 2022 at 6:28 AM
> > Divij
> > > > >> Vaidya
> > > > >> > <
> > > > >> > > > > > > > > > > > > > [email protected]>
> > > > >> > > > > > > > > > > > > > > > > wrote:
> > > > >> > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > > Hey folks
> > > > >> > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > > Does anyone else have any thoughts
> > > on
> > > > >> this
> > > > >> > > > > before I
> > > > >> > > > > > > > > propose
> > > > >> > > > > > > > > > > > this
> > > > >> > > > > > > > > > > > > > for
> > > > >> > > > > > > > > > > > > > > a
> > > > >> > > > > > > > > > > > > > > > > > vote?
> > > > >> > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > > --
> > > > >> > > > > > > > > > > > > > > > > > Divij Vaidya
> > > > >> > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > > On Mon, Sep 5, 2022 at 12:57 PM
> > > Satish
> > > > >> > > Duggana
> > > > >> > > > <
> > > > >> > > > > > > > > > > > > > > > [email protected]
> > > > >> > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > > wrote:
> > > > >> > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > > > Thanks for the KIP Divij!
> > > > >> > > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > > > This is a nice improvement to
> > > avoid
> > > > >> > > > > recalculation
> > > > >> > > > > > > of
> > > > >> > > > > > > > > size.
> > > > >> > > > > > > > > > > > > > > Customized
> > > > >> > > > > > > > > > > > > > > > > > RLMMs
> > > > >> > > > > > > > > > > > > > > > > > > can implement the best possible
> > > > >> approach
> > > > >> > by
> > > > >> > > > > > caching
> > > > >> > > > > > > > or
> > > > >> > > > > > > > > > > > > > maintaining
> > > > >> > > > > > > > > > > > > > > > the
> > > > >> > > > > > > > > > > > > > > > > > size
> > > > >> > > > > > > > > > > > > > > > > > > in an efficient way. But this is
> > > > not a
> > > > >> > big
> > > > >> > > > > > concern
> > > > >> > > > > > > > for
> > > > >> > > > > > > > > the
> > > > >> > > > > > > > > > > > > > default
> > > > >> > > > > > > > > > > > > > > > > topic
> > > > >> > > > > > > > > > > > > > > > > > > based RLMM as mentioned in the
> > > KIP.
> > > > >> > > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > > > ~Satish.
> > > > >> > > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > > > On Wed, 13 Jul 2022 at 18:48,
> > > Divij
> > > > >> > Vaidya
> > > > >> > > <
> > > > >> > > > > > > > > > > > > > > [email protected]>
> > > > >> > > > > > > > > > > > > > > > > > > wrote:
> > > > >> > > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > > > > Thank you for your review
> > Luke.
> > > > >> > > > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > > > > > Reg: is that would the new
> > > > >> > > > > > `RemoteLogSizeBytes`
> > > > >> > > > > > > > > metric
> > > > >> > > > > > > > > > > > be a
> > > > >> > > > > > > > > > > > > > > > > > performance
> > > > >> > > > > > > > > > > > > > > > > > > > overhead? Although we move the
> > > > >> > > calculation
> > > > >> > > > > to a
> > > > >> > > > > > > > > seperate
> > > > >> > > > > > > > > > > > API,
> > > > >> > > > > > > > > > > > > > we
> > > > >> > > > > > > > > > > > > > > > > still
> > > > >> > > > > > > > > > > > > > > > > > > > can't assume users will
> > > implement
> > > > a
> > > > >> > > > > > light-weight
> > > > >> > > > > > > > > method,
> > > > >> > > > > > > > > > > > > right?
> > > > >> > > > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > > > > This metric would be logged
> > > using
> > > > >> the
> > > > >> > > > > > information
> > > > >> > > > > > > > > that is
> > > > >> > > > > > > > > > > > > > already
> > > > >> > > > > > > > > > > > > > > > > being
> > > > >> > > > > > > > > > > > > > > > > > > > calculated for handling remote
> > > > >> > retention
> > > > >> > > > > logic,
> > > > >> > > > > > > > > hence, no
> > > > >> > > > > > > > > > > > > > > > additional
> > > > >> > > > > > > > > > > > > > > > > > work
> > > > >> > > > > > > > > > > > > > > > > > > > is required to calculate this
> > > > >> metric.
> > > > >> > > More
> > > > >> > > > > > > > > specifically,
> > > > >> > > > > > > > > > > > > > whenever
> > > > >> > > > > > > > > > > > > > > > > > > > RemoteLogManager calls
> > > > >> getRemoteLogSize
> > > > >> > > > API,
> > > > >> > > > > > this
> > > > >> > > > > > > > > metric
> > > > >> > > > > > > > > > > > > would
> > > > >> > > > > > > > > > > > > > be
> > > > >> > > > > > > > > > > > > > > > > > > captured.
> > > > >> > > > > > > > > > > > > > > > > > > > This API call is made every
> > time
> > > > >> > > > > > RemoteLogManager
> > > > >> > > > > > > > > wants
> > > > >> > > > > > > > > > > to
> > > > >> > > > > > > > > > > > > > handle
> > > > >> > > > > > > > > > > > > > > > > > expired
> > > > >> > > > > > > > > > > > > > > > > > > > remote log segments (which
> > > should
> > > > be
> > > > >> > > > > periodic).
> > > > >> > > > > > > > Does
> > > > >> > > > > > > > > that
> > > > >> > > > > > > > > > > > > > address
> > > > >> > > > > > > > > > > > > > > > > your
> > > > >> > > > > > > > > > > > > > > > > > > > concern?
> > > > >> > > > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > > > > Divij Vaidya
> > > > >> > > > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > > > > On Tue, Jul 12, 2022 at 11:01
> > AM
> > > > >> Luke
> > > > >> > > Chen
> > > > >> > > > <
> > > > >> > > > > > > > > > > > > [email protected]>
> > > > >> > > > > > > > > > > > > > > > > wrote:
> > > > >> > > > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > > > > > Hi Divij,
> > > > >> > > > > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > > > > > Thanks for the KIP!
> > > > >> > > > > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > > > > > I think it makes sense to
> > > > delegate
> > > > >> > the
> > > > >> > > > > > > > > responsibility
> > > > >> > > > > > > > > > > of
> > > > >> > > > > > > > > > > > > > > > > calculation
> > > > >> > > > > > > > > > > > > > > > > > to
> > > > >> > > > > > > > > > > > > > > > > > > > the
> > > > >> > > > > > > > > > > > > > > > > > > > > specific
> > > > RemoteLogMetadataManager
> > > > >> > > > > > > implementation.
> > > > >> > > > > > > > > > > > > > > > > > > > > But one thing I'm not quite
> > > > sure,
> > > > >> is
> > > > >> > > that
> > > > >> > > > > > would
> > > > >> > > > > > > > > the new
> > > > >> > > > > > > > > > > > > > > > > > > > > `RemoteLogSizeBytes` metric
> > > be a
> > > > >> > > > > performance
> > > > >> > > > > > > > > overhead?
> > > > >> > > > > > > > > > > > > > > > > > > > > Although we move the
> > > calculation
> > > > >> to a
> > > > >> > > > > > seperate
> > > > >> > > > > > > > > API, we
> > > > >> > > > > > > > > > > > > still
> > > > >> > > > > > > > > > > > > > > > can't
> > > > >> > > > > > > > > > > > > > > > > > > assume
> > > > >> > > > > > > > > > > > > > > > > > > > > users will implement a
> > > > >> light-weight
> > > > >> > > > method,
> > > > >> > > > > > > > right?
> > > > >> > > > > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > > > > > Thank you.
> > > > >> > > > > > > > > > > > > > > > > > > > > Luke
> > > > >> > > > > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > > > > > On Fri, Jul 1, 2022 at 5:47
> > PM
> > > > >> Divij
> > > > >> > > > > Vaidya <
> > > > >> > > > > > > > > > > > > > > > > [email protected]
> > > > >> > > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > > > > > wrote:
> > > > >> > > > > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > >
> > > > >> > > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > >
> > > > >> > > > > > > >
> > > > >> > > > > > >
> > > > >> > > > > >
> > > > >> > > > >
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > >
> > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-852%3A+Optimize+calculation+of+size+for+log+in+remote+tier
> > > > >> > > > > > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > > > > > > Hey folks
> > > > >> > > > > > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > > > > > > Please take a look at this
> > > KIP
> > > > >> > which
> > > > >> > > > > > proposes
> > > > >> > > > > > > > an
> > > > >> > > > > > > > > > > > > extension
> > > > >> > > > > > > > > > > > > > to
> > > > >> > > > > > > > > > > > > > > > > > > KIP-405.
> > > > >> > > > > > > > > > > > > > > > > > > > > This
> > > > >> > > > > > > > > > > > > > > > > > > > > > is my first KIP with
> > Apache
> > > > >> Kafka
> > > > >> > > > > community
> > > > >> > > > > > > so
> > > > >> > > > > > > > > any
> > > > >> > > > > > > > > > > > > feedback
> > > > >> > > > > > > > > > > > > > > > would
> > > > >> > > > > > > > > > > > > > > > > > be
> > > > >> > > > > > > > > > > > > > > > > > > > > highly
> > > > >> > > > > > > > > > > > > > > > > > > > > > appreciated.
> > > > >> > > > > > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > > > > > > Cheers!
> > > > >> > > > > > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > > > > > > --
> > > > >> > > > > > > > > > > > > > > > > > > > > > Divij Vaidya
> > > > >> > > > > > > > > > > > > > > > > > > > > > Sr. Software Engineer
> > > > >> > > > > > > > > > > > > > > > > > > > > > Amazon
> > > > >> > > > > > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > > >
> > > > >> > > > > > > > > > > > >
> > > > >> > > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > >
> > > > >> > > > > > > >
> > > > >> > > > > > >
> > > > >> > > > > >
> > > > >> > > > >
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > > >
> > > >
> > >
> >

Re: [DISCUSS] KIP-852 Optimize calculation of size for log in remote tier

Reply via email to