Heya all!

I have updated KIP-950. Here is a list of the changes:

* Explicitly stated that Zookeeper-backed clusters will move through
ENABLED -> DISABLING -> DISABLED, while KRaft-backed clusters will only
have ENABLED -> DISABLED (see the second sketch below this list)
* Added two configurations for the new thread pools and explained where
values will be picked up mid Kafka version upgrade
* Explained how leftover remote partitions will be scheduled for deletion
* Updated the API to use StopReplica V5 rather than a whole new
controller-to-broker API
* Explained that the disablement procedure will be triggered by the
controller listening for an (Incremental)AlterConfig change (see the
first sketch below this list)
* Explained that we will first move the log start offset and then issue a
deletion
* Went into more detail on why changing remote.log.disable.policy after
disablement won't do anything, and that a customer who would like
additional data deleted would have to use already existing methods
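
To make the trigger concrete, here is a minimal sketch of what the
disablement request could look like from the client side. This is an
illustration only: it assumes the existing remote.storage.enable topic
configuration stays the on/off switch and that remote.log.disable.policy
lands as described in the KIP; the topic name and bootstrap address are
made up.

import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class DisableTieredStorageExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            ConfigResource topic =
                new ConfigResource(ConfigResource.Type.TOPIC, "tiered-topic");
            // The controller observes this (Incremental)AlterConfig change
            // and kicks off the disablement procedure described in the KIP.
            Collection<AlterConfigOp> ops = List.of(
                new AlterConfigOp(new ConfigEntry("remote.storage.enable", "false"),
                        AlterConfigOp.OpType.SET),
                // Per the KIP: "delete" asks for asynchronous deletion of the
                // already-tiered data, "retain" leaves it to retention.
                new AlterConfigOp(new ConfigEntry("remote.log.disable.policy", "delete"),
                        AlterConfigOp.OpType.SET));
            Map<ConfigResource, Collection<AlterConfigOp>> request = Map.of(topic, ops);
            admin.incrementalAlterConfigs(request).all().get();
        }
    }
}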
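
And, purely to make the first bullet concrete, a sketch of the state
machine as pseudocode. The enum and method below are hypothetical and not
part of the KIP:

enum TieredStorageState {
    ENABLED, DISABLING, DISABLED;

    // Zookeeper-backed clusters pass through DISABLING while waiting for
    // broker acknowledgements; KRaft-backed clusters jump straight to
    // DISABLED in fire-and-forget fashion.
    boolean canTransitionTo(TieredStorageState next, boolean kraft) {
        switch (this) {
            case ENABLED:   return kraft ? next == DISABLED : next == DISABLING;
            case DISABLING: return !kraft && next == DISABLED;
            case DISABLED:  return next == ENABLED; // re-enablement
            default:        return false;
        }
    }
}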

Let me know if there are any new comments or if I have missed something!

Best,
Christo

On Mon, 15 Apr 2024 at 12:40, Christo Lolov <christolo...@gmail.com> wrote:

> Heya Doguscan,
>
> I believe that the state of the world after this KIP will be the following:
>
> For Zookeeper-backed clusters there will be 3 states: ENABLED, DISABLING
> and DISABLED. We want this because Zookeeper-backed clusters will await
> confirmation from the brokers that they have indeed stopped
> tiering-related operations on the topic.
>
> For KRaft-backed clusters there will be only 2 states: ENABLED and
> DISABLED. KRaft takes a fire-and-forget approach for topic deletion. I
> believe the same approach ought to be taken for tiered topics. To me, the
> mechanism which will ensure that leftover state in remote storage due to
> failures is cleaned up is the retention mechanism. In today's code, a
> leader deletes all segments it finds in remote storage with offsets below
> the log start offset. I believe this will be good enough for cleaning up
> leftover state in remote storage due to failures.
>
> I know that quite a few changes have been discussed so I will aim to put
> them on paper in the upcoming days and let everyone know!
>
> Best,
> Christo
>
> On Tue, 9 Apr 2024 at 14:49, Doğuşcan Namal <namal.dogus...@gmail.com>
> wrote:
>
>> +1 let's not introduce a new API and mark it immediately as deprecated :)
>>
>> On your second comment Luke, one thing we need to clarify is: when do we
>> consider remote storage to be DISABLED for a topic?
>> Particularly, what is the state while the remote storage is being deleted
>> in the case of disablement.policy=delete? Is it DISABLING or DISABLED?
>>
>> If we move directly to the DISABLED state,
>>
>> a) in case of failures, the leaders should continue remote storage
>> deletion even if the topic is moved to the DISABLED state, otherwise we
>> risk having stray data on remote storage.
>> b) on each restart, we should initiate the remote storage deletion
>> because although we replayed a record with a DISABLED state, we cannot be
>> sure whether the remote data has been deleted or not.
>>
>> We could either consider keeping the remote topic in the DISABLING state
>> until all of the remote storage data is deleted, or we need an additional
>> mechanism to handle the stray remote data.
>>
>> The existing topic deletion, for instance, handles stray logs on disk by
>> detecting them on KafkaBroker startup and deleting them before the
>> ReplicaManager is started.
>> Maybe we need a similar mechanism here as well if we don't want a
>> DISABLING state. Otherwise, we need a callback from the brokers to
>> validate that the remote storage data is deleted before we can move to
>> the DISABLED state.
>>
>> Thanks.
>>
>> On Tue, 9 Apr 2024 at 12:45, Luke Chen <show...@gmail.com> wrote:
>>
>>> Hi Christo,
>>>
>>> > I would then opt for moving the information from DisableRemoteTopic
>>> > into the StopReplicas API, which will then disappear in KRaft world
>>> > as it is already scheduled for deprecation. What do you think?
>>>
>>> Sounds good to me.
>>>
>>> Thanks.
>>> Luke
>>>
>>> On Tue, Apr 9, 2024 at 6:46 PM Christo Lolov <christolo...@gmail.com>
>>> wrote:
>>>
>>> > Heya Luke!
>>> >
>>> > I thought a bit more about it and I reached the same conclusion as
>>> > you for 2 as a follow-up from 1. In other words, in KRaft world I
>>> > don't think the controller needs to wait for acknowledgements from
>>> > the brokers. All we care about is that the leader (who is responsible
>>> > for archiving/deleting data in tiered storage) knows about the change
>>> > and applies it properly. If there is a leadership change halfway
>>> > through the operation then the new leader still needs to apply the
>>> > message from the state topic, and we know that a disable-message will
>>> > be applied before a reenablement-message. I will change the KIP later
>>> > today/tomorrow morning to reflect this reasoning.
>>> >
>>> > However, with this I believe that introducing a new API just for
>>> > Zookeeper-based clusters (i.e. DisableRemoteTopic) becomes a bit of
>>> > an overkill. I would then opt for moving the information from
>>> > DisableRemoteTopic into the StopReplicas API, which will then
>>> > disappear in KRaft world as it is already scheduled for deprecation.
>>> > What do you think?
>>> >
>>> > Best,
>>> > Christo
>>> >
>>> > On Wed, 3 Apr 2024 at 07:59, Luke Chen <show...@gmail.com> wrote:
>>> >
>>> > > Hi Christo,
>>> > >
>>> > > 1. I agree with Doguscan that in KRaft mode, the controller won't
>>> > > send RPCs to the brokers (except in the migration path).
>>> > > So, I think we could adopt a similar approach to the one we used
>>> > > for `AlterReplicaLogDirs` (KIP-858
>>> > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-858%3A+Handle+JBOD+broker+disk+failure+in+KRaft#KIP858:HandleJBODbrokerdiskfailureinKRaft-Intra-brokerreplicamovement>)
>>> > > and let the broker notify the controller of any update, instead of
>>> > > the controller notifying the broker. And once the controller
>>> > > receives the completion requests from all brokers, it'll enter the
>>> > > "Disabled" state. WDYT?
>>> > >
>>> > > 2. Why should we wait for all brokers to respond before moving to
>>> > > the "Disabled" state in KRaft mode?
>>> > > Currently, only the leader node does the remote log upload/fetch
>>> > > tasks, so does that mean the controller only needs to make sure the
>>> > > leader completes the stopPartition?
>>> > > If a leadership change is triggered during the leader node's
>>> > > stopPartition process, then the new leader should receive and apply
>>> > > the configRecord update before the leadership change record based
>>> > > on the KRaft design, which means there will be no gap where the
>>> > > follower node becomes the leader and starts doing unexpected
>>> > > upload/fetch tasks, right?
>>> > > I agree we should make sure that in ZK mode all brokers have
>>> > > completed the stopPartitions before moving to the "Disabled" state,
>>> > > because the ZK node watcher works in a separate thread. But I'm not
>>> > > sure about KRaft mode.
>>> > >
>>> > > Thanks.
>>> > > Luke
>>> > >
>>> > >
>>> > > On Fri, Mar 29, 2024 at 4:14 PM Christo Lolov <
>>> christolo...@gmail.com>
>>> > > wrote:
>>> > >
>>> > > > Heya everyone!
>>> > > >
>>> > > > re: Doguscan
>>> > > >
>>> > > > I believe the answer to 101 needs a bit more discussion. As far
>>> > > > as I know, tiered storage today has methods to update the metadata
>>> > > > of a segment to say "hey, I would like this deleted", but actual
>>> > > > deletion is left to plugin implementations (or any background
>>> > > > cleaners). In other words, there is no "immediate" deletion. In
>>> > > > this KIP, we would like to continue doing the same if the
>>> > > > retention policy is set to delete. So I believe the answer is
>>> > > > actually that a) we will update the metadata of the segments to
>>> > > > mark them as deleted and b) we will advance the log start offset.
>>> > > > Any deletion of actual files will still be delegated to plugin
>>> > > > implementations. I believe this is further supported by
>>> > > > "*remote.log.disable.policy=delete:* Logs that are archived in
>>> > > > the remote storage will not be part of the contiguous "active"
>>> > > > log and will be deleted asynchronously as part of the disablement
>>> > > > process"
>>> > > >
>>> > > > Following from the above, I believe for 102 it is fine to allow
>>> > > > setting remote.log.disable.policy on a disabled topic in much the
>>> > > > same way we allow other remote-related configurations to be set
>>> > > > on a topic (e.g. local.retention.*) - it just won't have an
>>> > > > effect. Granted, I do believe we should restrict the policy being
>>> > > > changed while a disablement is ongoing.
>>> > > >
>>> > > > re: Satish and Kamal
>>> > > >
>>> > > > 104, 1 and 2 are fair asks; I will work with Doguscan to update
>>> > > > the KIP with the information!
>>> > > >
>>> > > > Best,
>>> > > > Christo
>>> > > >
>>> > > > On Thu, 28 Mar 2024 at 10:31, Doğuşcan Namal <
>>> namal.dogus...@gmail.com
>>> > >
>>> > > > wrote:
>>> > > >
>>> > > > > Hi Satish, I will try to answer as much as I can and the others
>>> > > > > could chime in with further details.
>>> > > > >
>>> > > > >
>>> > > > >
>>> > > > >
>>> > > > >
>>> > > > > *101. For remote.log.disable.policy=delete: Does it delete the
>>> > > > > remote log data immediately and the data in remote storage will
>>> > > > > not be taken into account by any replica? That means
>>> > > > > log-start-offset is moved to the earlier local-log-start-offset.*
>>> > > > > *Exactly. RemoteLogData will be deleted immediately.*
>>> > > > > *So before the deletion starts we move the LogStartOffset to the
>>> > > > > LocalLogStartOffset to ensure that no RemoteLog will be accessed
>>> > > > > after that point.*
>>> > > > >
>>> > > > >
>>> > > > > *102. Can we update the remote.log.disable.policy after tiered
>>> > > > > storage is disabled on a topic?*
>>> > > > >
>>> > > > > *This is a good point. I think we should not allow modifying
>>> > > > > this configuration, because changing the policy from Deletion
>>> > > > > to Retain when there is an ongoing Deletion will result in
>>> > > > > undefined behaviour where we retain half of the remote log and
>>> > > > > delete the other half.*
>>> > > > >
>>> > > > > *103. Do we plan to add any metrics related to this feature?*
>>> > > > >
>>> > > > > *Any recommendations?*
>>> > > > > *We may emit a gauge showing the enablement state of a topic,
>>> > > > > but we could gather that info from the logs as well.*
>>> > > > > *The total duration for remote topic deletion could be added as
>>> > > > > well, but this is more of a metric for the
>>> > > > > RemotePartitionRemover itself.*
>>> > > > >
>>> > > > >
>>> > > > >
>>> > > > > *104. Please add configuration details about copier thread
>>> > > > > pool, expiration thread pool and the migration of the existing
>>> > > > > RemoteLogManagerScheduledThreadPool.*
>>> > > > >
>>> > > > > *Will add the details.*
>>> > > > >
>>> > > > > 105. How is the behaviour with topic or partition deletion
>>> > > > > request handled when tiered storage disablement request is
>>> > > > > still being processed on a topic?
>>> > > > >
>>> > > > > *If the disablement policy is Delete, then a successive topic
>>> > > > > deletion request is going to be a NOOP because the RemoteLogs
>>> > > > > are already being deleted.*
>>> > > > > *If the disablement policy is Retain, then we only moved the
>>> > > > > LogStartOffset and didn't touch the RemoteLogs anyway, so the
>>> > > > > delete topic request will result in the initiation of RemoteLog
>>> > > > > deletion.*
>>> > > > >
>>> > > > >
>>> > > > > On Tue, 26 Mar 2024 at 18:21, Kamal Chandraprakash <
>>> > > > > kamal.chandraprak...@gmail.com> wrote:
>>> > > > >
>>> > > > > > Hi,
>>> > > > > >
>>> > > > > > Thanks for the KIP! Overall the KIP looks good and covers
>>> > > > > > most of the items.
>>> > > > > >
>>> > > > > > 1. Could you explain how the brokers will handle the
>>> > > > > > DisableRemoteTopic API request?
>>> > > > > >
>>> > > > > > 2. Who will initiate the controller interaction sequence? Does
>>> > > > > > the controller listen for topic config updates and initiate
>>> > > > > > the disablement?
>>> > > > > >
>>> > > > > > --
>>> > > > > > Kamal
>>> > > > > >
>>> > > > > > On Tue, Mar 26, 2024 at 4:40 PM Satish Duggana <
>>> > > > satish.dugg...@gmail.com
>>> > > > > >
>>> > > > > > wrote:
>>> > > > > >
>>> > > > > > > Thanks Mehari, Divij, Christo et al. for the KIP.
>>> > > > > > >
>>> > > > > > > I had an initial review of the KIP and left the below
>>> > > > > > > comments.
>>> > > > > > >
>>> > > > > > > 101. For remote.log.disable.policy=delete:
>>> > > > > > > Does it delete the remote log data immediately and the data
>>> > > > > > > in remote storage will not be taken into account by any
>>> > > > > > > replica? That means log-start-offset is moved to the earlier
>>> > > > > > > local-log-start-offset.
>>> > > > > > >
>>> > > > > > > 102. Can we update the remote.log.disable.policy after
>>> > > > > > > tiered storage is disabled on a topic?
>>> > > > > > >
>>> > > > > > > 103. Do we plan to add any metrics related to this feature?
>>> > > > > > >
>>> > > > > > > 104. Please add configuration details about copier thread
>>> > > > > > > pool, expiration thread pool and the migration of the
>>> > > > > > > existing RemoteLogManagerScheduledThreadPool.
>>> > > > > > >
>>> > > > > > > 105. How is the behaviour with topic or partition deletion
>>> > > > > > > request handled when tiered storage disablement request is
>>> > > > > > > still being processed on a topic?
>>> > > > > > >
>>> > > > > > > ~Satish.
>>> > > > > > >
>>> > > > > > > On Mon, 25 Mar 2024 at 13:34, Doğuşcan Namal <
>>> > > > namal.dogus...@gmail.com
>>> > > > > >
>>> > > > > > > wrote:
>>> > > > > > > >
>>> > > > > > > > Hi Christo and Luke,
>>> > > > > > > >
>>> > > > > > > > I think the KRaft section of the KIP requires slight
>>> > > > > > > > improvement. The metadata propagation in KRaft is handled
>>> > > > > > > > by the Raft layer instead of sending Controller -> Broker
>>> > > > > > > > RPCs. In fact, KIP-631 deprecated these RPCs.
>>> > > > > > > >
>>> > > > > > > > I will come up with some recommendations on how we could
>>> > > > > > > > improve that one, but until then, @Luke please feel free
>>> > > > > > > > to review the KIP.
>>> > > > > > > >
>>> > > > > > > > @Satish, if we want this to make it to Kafka 3.8, I
>>> > > > > > > > believe we need to aim to get the KIP approved in the
>>> > > > > > > > coming weeks; otherwise it will slip and we cannot support
>>> > > > > > > > it in Zookeeper mode.
>>> > > > > > > >
>>> > > > > > > > I also would like to better understand the community's
>>> > > > > > > > stance on adding a new feature for Zookeeper since it is
>>> > > > > > > > already marked as deprecated.
>>> > > > > > > >
>>> > > > > > > > Thanks.
>>> > > > > > > >
>>> > > > > > > >
>>> > > > > > > >
>>> > > > > > > > On Mon, 18 Mar 2024 at 13:42, Christo Lolov <
>>> > > > christolo...@gmail.com>
>>> > > > > > > wrote:
>>> > > > > > > >>
>>> > > > > > > >> Heya,
>>> > > > > > > >>
>>> > > > > > > >> I do have some time to put into this, but to be honest I
>>> > > > > > > >> am still after reviews of the KIP itself :)
>>> > > > > > > >>
>>> > > > > > > >> After the latest changes it ought to be detailing both a
>>> > > > > > > >> Zookeeper approach and a KRaft approach.
>>> > > > > > > >>
>>> > > > > > > >> Do you have any thoughts on how it could be improved, or
>>> > > > > > > >> should I start a voting thread?
>>> > > > > > > >>
>>> > > > > > > >> Best,
>>> > > > > > > >> Christo
>>> > > > > > > >>
>>> > > > > > > >> On Thu, 14 Mar 2024 at 06:12, Luke Chen <
>>> show...@gmail.com>
>>> > > > wrote:
>>> > > > > > > >>
>>> > > > > > > >> > Hi Christo,
>>> > > > > > > >> >
>>> > > > > > > >> > Any update on this KIP?
>>> > > > > > > >> > If you don't have time to complete it, I can
>>> > > > > > > >> > collaborate with you to work on it.
>>> > > > > > > >> >
>>> > > > > > > >> > Thanks.
>>> > > > > > > >> > Luke
>>> > > > > > > >> >
>>> > > > > > > >> > On Wed, Jan 17, 2024 at 11:38 PM Satish Duggana <
>>> > > > > > > satish.dugg...@gmail.com>
>>> > > > > > > >> > wrote:
>>> > > > > > > >> >
>>> > > > > > > >> > > Hi Christo,
>>> > > > > > > >> > > Thanks for volunteering to contribute to the KIP
>>> > > > > > > >> > > discussion. I suggest considering this KIP for both
>>> > > > > > > >> > > ZK and KRaft, as it will be helpful for this feature
>>> > > > > > > >> > > to be available in 3.8.0 running with ZK clusters.
>>> > > > > > > >> > >
>>> > > > > > > >> > > Thanks,
>>> > > > > > > >> > > Satish.
>>> > > > > > > >> > >
>>> > > > > > > >> > > On Wed, 17 Jan 2024 at 19:04, Christo Lolov <
>>> > > > > > christolo...@gmail.com
>>> > > > > > > >
>>> > > > > > > >> > > wrote:
>>> > > > > > > >> > > >
>>> > > > > > > >> > > > Hello!
>>> > > > > > > >> > > >
>>> > > > > > > >> > > > I volunteer to get this KIP moving forward and
>>> > > > > > > >> > > > implemented in Apache Kafka 3.8.
>>> > > > > > > >> > > >
>>> > > > > > > >> > > > I have caught up with Mehari offline and we have
>>> > > > > > > >> > > > agreed that, given Apache Kafka 4.0 is around the
>>> > > > > > > >> > > > corner, we would like to propose this feature only
>>> > > > > > > >> > > > for KRaft clusters.
>>> > > > > > > >> > > >
>>> > > > > > > >> > > > Any and all reviews and comments are welcome!
>>> > > > > > > >> > > >
>>> > > > > > > >> > > > Best,
>>> > > > > > > >> > > > Christo
>>> > > > > > > >> > > >
>>> > > > > > > >> > > > On Tue, 9 Jan 2024 at 09:44, Doğuşcan Namal <
>>> > > > > > > namal.dogus...@gmail.com>
>>> > > > > > > >> > > > wrote:
>>> > > > > > > >> > > >
>>> > > > > > > >> > > > > Hi everyone, any progress on the status of this
>>> > > > > > > >> > > > > KIP? Overall it looks good to me, but I wonder
>>> > > > > > > >> > > > > whether we still need to support it for Zookeeper
>>> > > > > > > >> > > > > mode given that it will be deprecated in the next
>>> > > > > > > >> > > > > 3 months.
>>> > > > > > > >> > > > >
>>> > > > > > > >> > > > > On 2023/07/21 20:16:46 "Beyene, Mehari" wrote:
>>> > > > > > > >> > > > > > Hi everyone,
>>> > > > > > > >> > > > > > I would like to start a discussion on KIP-950:
>>> > > > > > > >> > > > > > Tiered Storage Disablement (
>>> > > > > > > >> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-950%3A++Tiered+Storage+Disablement
>>> > > > > > > >> > > > > > ).
>>> > > > > > > >> > > > > >
>>> > > > > > > >> > > > > > This KIP proposes adding the ability to disable
>>> > > > > > > >> > > > > > and re-enable tiered storage on a topic.
>>> > > > > > > >> > > > > >
>>> > > > > > > >> > > > > > Thanks,
>>> > > > > > > >> > > > > > Mehari
>>> > > > > > > >> > > > > >