Christo Lolov created KAFKA-15267:
-------------------------------------

Summary: Cluster-wide disablement of Tiered Storage
Key: KAFKA-15267
URL: https://issues.apache.org/jira/browse/KAFKA-15267
Project: Kafka
Issue Type: Sub-task
Reporter: Christo Lolov
Assignee: Christo Lolov
h2. Summary

KIP-405 defines the configuration {{remote.log.storage.system.enable}}, which controls whether all resources Tiered Storage (TS) needs in order to function are instantiated in Kafka. However, the interaction between remote data and Kafka is undefined if that configuration is set to false while there are still topics with {{remote.storage.enable}} set.

{color:#ff8b00}*We would like to give customers the ability to switch off Tiered Storage at the cluster level, and as such need to define this behaviour.*{color}

{{remote.log.storage.system.enable}} is a read-only configuration. This means it can only be changed by *modifying server.properties* and restarting brokers. As such, the *validity of its value is only checked at broker startup*.

This JIRA proposes a few behaviours and recommends a way forward.

h2. Option 1: Change nothing

Pros:
* No operation.

Cons:
* We do not solve the problem of moving back to older (or newer) Kafka versions that do not support TS.

h2. Option 2: Remove the configuration, enable Tiered Storage at the cluster level and do not allow it to be disabled

Always instantiate all resources for Tiered Storage. If no specific implementations are selected, use the default ones that ship with Kafka.

Pros:
* We solve the problem of moving between TS-supporting versions by never allowing TS to be disabled.

Cons:
* We do not solve the problem of moving back to older (or newer) Kafka versions that do not support TS.
* We have not quantified how much compute (CPU, memory) idle TS components consume.
* TS is not required for running Kafka. While it is still under development, we should not put it on the critical path of broker startup; that way, a stray memory leak in TS cannot affect anything on a broker's critical path.
* We are potentially swapping one problem for another: how does TS behave if someone swaps the TS plugin classes after data has already been written?

h2. Option 3: Hide topics with tiering enabled

Customers cannot interact with topics that have tiering enabled, and cannot create new topics with the same names. Retention (and compaction?) does not take effect on files already in local storage.

Pros:
* We do not force data deletion.

Cons:
* This will be quite involved: the controller will need to know when a broker's server.properties has been altered, and the broker will need to refrain from deleting logs of partitions for which it is neither leader nor follower.

h2. {color:#e6e6e6}Option 4: Do not start the broker if there are topics with tiering enabled{color} - Recommended

This option has two sub-options. The first is that TS cannot be disabled at the cluster level if there are *any* tiered topics; in other words, all tiered topics need to be deleted. The second is that TS cannot be disabled at the cluster level if there are *any* topics with *tiering enabled*; topics may have tiering disabled, with a retention policy set to delete or retain (as per [KIP-950|https://cwiki.apache.org/confluence/display/KAFKA/KIP-950%3A++Tiered+Storage+Disablement]). Under the second sub-option, a topic can have tiering disabled and remain on the cluster as long as there is no *remote* data when TS is disabled cluster-wide.

Pros:
* We force the customer to be very explicit in disabling tiering on topics before disabling TS on the whole cluster.

Cons:
* You have to make certain that all remote data is deleted (disabling tiering on a topic is not enough). How do you determine whether all remote data has expired if the policy is retain? If the retain policy in KIP-950 can tell that there is still data in remote storage, the same mechanism should be able to answer this too.

The common denominator is that there must be no *remote* data at the point of disabling TS. As such, the most straightforward option is to refuse to start brokers if there are topics with {{remote.storage.enable}} present.
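The startup refusal described above could look roughly like the following minimal sketch. It assumes the broker can read each topic's configuration as a plain string map at startup; the class and method names ({{TieredStorageStartupCheck}}, {{blockingTopics}}, {{validateOrFail}}) are hypothetical and not part of Kafka. Only the two configuration names come from KIP-405, and "blocks startup if the topic setting is present at all" is the assumption taken from the recommendation above.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch of the Option 4 startup check. Names are illustrative only;
// this is not how Kafka's broker startup is actually implemented.
final class TieredStorageStartupCheck {

    // Broker-level, read-only setting from server.properties (only checked at startup).
    static final String SYSTEM_ENABLE = "remote.log.storage.system.enable";
    // Topic-level setting that enables tiering for a single topic.
    static final String TOPIC_ENABLE = "remote.storage.enable";

    private TieredStorageStartupCheck() {}

    // Returns the topics that would block startup, sorted for a stable error message.
    // Assumption: when TS is disabled cluster-wide, a topic blocks startup as soon as
    // remote.storage.enable is present in its config, whatever its value.
    static List<String> blockingTopics(Map<String, String> brokerConfig,
                                       Map<String, Map<String, String>> topicConfigs) {
        boolean systemEnabled =
                Boolean.parseBoolean(brokerConfig.getOrDefault(SYSTEM_ENABLE, "false"));
        if (systemEnabled) {
            return List.of(); // TS is on cluster-wide: nothing to check.
        }
        return topicConfigs.entrySet().stream()
                .filter(e -> e.getValue().containsKey(TOPIC_ENABLE))
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }

    // Throws instead of letting the broker start while any topic still references tiering.
    static void validateOrFail(Map<String, String> brokerConfig,
                               Map<String, Map<String, String>> topicConfigs) {
        List<String> blocking = blockingTopics(brokerConfig, topicConfigs);
        if (!blocking.isEmpty()) {
            throw new IllegalStateException(
                    SYSTEM_ENABLE + "=false but these topics still set " + TOPIC_ENABLE
                            + ": " + blocking + ". Delete them before disabling Tiered Storage.");
        }
    }
}
```

For example, with {{remote.log.storage.system.enable=false}} and a topic that still carries {{remote.storage.enable=false}}, {{validateOrFail}} throws, because in this sketch the mere presence of the setting blocks startup. A real implementation would also have to decide where topic configurations come from at broker startup (the cluster metadata), which the sketch deliberately leaves out.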
This in essence requires customers to clean up any tiered topics before switching off TS, which is a fair ask. Should we wish to revise this later, it should be possible.

h2. Option 5: Make Kafka forget about all remote information

Pros:
* Clean cut.

Cons:
* Data is lost the moment TS is disabled, regardless of whether it is re-enabled later on, which might not be the behaviour customers expect.