Christo Lolov created KAFKA-15267:
-------------------------------------

             Summary: Cluster-wide disablement of Tiered Storage
                 Key: KAFKA-15267
                 URL: https://issues.apache.org/jira/browse/KAFKA-15267
             Project: Kafka
          Issue Type: Sub-task
            Reporter: Christo Lolov
            Assignee: Christo Lolov


h2. Summary

KIP-405 defines the configuration {{remote.log.storage.system.enable}}, which 
controls whether all resources needed for Tiered Storage (TS) to function are 
instantiated in Kafka. However, the interaction between remote data 
and Kafka when that configuration is set to false while there are still topics 
with {{remote.storage.enable}} set is undefined. {color:#ff8b00}*We would like 
to give customers the ability to switch off Tiered Storage on a cluster level 
and as such would need to define the behaviour.*{color}

{{remote.log.storage.system.enable}} is a read-only configuration. This means 
that it can only be changed by *modifying the server.properties* and restarting 
brokers. As such, the {*}validity of values contained in it is only checked at 
broker startup{*}.
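
To illustrate what a startup-only check implies, here is a minimal sketch that reads the flag from the text of a server.properties file and rejects anything other than true or false. The class {{RemoteStorageFlagReader}} and its method are hypothetical and are not Kafka's actual configuration code; the sketch also assumes the flag is treated as false when absent.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper for illustration only; not part of Kafka.
final class RemoteStorageFlagReader {
    static final String CLUSTER_FLAG = "remote.log.storage.system.enable";

    // Parses the contents of a server.properties file and returns the flag.
    // Assumption: a missing key is treated as "false".
    static boolean isEnabled(String serverPropertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(serverPropertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String raw = props.getProperty(CLUSTER_FLAG, "false").trim();
        if (raw.equalsIgnoreCase("true")) {
            return true;
        }
        if (raw.equalsIgnoreCase("false")) {
            return false;
        }
        // The value is only inspected here, once, when the broker starts.
        throw new IllegalArgumentException(
            "Invalid value for " + CLUSTER_FLAG + ": " + raw);
    }
}
```

Because the value is read once, changing it requires editing the file and restarting the broker, which is why any validation of cluster-wide disablement has to live in the startup path.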

This JIRA proposes a few behaviours and a recommendation on a way forward.
h2. Option 1: Change nothing

Pros:
 * No operation.

Cons:
 * We do not solve the problem of moving back to older (or newer) Kafka 
versions not supporting TS.

h2. Option 2: Remove the configuration, enable Tiered Storage on a cluster 
level and do not allow it to be disabled

Always instantiate all resources for tiered storage. If no special ones are 
selected use the default ones which come with Kafka.

Pros:
 * We solve the problem of moving between versions by not allowing TS to be 
disabled.

Cons:
 * We do not solve the problem of moving back to older (or newer) Kafka 
versions not supporting TS.
 * We haven’t quantified how many compute resources (CPU, memory) idle TS 
components consume.
 * TS is a feature not required for running Kafka. As such, while it is still 
under development we shouldn’t put it on the critical path of starting a 
broker. Keeping it off that path means a stray memory leak in TS won’t affect 
broker startup.
 * We are potentially swapping one problem for another. How does TS behave if 
one decides to swap the TS plugin classes when data has already been written?

h2. Option 3: Hide topics with tiering enabled

Customers cannot interact with topics which have tiering enabled. They cannot 
create new topics with the same names. Retention (and compaction?) do not take 
effect on files already in local storage.

Pros:
 * We do not force data-deletion.

Cons:
 * This will be quite involved - the controller will need to know when a 
broker’s server.properties has been altered; the broker will need to avoid 
deleting logs for partitions it is neither the leader nor a follower for.

h2. {color:#e6e6e6}Option 4: Do not start the broker if there are topics with 
tiering enabled{color} - Recommended

This option has two sub-options. The first is that TS cannot be 
disabled at cluster level if there are *any* tiered topics - in other words, 
all tiered topics need to be deleted. The second is that TS cannot be 
disabled at cluster level if there are *any* topics with *tiering enabled* - 
they can have tiering disabled, but with a retention policy set to delete or 
retain (as per 
[KIP-950|https://cwiki.apache.org/confluence/display/KAFKA/KIP-950%3A++Tiered+Storage+Disablement]).
 A topic can have tiering disabled and remain on the cluster as long as there 
is no *remote* data when TS is disabled cluster-wide.
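
The difference between the two sub-options can be sketched as two predicates over per-topic state. This is only one reading of the text above: the {{TopicState}} type and its three flags are hypothetical, and how a broker would actually learn whether a topic was ever tiered or still has remote data is not specified in this JIRA.

```java
import java.util.List;

// Hypothetical model for illustration only; not part of Kafka.
final class DisablementSubOptions {
    static final class TopicState {
        final String name;
        final boolean everTiered;      // tiering was enabled at some point
        final boolean tieringEnabled;  // tiering is enabled right now
        final boolean hasRemoteData;   // segments still exist in remote storage

        TopicState(String name, boolean everTiered,
                   boolean tieringEnabled, boolean hasRemoteData) {
            this.name = name;
            this.everTiered = everTiered;
            this.tieringEnabled = tieringEnabled;
            this.hasRemoteData = hasRemoteData;
        }
    }

    // Sub-option 1: every tiered topic must have been deleted.
    static boolean canDisableUnderSubOption1(List<TopicState> topics) {
        return topics.stream().noneMatch(t -> t.everTiered);
    }

    // Sub-option 2: previously tiered topics may remain, provided tiering
    // is disabled on them and no remote data is left.
    static boolean canDisableUnderSubOption2(List<TopicState> topics) {
        return topics.stream()
            .noneMatch(t -> t.tieringEnabled || t.hasRemoteData);
    }
}
```

Under this reading, a topic whose tiering was disabled and whose remote data has been fully removed blocks cluster-wide disablement only under the first sub-option.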

Pros:
 * We force the customer to be very explicit in disabling tiering of topics 
prior to disabling TS on the whole cluster.

Cons:
 * You have to make certain that all data in remote storage is deleted (just 
disabling tiering on a topic is not enough). How do you determine whether all 
remote data has expired if the policy is retain? If the retain policy in KIP-950 
knows that there is data in remote storage, then this check should also be able 
to figure it out.

The common denominator is that there needs to be no *remote* data at the point 
of disabling TS. As such, the most straightforward option is to refuse to start 
brokers if there are topics with {{remote.storage.enable}} set. This 
in essence requires customers to clean up any tiered topics before switching off 
TS, which is a fair ask. Should we wish to revise this later, it should be 
possible.
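
A minimal sketch of the recommended check follows, assuming the broker has the topic-level configs available as plain maps at startup. The class {{TieredStorageStartupCheck}} and its methods are hypothetical and do not correspond to existing Kafka code; both flags are assumed to be false when absent.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical startup guard for illustration only; not part of Kafka.
final class TieredStorageStartupCheck {
    static final String CLUSTER_FLAG = "remote.log.storage.system.enable";
    static final String TOPIC_FLAG = "remote.storage.enable";

    // Returns the names of topics that still have tiering enabled, sorted.
    static List<String> tieredTopics(Map<String, Map<String, String>> topicConfigs) {
        return topicConfigs.entrySet().stream()
            .filter(e -> Boolean.parseBoolean(
                e.getValue().getOrDefault(TOPIC_FLAG, "false")))
            .map(Map.Entry::getKey)
            .sorted()
            .collect(Collectors.toList());
    }

    // Throws if TS is disabled cluster-wide while tiered topics remain,
    // which would stop the broker from starting.
    static void validate(Map<String, String> brokerConfig,
                         Map<String, Map<String, String>> topicConfigs) {
        boolean clusterEnabled = Boolean.parseBoolean(
            brokerConfig.getOrDefault(CLUSTER_FLAG, "false"));
        if (clusterEnabled) {
            return; // TS is on, nothing to check
        }
        List<String> offending = tieredTopics(topicConfigs);
        if (!offending.isEmpty()) {
            throw new IllegalStateException(
                "Cannot start broker: " + CLUSTER_FLAG + "=false but "
                + TOPIC_FLAG + " is still set on topics " + offending);
        }
    }
}
```

Listing the offending topics in the error message tells the operator exactly what needs to be cleaned up before TS can be switched off.
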
h2. Option 5: Make Kafka forget about all remote information

Pros:
 * Clean cut

Cons:
 * Data is lost the moment TS is disabled regardless of whether it is reenabled 
later on, which might not be the behaviour expected by customers.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
