Apologies, it seems the images didn't attach... There were only two; I'm attaching them to this message. Sorry for the inconvenience!
- Ivan

On Thu, Oct 2, 2025, at 14:06, Ivan Yurchenko wrote:
> Hi dear Kafka community,
>
> In the initial Diskless proposal, we proposed a separate component, the
> batch/diskless coordinator, whose role would be to centrally manage the batch
> and WAL file metadata for diskless topics. This component drew many
> reasonable comments from the community about how it would support various
> Kafka features (transactions, queues) and about its scalability. While we believe
> we have good answers to all the expressed concerns, we took a step back and
> looked at the problem from a different perspective.
>
> We would like to propose an alternative Diskless design *without a
> centralized coordinator*. We believe this approach has potential and propose
> to discuss it, as it may be more appealing to the community.
>
> Let us explain the idea. Most of the complications with the original Diskless
> approach come from one necessary architecture change: globalizing the local
> state of the partition leader in the batch coordinator. This causes deviations
> from the established workflows in various features like produce idempotence and
> transactions, queues, retention, etc. These deviations need to be carefully
> considered, designed, and later implemented and tested. In the new approach
> we want to avoid this by making partition leaders once again responsible for
> managing their partitions, even in diskless topics.
>
> In classic Kafka topics, batch data and metadata are blended together in the
> partition log. The crux of the Diskless idea is to decouple them and move
> the data to remote storage while keeping the metadata somewhere else. Using the
> central batch coordinator to manage batch metadata is one way, but not the
> only one.
>
> Let's now think about managing metadata for each user partition
> independently. Generally, partitions are independent and don't share anything
> apart from the fact that their data are mixed in WAL files. If we figure out
> how to commit and later delete WAL files safely, we will achieve the necessary
> autonomy that allows us to get rid of the central batch coordinator. Instead,
> *each diskless user partition will be managed by its leader*, as in classic
> Kafka topics. Also like in classic topics, the leader uses the partition log
> to persist batch metadata, i.e. the regular batch header plus the
> information about how to find the batch on remote storage. In contrast to
> classic topics, the batch data is in remote storage.
>
> For clarity, let's compare the three designs:
> • Classic topics:
>   • Data and metadata are co-located in the partition log.
>   • Partition log content: [Batch header (metadata)|Batch data].
>   • The partition log is replicated to the followers.
>   • The replicas and leader have local state built from metadata.
> • Original Diskless:
>   • Metadata is in the batch coordinator, data is on remote storage.
>   • The partition state is global in the batch coordinator.
> • New Diskless:
>   • Metadata is in the partition log, data is on remote storage.
>   • Partition log content: [Batch header (metadata)|Batch coordinates on remote storage] (sketched below).
>   • The partition log is replicated to the followers.
>   • The replicas and leader have local state built from metadata.
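>
> To make this concrete, here is a minimal sketch of what such a per-batch
> metadata entry could look like. All class and field names are illustrative
> only; they are not taken from the KIP or the Kafka codebase:
>
>     // Hypothetical shape of the metadata a leader appends to the partition
>     // log for a diskless batch. The usual batch header fields stay as they
>     // are today; only the location of the payload differs from classic topics.
>     public record DisklessBatchCoordinates(
>         String walObjectKey,   // object name of the shared WAL file on remote storage
>         long byteOffset,       // where this batch starts inside the WAL file
>         int sizeInBytes,       // length of the batch inside the WAL file
>         long baseOffset,       // first Kafka offset assigned to the batch
>         int recordCount        // number of records, as in the regular batch header
>     ) {}
>
> The point being that everything except the payload location is metadata the
> leader already maintains for classic topics.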
> Let's consider the produce path. Here's a reminder of the original Diskless
> design:
>
>
> The new approach could be depicted as follows:
>
>
> As you can see, the main difference is that now, instead of a single commit
> request to the batch coordinator, we send multiple parallel commit requests
> to the leaders of all the partitions involved in the WAL file. Each of them
> will commit its batches independently, without coordinating with other
> leaders or any other components. Batch data is addressed by the WAL file
> name, byte offset, and size, so a partition needs to know nothing about
> other partitions to access its data in a shared WAL file.
>
> The number of partitions involved in a single WAL file may be quite large,
> e.g. a hundred. A hundred network requests to commit one WAL file is very
> impractical. However, there are ways to reduce this number:
> 1. Partition leaders are located on brokers. Requests to leaders on one
> broker could be grouped together into a single physical network request
> (resembling the normal Produce request, which may carry batches for many
> partitions inside). This caps the number of network requests at the
> number of brokers in the cluster.
> 2. If we craft the cluster metadata to make producers send their requests to
> the right brokers (with respect to AZs), we may achieve a higher
> concentration of logical commit requests per physical network request,
> reducing the number of the latter even further, ideally to one.
>
> Obviously, out of multiple commit requests, some may fail or time out for a
> variety of reasons. This is fine. Some producers will receive totally or
> partially failed responses to their Produce requests, similar to what they
> would receive when an append to a classic topic fails or times out. If
> a partition experiences problems, other partitions will not be affected
> (again, like in classic topics). Of course, the uncommitted data will be
> garbage in WAL files. But WAL files are short-lived (batches are constantly
> assembled into segments and offloaded to tiered storage), so this garbage
> will eventually be deleted.
>
> For safely deleting WAL files, we now need to manage them centrally, as this
> is the only state and logic that spans multiple partitions. On the diagram,
> you can see another commit request called "Commit file (best effort)" going
> to the WAL File Manager. This manager will be responsible for the following
> (a small sketch of the deletion check follows the list):
> 1. Collecting (via requests from brokers) and persisting information about
> committed WAL files.
> 2. To handle potential failures in file information delivery, periodically
> doing a prefix scan on the remote storage to find and register unknown
> files. The period of this scan will be configurable and ideally should be
> quite long.
> 3. Checking with the relevant partition leaders (after a grace period)
> whether they still have batches in a particular file.
> 4. Physically deleting files when they are no longer referred to by any
> partition.
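>
> A minimal sketch of the manager's deletion check (points 3 and 4 above).
> All interfaces and names below are hypothetical, purely to illustrate the
> idea; they are not part of the KIP or the Kafka codebase:
>
>     import java.time.Duration;
>     import java.time.Instant;
>     import java.util.List;
>
>     // Hypothetical view of a committed WAL file, as tracked by the manager.
>     record WalFile(String objectKey, Instant committedAt, List<String> partitions) {}
>
>     // Hypothetical callback to ask a partition's current leader a question.
>     interface LeaderClient {
>         // Does the partition's log still reference any batch in the given WAL file?
>         boolean stillReferences(String partition, String objectKey);
>     }
>
>     // Hypothetical handle to the remote object storage.
>     interface RemoteStorage {
>         void delete(String objectKey);
>     }
>
>     class WalFileDeletionCheck {
>         private final Duration gracePeriod;
>         private final LeaderClient leaders;
>         private final RemoteStorage storage;
>
>         WalFileDeletionCheck(Duration gracePeriod, LeaderClient leaders, RemoteStorage storage) {
>             this.gracePeriod = gracePeriod;
>             this.leaders = leaders;
>             this.storage = storage;
>         }
>
>         // Returns true if the file was physically deleted.
>         boolean maybeDelete(WalFile file) {
>             // Respect the grace period: commits for this file may still be in
>             // flight or not yet reported to the manager by all brokers.
>             if (Instant.now().isBefore(file.committedAt().plus(gracePeriod))) {
>                 return false;
>             }
>             // Any partition that still references the file keeps it alive.
>             for (String partition : file.partitions()) {
>                 if (leaders.stillReferences(partition, file.objectKey())) {
>                     return false;
>                 }
>             }
>             storage.delete(file.objectKey());
>             return true;
>         }
>     }
>
> Together with the periodic prefix scan, this means a file that was uploaded
> but never fully reported is still discovered and eventually cleaned up.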
>
> This new design offers the following advantages:
> 1. It simplifies the implementation of many Kafka features such as
> idempotence, transactions, queues, tiered storage, and retention. Now we
> don't need to abstract away and reuse the code from partition leaders in the
> batch coordinator. Instead, we will literally use the same code paths in the
> leaders, with little adaptation. Workflows from classic topics mostly remain
> unchanged.
> For example, it seems that
> ReplicaManager.maybeSendPartitionsToTransactionCoordinator and
> KafkaApis.handleWriteTxnMarkersRequest, used for transaction support on the
> partition leader side, could be used for diskless topics with little
> adaptation. ProducerStateManager, needed for both idempotent produce and
> transactions, would be reused.
> Another example is share group support, where the share partition leader,
> being co-located with the partition leader, would execute the same logic for
> both diskless and classic topics.
> 2. It returns to the familiar partition-based scaling model, where partitions
> are independent.
> 3. It makes the operation and failure patterns closer to the familiar ones
> from classic topics.
> 4. It opens a straightforward path to seamlessly switching topics between
> diskless and classic modes.
>
> Everything else remains unchanged compared to the previous Diskless design
> (after all the previous discussions): local segment materialization by
> replicas, the consume path, tiered storage integration, etc.
>
> If the community finds this design more suitable, we will update the KIP(s)
> accordingly and continue working on it. Please let us know what you think.
>
> Best regards,
> Ivan and Diskless team
>
> On Mon, Sep 29, 2025, at 15:06, Ivan Yurchenko wrote: > > Hi Justine, > > > > Yes, you're right. We need to track the aborted transactions for in the > > diskless coordinator for as long as the corresponding offsets are there. > > With the tiered storage unification Greg mentioned earlier, this will be > > finite time even for infinite data retention. > > > > Best, > > Ivan > > > > On Wed, Sep 17, 2025, at 19:41, Justine Olshan wrote: > > > Hey Ivan, > > > > > > Thanks for the response. I think most of what you said made sense, but I > > > did have some questions about this part: > > > > > > > As we understand this, the partition leader in classic topics forgets > > > about a transaction once it’s replicated (HWM overpasses it). The > > > transaction coordinator acts like the main guardian, allowing partition > > > leaders to do this safely. Please correct me if this is wrong. We think > > > about relying on this with the batch coordinator and delete the > > > information > > > about a transaction once it’s finished (as there’s no replication and HWM > > > advances immediately). > > > > > > I didn't quite understand this. In classic topics, we have maps for > > > ongoing > > > transactions which remove state when the transaction is completed and an > > > aborted transactions index which is retained for much longer. Once the > > > transaction is completed, the coordinator is no longer involved in > > > maintaining this partition side state, and it is subject to compaction > > > etc. > > > Looking back at the outline provided above, I didn't see much about the > > > fetch path, so maybe that could be expanded a bit further. I saw the > > > following in a response: > > > > When the broker constructs a fully valid local segment, all the > > > > necessary > > > control batches will be inserted and indices, including the transaction > > > index will be built to serve FetchRequests exactly as they are today. > > > > > > Based on this, it seems like we need to retain the information about > > > aborted txns for longer. > > > > > > Thanks, > > > Justine > > > > > > On Mon, Sep 15, 2025 at 9:43 AM Ivan Yurchenko <[email protected]> wrote: > > > > > > > Hi Justine and all, > > > > > > > > Thank you for your questions!
> > > > > > > > > JO 1. >Since a transaction could be uniquely identified with producer > > > > > ID > > > > > and epoch, the positive result of this check could be cached locally > > > > > Are we saying that only new transaction version 2 transactions can be > > > > used > > > > > here? If not, we can't uniquely identify transactions with producer > > > > > id + > > > > > epoch > > > > > > > > You’re right that we (probably unintentionally) focused only on version > > > > 2. > > > > We can either limit the support to version 2 or consider using some > > > > surrogates to support version 1. > > > > > > > > > JO 2. >The batch coordinator does the final transactional checks of > > > > > the > > > > > batches. This procedure would output the same errors like the > > > > > partition > > > > > leader in classic topics would do. > > > > > Can you expand on what these checks are? Would you be checking if the > > > > > transaction was still ongoing for example?* * > > > > > > > > Yes, the producer epoch, that the transaction is ongoing, and of course > > > > the normal idempotence checks. What the partition leader in the classic > > > > topics does before appending a batch to the local log (e.g. in > > > > UnifiedLog.maybeStartTransactionVerification and > > > > UnifiedLog.analyzeAndValidateProducerState). In Diskless, we > > > > unfortunately > > > > cannot do these checks before appending the data to the WAL segment and > > > > uploading it, but we can “tombstone” these batches in the batch > > > > coordinator > > > > during the final commit. > > > > > > > > > Is there state about ongoing > > > > > transactions in the batch coordinator? I see some other state > > > > > mentioned > > > > in > > > > > the End transaction section, but it's not super clear what state is > > > > stored > > > > > and when it is stored. > > > > > > > > Right, this should have been more explicit. As the partition leader > > > > tracks > > > > ongoing transactions for classic topics, the batch coordinator has to as > > > > well. So when a transaction starts and ends, the transaction coordinator > > > > must inform the batch coordinator about this. > > > > > > > > > JO 3. I didn't see anything about maintaining LSO -- perhaps that > > > > > would > > > > be > > > > > stored in the batch coordinator? > > > > > > > > Yes. This could be deduced from the committed batches and other > > > > information, but for the sake of performance we’d better store it > > > > explicitly. > > > > > > > > > JO 4. Are there any thoughts about how long transactional state is > > > > > maintained in the batch coordinator and how it will be cleaned up? > > > > > > > > As we understand this, the partition leader in classic topics forgets > > > > about a transaction once it’s replicated (HWM overpasses it). The > > > > transaction coordinator acts like the main guardian, allowing partition > > > > leaders to do this safely. Please correct me if this is wrong. We think > > > > about relying on this with the batch coordinator and delete the > > > > information > > > > about a transaction once it’s finished (as there’s no replication and > > > > HWM > > > > advances immediately). > > > > > > > > Best, > > > > Ivan > > > > > > > > On Tue, Sep 9, 2025, at 00:38, Justine Olshan wrote: > > > > > Hey folks, > > > > > > > > > > Excited to see some updates related to transactions! > > > > > > > > > > I had a few questions. > > > > > > > > > > JO 1. 
>Since a transaction could be uniquely identified with producer > > > > > ID > > > > > and epoch, the positive result of this check could be cached locally > > > > > Are we saying that only new transaction version 2 transactions can be > > > > used > > > > > here? If not, we can't uniquely identify transactions with producer > > > > > id + > > > > > epoch > > > > > > > > > > JO 2. >The batch coordinator does the final transactional checks of > > > > > the > > > > > batches. This procedure would output the same errors like the > > > > > partition > > > > > leader in classic topics would do. > > > > > Can you expand on what these checks are? Would you be checking if the > > > > > transaction was still ongoing for example? Is there state about > > > > > ongoing > > > > > transactions in the batch coordinator? I see some other state > > > > > mentioned > > > > in > > > > > the End transaction section, but it's not super clear what state is > > > > stored > > > > > and when it is stored. > > > > > > > > > > JO 3. I didn't see anything about maintaining LSO -- perhaps that > > > > > would > > > > be > > > > > stored in the batch coordinator? > > > > > > > > > > JO 4. Are there any thoughts about how long transactional state is > > > > > maintained in the batch coordinator and how it will be cleaned up? > > > > > > > > > > On Mon, Sep 8, 2025 at 10:38 AM Jun Rao <[email protected]> > > > > wrote: > > > > > > > > > > > Hi, Greg and Ivan, > > > > > > > > > > > > Thanks for the update. A few comments. > > > > > > > > > > > > JR 10. "Consumer fetches are now served from local segments, making > > > > use of > > > > > > the > > > > > > indexes, page cache, request purgatory, and zero-copy functionality > > > > already > > > > > > built into classic topics." > > > > > > JR 10.1 Does the broker build the producer state for each partition > > > > > > in > > > > > > diskless topics? > > > > > > JR 10.2 For transactional data, the consumer fetches need to know > > > > aborted > > > > > > records. How is that achieved? > > > > > > > > > > > > JR 11. "The batch coordinator saves that the transaction is finished > > > > and > > > > > > also inserts the control batches in the corresponding logs of the > > > > involved > > > > > > Diskless topics. This happens only on the metadata level, no actual > > > > control > > > > > > batches are written to any file. " > > > > > > A fetch response could include multiple transactional batches. How > > > > does the > > > > > > broker obtain the information about the ending control batch for > > > > > > each > > > > > > batch? Does that mean that a fetch response needs to be built by > > > > > > stitching record batches and generated control batches together? > > > > > > > > > > > > JR 12. Queues: Is there still a share partition leader that all > > > > consumers > > > > > > are routed to? > > > > > > > > > > > > JR 13. "Should the KIPs be modified to include this or it's too > > > > > > implementation-focused?" It would be useful to include enough > > > > > > details > > > > to > > > > > > understand correctness and performance impact. > > > > > > > > > > > > HC5. Henry has a valid point. Requests from a given producer > > > > > > contain a > > > > > > sequence number, which is ordered. If a producer sends every Produce > > > > > > request to an arbitrary broker, those requests could reach the batch > > > > > > coordinator in different order and lead to rejection of the produce > > > > > > requests. 
> > > > > > > > > > > > Jun > > > > > > > > > > > > On Thu, Sep 4, 2025 at 12:00 AM Ivan Yurchenko <[email protected]> > > > > > > wrote: > > > > > > > > > > > > > Hi all, > > > > > > > > > > > > > > We have also thought in a bit more details about transactions and > > > > queues, > > > > > > > here's the plan. > > > > > > > > > > > > > > *Transactions* > > > > > > > > > > > > > > The support for transactions in *classic topics* is based on > > > > > > > precise > > > > > > > interactions between three actors: clients (mostly producers, but > > > > also > > > > > > > consumers), brokers (ReplicaManager and other classes), and > > > > transaction > > > > > > > coordinators. Brokers also run partition leaders with their local > > > > state > > > > > > > (ProducerStateManager and others). > > > > > > > > > > > > > > The high level (some details skipped) workflow is the following. > > > > When a > > > > > > > transactional Produce request is received by the broker: > > > > > > > 1. For each partition, the partition leader checks if a non-empty > > > > > > > transaction is running for this partition. This is done using its > > > > local > > > > > > > state derived from the log metadata (ProducerStateManager, > > > > > > > VerificationStateEntry, VerificationGuard). > > > > > > > 2. The transaction coordinator is informed about all the > > > > > > > partitions > > > > that > > > > > > > aren’t part of the transaction to include them. > > > > > > > 3. The partition leaders do additional transactional checks. > > > > > > > 4. The partition leaders append the transactional data to their > > > > > > > logs > > > > and > > > > > > > update some of their state (for example, log the fact that the > > > > > > transaction > > > > > > > is running for the partition and its first offset). > > > > > > > > > > > > > > When the transaction is committed or aborted: > > > > > > > 1. The producer contacts the transaction coordinator directly with > > > > > > > EndTxnRequest. > > > > > > > 2. The transaction coordinator writes PREPARE_COMMIT or > > > > PREPARE_ABORT to > > > > > > > its log and responds to the producer. > > > > > > > 3. The transaction coordinator sends WriteTxnMarkersRequest to the > > > > > > leaders > > > > > > > of the involved partitions. > > > > > > > 4. The partition leaders write the transaction markers to their > > > > > > > logs > > > > and > > > > > > > respond to the coordinator. > > > > > > > 5. The coordinator writes the final transaction state > > > > COMPLETE_COMMIT or > > > > > > > COMPLETE_ABORT. > > > > > > > > > > > > > > In classic topics, partitions have leaders and lots of important > > > > state > > > > > > > necessary for supporting this workflow is local. The main > > > > > > > challenge > > > > in > > > > > > > mapping this to Diskless comes from the fact there are no > > > > > > > partition > > > > > > > leaders, so the corresponding pieces of state need to be > > > > > > > globalized > > > > in > > > > > > the > > > > > > > batch coordinator. We are already doing this to support idempotent > > > > > > produce. > > > > > > > > > > > > > > The high level workflow for *diskless topics* would look very > > > > similar: > > > > > > > 1. For each partition, the broker checks if a non-empty > > > > > > > transaction > > > > is > > > > > > > running for this partition. In contrast to classic topics, this is > > > > > > checked > > > > > > > against the batch coordinator with a single RPC. 
Since a > > > > > > > transaction > > > > > > could > > > > > > > be uniquely identified with producer ID and epoch, the positive > > > > result of > > > > > > > this check could be cached locally (for the double configured > > > > duration > > > > > > of a > > > > > > > transaction, for example). > > > > > > > 2. The same: The transaction coordinator is informed about all the > > > > > > > partitions that aren’t part of the transaction to include them. > > > > > > > 3. No transactional checks are done on the broker side. > > > > > > > 4. The broker appends the transactional data to the current shared > > > > WAL > > > > > > > segment. It doesn’t update any transaction-related state for > > > > > > > Diskless > > > > > > > topics, because it doesn’t have any. > > > > > > > 5. The WAL segment is committed to the batch coordinator like in > > > > > > > the > > > > > > > normal produce flow. > > > > > > > 6. The batch coordinator does the final transactional checks of > > > > > > > the > > > > > > > batches. This procedure would output the same errors like the > > > > partition > > > > > > > leader in classic topics would do. I.e. some batches could be > > > > rejected. > > > > > > > This means, there will potentially be garbage in the WAL segment > > > > file in > > > > > > > case of transactional errors. This is preferable to doing more > > > > network > > > > > > > round trips, especially considering the WAL segments will be > > > > relatively > > > > > > > short-living (see the Greg's update above). > > > > > > > > > > > > > > When the transaction is committed or aborted: > > > > > > > 1. The producer contacts the transaction coordinator directly with > > > > > > > EndTxnRequest. > > > > > > > 2. The transaction coordinator writes PREPARE_COMMIT or > > > > PREPARE_ABORT to > > > > > > > its log and responds to the producer. > > > > > > > 3. *[NEW]* The transaction coordinator informs the batch > > > > > > > coordinator > > > > that > > > > > > > the transaction is finished. > > > > > > > 4. *[NEW]* The batch coordinator saves that the transaction is > > > > finished > > > > > > > and also inserts the control batches in the corresponding logs of > > > > > > > the > > > > > > > involved Diskless topics. This happens only on the metadata > > > > > > > level, no > > > > > > > actual control batches are written to any file. They will be > > > > dynamically > > > > > > > created on Fetch and other read operations. We could technically > > > > write > > > > > > > these control batches for real, but this would mean extra produce > > > > > > latency, > > > > > > > so it's better just to mark them in the batch coordinator and save > > > > these > > > > > > > milliseconds. > > > > > > > 5. The transaction coordinator sends WriteTxnMarkersRequest to the > > > > > > leaders > > > > > > > of the involved partitions. – Now only to classic topics now. > > > > > > > 6. The partition leaders of classic topics write the transaction > > > > markers > > > > > > > to their logs and respond to the coordinator. > > > > > > > 7. The coordinator writes the final transaction state > > > > COMPLETE_COMMIT or > > > > > > > COMPLETE_ABORT. > > > > > > > > > > > > > > Compared to the non-transactional produce flow, we get: > > > > > > > 1. An extra network round trip between brokers and the batch > > > > coordinator > > > > > > > when a new partition appear in the transaction. To mitigate the > > > > impact of > > > > > > > them: > > > > > > > - The results will be cached. 
> > > > > > > - The calls for multiple partitions in one Produce request will > > > > > > > be > > > > > > > grouped. > > > > > > > - The batch coordinator should be optimized for fast response to > > > > these > > > > > > > RPCs. > > > > > > > - The fact that a single producer normally will communicate > > > > > > > with a > > > > > > > single broker for the duration of the transaction further reduces > > > > > > > the > > > > > > > expected number of round trips. > > > > > > > 2. An extra round trip between the transaction coordinator and > > > > > > > batch > > > > > > > coordinator when a transaction is finished. > > > > > > > > > > > > > > With this proposal, transactions will also be able to span both > > > > classic > > > > > > > and Diskless topics. > > > > > > > > > > > > > > *Queues* > > > > > > > > > > > > > > The share group coordination and management is a side job that > > > > doesn't > > > > > > > interfere with the topic itself (leadership, replicas, physical > > > > storage > > > > > > of > > > > > > > records, etc.) and non-queue producers and consumers (Fetch and > > > > Produce > > > > > > > RPCs, consumer group-related RPCs are not affected.) We don't see > > > > > > > any > > > > > > > reason why we can't make Diskless topics compatible with share > > > > groups the > > > > > > > same way as classic topics are. Even on the code level, we don't > > > > expect > > > > > > any > > > > > > > serious refactoring: the same reading routines are used that are > > > > used for > > > > > > > fetching (e.g. ReplicaManager.readFromLog). > > > > > > > > > > > > > > > > > > > > > Should the KIPs be modified to include this or it's too > > > > > > > implementation-focused? > > > > > > > > > > > > > > Best regards, > > > > > > > Ivan > > > > > > > > > > > > > > On Wed, Sep 3, 2025, at 21:59, Greg Harris wrote: > > > > > > > > Hi all, > > > > > > > > > > > > > > > > Thank you all for your questions and design input on KIP-1150. > > > > > > > > > > > > > > > > We have just updated KIP-1150 and KIP-1163 with a new design. To > > > > > > > summarize > > > > > > > > the changes: > > > > > > > > > > > > > > > > 1. The design prioritizes integrating with the existing KIP-405 > > > > Tiered > > > > > > > > Storage interfaces, permitting data produced to a Diskless topic > > > > to be > > > > > > > > moved to tiered storage. > > > > > > > > This lowers the scalability requirements for the Batch > > > > > > > > Coordinator > > > > > > > > component, and allows Diskless to compose with Tiered Storage > > > > plugin > > > > > > > > features such as encryption and alternative data formats. > > > > > > > > > > > > > > > > 2. Consumer fetches are now served from local segments, making > > > > > > > > use > > > > of > > > > > > the > > > > > > > > indexes, page cache, request purgatory, and zero-copy > > > > > > > > functionality > > > > > > > already > > > > > > > > built into classic topics. > > > > > > > > However, local segments are now considered cache elements, do > > > > > > > > not > > > > need > > > > > > to > > > > > > > > be durably stored, and can be built without contacting any other > > > > > > > replicas. > > > > > > > > > > > > > > > > 3. The design has been simplified substantially, by removing the > > > > > > previous > > > > > > > > Diskless consume flow, distributed cache component, and "object > > > > > > > > compaction/merging" step. 
> > > > > > > > > > > > > > > > The design maintains leaderless produces as enabled by the Batch > > > > > > > > Coordinator, and the same latency profiles as the earlier > > > > > > > > design, > > > > while > > > > > > > > being simpler and integrating better into the existing > > > > > > > > ecosystem. > > > > > > > > > > > > > > > > Thanks, and we are eager to hear your feedback on the new > > > > > > > > design. > > > > > > > > Greg Harris > > > > > > > > > > > > > > > > On Mon, Jul 21, 2025 at 3:30 PM Jun Rao > > > > > > > > <[email protected]> > > > > > > > wrote: > > > > > > > > > > > > > > > > > Hi, Jan, > > > > > > > > > > > > > > > > > > For me, the main gap of KIP-1150 is the support of all > > > > > > > > > existing > > > > > > client > > > > > > > > > APIs. Currently, there is no design for supporting APIs like > > > > > > > transactions > > > > > > > > > and queues. > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > Jun > > > > > > > > > > > > > > > > > > On Mon, Jul 21, 2025 at 3:53 AM Jan Siekierski > > > > > > > > > <[email protected]> wrote: > > > > > > > > > > > > > > > > > > > Would it be a good time to ask for the current status of > > > > > > > > > > this > > > > KIP? > > > > > > I > > > > > > > > > > haven't seen much activity here for the past 2 months, the > > > > vote got > > > > > > > > > vetoed > > > > > > > > > > but I think the pending questions have been answered since > > > > then. > > > > > > > KIP-1183 > > > > > > > > > > (AutoMQ's proposal) also didn't have any activity since May. > > > > > > > > > > > > > > > > > > > > In my eyes KIP-1150 and KIP-1183 are two real choices that > > > > > > > > > > can > > > > be > > > > > > > > > > made, with a coordinator-based approach being by far the > > > > dominant > > > > > > one > > > > > > > > > when > > > > > > > > > > it comes to market adoption - but all these are standalone > > > > > > products. > > > > > > > > > > > > > > > > > > > > I'm a big fan of both approaches, but would hate to see a > > > > stall. So > > > > > > > the > > > > > > > > > > question is: can we get an update? > > > > > > > > > > > > > > > > > > > > Maybe it's time to start another vote? Colin McCabe - have > > > > > > > > > > your > > > > > > > questions > > > > > > > > > > been answered? If not, is there anything I can do to help? > > > > > > > > > > I'm > > > > > > deeply > > > > > > > > > > familiar with both architectures and have written about > > > > > > > > > > both? > > > > > > > > > > > > > > > > > > > > Kind regards, > > > > > > > > > > Jan > > > > > > > > > > > > > > > > > > > > On Tue, Jun 24, 2025 at 10:42 AM Stanislav Kozlovski < > > > > > > > > > > [email protected]> wrote: > > > > > > > > > > > > > > > > > > > > > I have some nits - it may be useful to > > > > > > > > > > > > > > > > > > > > > > a) group all the KIP email threads in the main one (just a > > > > bunch > > > > > > of > > > > > > > > > links > > > > > > > > > > > to everything) > > > > > > > > > > > b) create the email threads > > > > > > > > > > > > > > > > > > > > > > It's a bit hard to track it all - for example, I was > > > > searching > > > > > > for > > > > > > > a > > > > > > > > > > > discuss thread for KIP-1165 for a while; As far as I can > > > > tell, it > > > > > > > > > doesn't > > > > > > > > > > > exist yet. 
> > > > > > > > > > > > > > > > > > > > > > Since the KIPs are published (by virtue of having the root > > > > KIP be > > > > > > > > > > > published, having a DISCUSS thread and links to sub-KIPs > > > > where > > > > > > were > > > > > > > > > aimed > > > > > > > > > > > to move the discussion towards), I think it would be good > > > > > > > > > > > to > > > > > > create > > > > > > > > > > DISCUSS > > > > > > > > > > > threads for them all. > > > > > > > > > > > > > > > > > > > > > > Best, > > > > > > > > > > > Stan > > > > > > > > > > > > > > > > > > > > > > On 2025/04/16 11:58:22 Josep Prat wrote: > > > > > > > > > > > > Hi Kafka Devs! > > > > > > > > > > > > > > > > > > > > > > > > We want to start a new KIP discussion about introducing > > > > > > > > > > > > a > > > > new > > > > > > > type of > > > > > > > > > > > > topics that would make use of Object Storage as the > > > > > > > > > > > > primary > > > > > > > source of > > > > > > > > > > > > storage. However, as this KIP is big we decided to > > > > > > > > > > > > split it > > > > > > into > > > > > > > > > > multiple > > > > > > > > > > > > related KIPs. > > > > > > > > > > > > We have the motivational KIP-1150 ( > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1150%3A+Diskless+Topics > > > > > > > > > > > ) > > > > > > > > > > > > that aims to discuss if Apache Kafka should aim to have > > > > this > > > > > > > type of > > > > > > > > > > > > feature at all. This KIP doesn't go onto details on how > > > > > > > > > > > > to > > > > > > > implement > > > > > > > > > > it. > > > > > > > > > > > > This follows the same approach used when we discussed > > > > KRaft. > > > > > > > > > > > > > > > > > > > > > > > > But as we know that it is sometimes really hard to > > > > > > > > > > > > discuss > > > > on > > > > > > > that > > > > > > > > > meta > > > > > > > > > > > > level, we also created several sub-kips (linked in > > > > KIP-1150) > > > > > > that > > > > > > > > > offer > > > > > > > > > > > an > > > > > > > > > > > > implementation of this feature. > > > > > > > > > > > > > > > > > > > > > > > > We kindly ask you to use the proper DISCUSS threads for > > > > each > > > > > > > type of > > > > > > > > > > > > concern and keep this one to discuss whether Apache > > > > > > > > > > > > Kafka > > > > wants > > > > > > > to > > > > > > > > > have > > > > > > > > > > > > this feature or not. > > > > > > > > > > > > > > > > > > > > > > > > Thanks in advance on behalf of all the authors of this > > > > > > > > > > > > KIP. > > > > > > > > > > > > > > > > > > > > > > > > ------------------ > > > > > > > > > > > > Josep Prat > > > > > > > > > > > > Open Source Engineering Director, Aiven > > > > > > > > > > > > [email protected] | +491715557497 | aiven.io > > > > > > > > > > > > Aiven Deutschland GmbH > > > > > > > > > > > > Alexanderufer 3-7, 10117 Berlin > > > > > > > > > > > > Geschäftsführer: Oskari Saarenmaa, Hannu Valtonen, > > > > > > > > > > > > Anna Richardson, Kenneth Chen > > > > > > > > > > > > Amtsgericht Charlottenburg, HRB 209739 B > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
