Hi all,

Thanks for the KIP. I've reviewed 1150, 1163, and 1164, as well as the relevant discussion threads. I may have granular comments about 1163 and 1164 but the overall approach suggested in 1150 looks good to me. I especially like that the approach covers two main pain points of operating and paying for Kafka today: it allows cross-AZ traffic to be reduced (even eliminated in some cases), and it also allows local disk usage by brokers to be reduced (if operators opt for a small local cache on follower brokers for non-tiered segments).

+1 (binding)

Cheers,

Chris

On Mon, Jan 26, 2026 at 3:36 PM vaquar khan <[email protected]> wrote:

> Hi Josep,
>
> Thank you for the detailed response. I appreciate the clarification
> regarding the distinction between the Inkless POC and the KIP design.
>
> However, my objection is not based on temporary bugs in the fork, but *on
> architectural gaps in the KIPs themselves* that these implementation issues
> highlighted. If we are voting to approve the design, the design documents
> must be structurally complete regarding data safety.
>
> *1. Regarding Storage Leaks (The Missing Design)* You mentioned that
> cleanup logic "can be defined later." However, KIP-1163 explicitly
> delegates this responsibility to a separate process, and KIP-1165 (Object
> Compaction/GC) is currently marked as "Discarded" in the wiki.
>
> We cannot vote to approve a storage engine that has no specified mechanism
> for garbage collection. The "Upload-then-Commit" pattern described in
> KIP-1163 structurally creates orphaned segments during broker failures.
> Without an active KIP defining the reconciliation protocol (since KIP-1165
> was withdrawn), the proposal effectively describes a system with unbounded
> storage growth during failure modes. This is a blocking design gap, not an
> implementation detail.
>
> *2. Regarding EOS (The Coordinator Synchronization Gap)* This is not a
> misunderstanding of standard Kafka transactions; it is a critique of how
> KIP-1150 changes them. Standard EOS relies on the Partition Leader to
> sequence markers and calculate the LSO (Last Stable Offset) in memory.
> KIP-1150 removes the Leader.
>
> KIP-1164 (Batch Coordinator) must explicitly define the RPC flow between
> the Transaction Coordinator and the Batch Coordinator to replace the
> leader's role. Currently, the KIP does not specify how the system prevents
> a "Split Brain" scenario where a consumer reads ahead of a transaction
> marker that hasn't yet been sequenced by the Batch Coordinator. This is a
> protocol-level correctness issue that must be resolved in the text before
> adoption.
>
> Please note - I am maintaining my objection based on missing
> specifications, not code bugs.
>
> I respectfully request that we pause the vote until:
>
>    A valid design for Garbage Collection (replacing the discarded
>    KIP-1165) is added to the proposal.
>
>    The Transaction/LSO synchronization protocol is explicitly documented
>    in KIP-1164.
>
> Regards,
>
> Vaquar Khan
> Sr Data Architect
> https://www.linkedin.com/in/vaquar-khan-b695577/
>
