Hi Stan,

Thanks for the detailed feedback! We've now published KIP-1254 [1], the consumer-side companion to KIP-1248; it addresses your questions in detail.

To highlight a few points:

On storage-format coupling: we've added wording to the Version Compatibility section [2]. The design intentionally shifts segment parsing from the broker to the consumer to reduce broker load. While this couples consumers to the on-disk format, SupportedStorageFormatVersions ensures a graceful fallback when the format evolves. Consumers remain decoupled from the storage backends themselves (S3/GCS/Azure) via the RemoteStorageFetcher plugin interface. Proprietary Kafka implementations with different storage formats can participate through the same mechanism: if the client supports their format, direct fetch works; otherwise the fetch falls back gracefully.

Optional feature: yes, it is opt-in via fetch.remote.enabled, which defaults to false. See Consumer Configs [3].

Forward compatibility: covered in Version Compatibility [2]. The client sends the list of format versions it supports (e.g., ApacheKafkaV1). If a 6.x broker uses a new format that is not in a 4.x client's list, the broker falls back to the traditional fetch path.

For the specific consumer-side questions:

1. Plugin system: we introduce RemoteStorageFetcher [4], a read-only interface similar to RemoteStorageManager on the broker side (see the sketch after this list). Plugin matching is handled implicitly via SupportedStorageFormatVersions: if the format versions don't align, the broker falls back to traditional fetch.

2. Cost & performance: the broker provides byte-position hints derived from the OffsetIndex, and consumers request only the needed range via startPosition/endPosition in RemoteStorageFetcher.fetchLogSegment(). This enables byte-range GETs against S3-compatible systems. For backends that don't support range requests, the plugin implementation would handle buffering; that is implementation-specific and outside the scope of this KIP.

3. Fallbacks: yes, covered in the Fallback section [5]. The consumer falls back to broker-mediated fetch on timeout, connection failure, or auth failure, or when no RemoteStorageFetcher is configured.
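To make point 1 concrete, here is a rough sketch of what the interface shape could look like in Java. The authoritative definition is in the KIP [4]; the signature below simply mirrors the broker-side RemoteStorageManager from KIP-405, so treat the exact parameter types (e.g., RemoteLogSegmentMetadata) as assumptions rather than the final API:

import java.io.Closeable;
import java.io.InputStream;
import org.apache.kafka.common.Configurable;
import org.apache.kafka.server.log.remote.storage.RemoteLogSegmentMetadata;
import org.apache.kafka.server.log.remote.storage.RemoteStorageException;

// Read-only, consumer-side analog of KIP-405's RemoteStorageManager.
// Implementations are backend-specific plugins (S3, GCS, Azure, ...).
public interface RemoteStorageFetcher extends Configurable, Closeable {

    // Fetch bytes [startPosition, endPosition] of a remote log segment.
    // The broker derives these positions from the OffsetIndex, so an
    // S3-compatible implementation can issue a single range GET instead
    // of downloading a potentially multi-GB segment.
    InputStream fetchLogSegment(RemoteLogSegmentMetadata segmentMetadata,
                                int startPosition,
                                int endPosition) throws RemoteStorageException;
}

An S3-backed implementation would translate startPosition/endPosition into an HTTP "Range: bytes=start-end" header; backends without range support would buffer inside the plugin, per point 2 above.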
Let us know if you'd like more detail on any of these.

Thanks,
Tom

[1] https://cwiki.apache.org/confluence/display/KAFKA/KIP-1254%3A+Kafka+Consumer+Support+for+Remote+Tiered+Storage+Fetch
[2] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=399279678#KIP1254:KafkaConsumerSupportforRemoteTieredStorageFetch-VersionCompatibility
[3] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=399279678#KIP1254:KafkaConsumerSupportforRemoteTieredStorageFetch-ConsumerConfigs
[4] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=399279678#KIP1254:KafkaConsumerSupportforRemoteTieredStorageFetch-RemoteStorageFetcher
[5] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=399279678#KIP1254:KafkaConsumerSupportforRemoteTieredStorageFetch-Fallback

On Tue, Dec 2, 2025 at 12:03 PM Stanislav Kozlovski <[email protected]> wrote:

> Hey Henry, thanks for the KIP! I'm excited to see this proposal as I've
> heard it discussed privately before too.
>
> Can we have some wording that talks about the trade-offs of coupling
> clients to the underlying storage format? Today, the underlying segment
> format is decoupled from the clients, since the broker handles conversion
> of log messages to what the protocol expects. I'm sure certain
> proprietary Kafka implementations use different formats for their
> underlying storage - it's an interesting question how they would handle
> this (to be explicit, I'm not proposing we should cater our design to
> those systems though, simply calling it out as a potential contention
> point).
>
> Things I'm thinking about:
> - Would this be an optional feature?
> - What would forward-compatibility look like?
>
> e.g. if we ever want to switch the underlying storage format? To
> bullet-proof ourselves, do we want to introduce some version matching
> which could then help us understand non-compatibility and throw errors?
> (e.g. we change the storage format in 6.x, and a 4.x client tries to
> read from a 6.x broker/storage-format)
>
> Can we also have some wording on how this feature would look on the
> consumer side? The proposal right now suggests we handle this in a
> follow-up KIP, which makes sense for the details - but what about a
> high-level overview and motivation?
>
> 1. We would likely need a similar plugin system for Consumers like
> brokers have for KIP-405. Getting that interface right would be
> important. Ensuring the plugin configured on the consumer matches the
> plugin configured on the broker would be useful from a UX point of view
> too.
>
> 2. From a cost and performance perspective, how do we envision this
> being used/configured on the consumer side?
>
> A single segment could be GBs in size. It's unlikely a consumer would
> want to download the whole thing at once.
>
> For tiered backends that are S3-compatible cloud object storage
> systems, we could likely use byte-range GETs, thus avoiding reading too
> much data that'll get discarded. Are there concerns with other systems?
> A few words on this topic would help imo.
>
> 3. Should we have fall-backs to the current behavior?
>
> Best,
> Stan
>
> On 2025/12/02 11:04:13 Kamal Chandraprakash wrote:
> > Hi Haiying,
> >
> > Thanks for the KIP!
> >
> > 1. Do you plan to add support for transactional consumers? Currently,
> > the consumer doesn't return the aborted transaction records to the
> > handler.
> > 2. To access the remote storage directly, the client might need
> > additional certificates / keys. How do you plan to expose those
> > configs on the client?
> > 3. Will it support the Queues for Kafka feature KIP-932
> > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-932%3A+Queues+for+Kafka>?
> > And so on.
> >
> > --
> > Kamal
> >
> > On Tue, Dec 2, 2025 at 10:29 AM Haiying Cai via dev
> > <[email protected]> wrote:
> >
> > > For some reason, the KIP link was truncated in the original email.
> > > Here is the link again:
> > >
> > > KIP:
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1248%3A+Allow+consumer+to+fetch+from+remote+tiered+storage
> > >
> > > Henry Haiying Cai
> > >
> > > On 2025/12/02 04:34:39 Henry Haiying Cai via dev wrote:
> > > >
> > > > Hi all,
> > > >
> > > > I would like to start discussion on KIP-1248: Allow consumer to
> > > > fetch from remote tiered storage
> > > >
> > > > KIP link: KIP-1248: Allow consumer to fetch from remote tiered
> > > > storage - Apache Kafka - Apache Software Foundation
> > > >
> > > > The KIP proposes to allow consumer clients to fetch from remote
> > > > tiered storage directly, to avoid saturating the broker's network
> > > > capacity and degrading its cache performance. This is very useful
> > > > for serving large backfill requests from a new or fallen-behind
> > > > consumer.
> > > >
> > > > Any feedback is appreciated.
> > > >
> > > > Best regards,
> > > >
> > > > Henry
