Hi Manan,

We noticed you cross-posted some discussion items from the KIP-1248/KIP-1254 thread (1) and wanted to share our response to your points:

1. Performance & Latency

1.2 Increased internal bandwidth & Cost
You mentioned that RRRs improve cost control. We view this differently:

1.2.a Cost: Having clients read directly from S3 (KIP-1254) is inherently cheaper, since there is no need to launch and manage separate RRR instances.
1.2.b Throughput: Direct S3 access offers better scalability for fan-out reads. We can launch hundreds of clients reading parallel offset ranges from S3, whereas Kafka's scale-out read capability is limited by the provisioned RRR fleet size (see the sketch below).
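To make the fan-out point in 1.2.b concrete, here is a minimal sketch using the AWS SDK for Java v2. The bucket name, object key, segment size, and reader count are placeholders we invented for illustration; nothing here is prescribed by KIP-1254 or by the tiered-storage segment layout.

import software.amazon.awssdk.core.ResponseBytes;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;
import software.amazon.awssdk.services.s3.model.GetObjectResponse;

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelSegmentRead {
    public static void main(String[] args) throws Exception {
        // All names and sizes below are illustrative placeholders, not KIP-defined.
        String bucket = "example-tiered-storage";
        String key = "topicA-0/00000000000001234567.log";
        long segmentSize = 1L << 30;   // pretend the tiered segment is 1 GiB
        long chunk = 8L << 20;         // each reader fetches an 8 MiB byte range
        int readers = 64;              // stand-in for "hundreds of clients"

        ExecutorService pool = Executors.newFixedThreadPool(readers);
        List<Future<Long>> pending = new ArrayList<>();

        try (S3Client s3 = S3Client.create()) {
            for (long start = 0; start < segmentSize; start += chunk) {
                long from = start;
                long to = Math.min(start + chunk, segmentSize) - 1;
                // Each task issues an independent ranged GET; S3 serves these in
                // parallel, so aggregate throughput grows with the number of readers.
                pending.add(pool.submit(() -> {
                    GetObjectRequest req = GetObjectRequest.builder()
                            .bucket(bucket)
                            .key(key)
                            .range("bytes=" + from + "-" + to)
                            .build();
                    ResponseBytes<GetObjectResponse> body = s3.getObjectAsBytes(req);
                    return (long) body.asByteArray().length;
                }));
            }
            long total = 0;
            for (Future<Long> f : pending) {
                total += f.get();   // wait for all ranged reads to complete
            }
            System.out.println("Read " + total + " bytes using " + readers + " parallel readers");
        } finally {
            pool.shutdown();
        }
    }
}

The point is simply that read parallelism is chosen on the client side and scales with the number of readers, rather than being capped by a provisioned fleet.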
2. Client & Protocol

2.2 Redirect-based flow
You mentioned that "Redirects are lightweight." Could you elaborate on the protocol?

2.2.a Switching: How does the client seamlessly switch between the RRR and the normal broker? Does the main broker return a specific error code?
2.2.b Flapping: If a client consumes at the hot/cold boundary, is there a risk of "flapping" (repeated disconnects) between the leader and the RRR as segments roll over?

4. Metadata & Routing

4.1 Partition assignment
The KIP states that RRRs are "stateless" but also that they "optionally cache" data. These two statements seem to be at odds with each other.

4.1.a Assignment: How is partition assignment handled when the topic's partition count changes? Does the Controller explicitly map partitions to RRRs?
4.1.b Discovery: How does the cluster (and the client) discover a newly created RRR node when scaling out?
4.1.c Cache hit rate: If routing is purely dynamic/stateless, requests for the same partition could land on different nodes, which negates the benefit of local caching (see the toy sketch below).
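To illustrate the concern in 4.1.c, here is a toy comparison we put together (entirely our own illustration; the "sticky" hash mapping is just one hypothetical counter-example, not something proposed in the KIP). With purely random/stateless routing, fetches for one partition scatter across the fleet, so every node ends up pulling and caching the same segments; with a deterministic partition-to-node mapping, repeat reads stay on one node and its cache can actually pay off.

import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

public class RrrRoutingSketch {
    public static void main(String[] args) {
        // Hypothetical fleet and partition names, purely for illustration.
        List<String> rrrFleet = List.of("rrr-1", "rrr-2", "rrr-3", "rrr-4", "rrr-5");
        String topicPartition = "topicA-0";
        int fetches = 1_000;
        Random random = new Random(42);

        Set<String> nodesHitRandom = new HashSet<>();
        Set<String> nodesHitSticky = new HashSet<>();

        for (int i = 0; i < fetches; i++) {
            // Stateless routing: any RRR may serve this fetch, so over time the
            // partition's segments get pulled and cached on (almost) every node.
            nodesHitRandom.add(rrrFleet.get(random.nextInt(rrrFleet.size())));

            // Hypothetical deterministic ("sticky") routing: the same partition
            // always maps to the same RRR, so its segments only need caching once.
            int idx = Math.floorMod(topicPartition.hashCode(), rrrFleet.size());
            nodesHitSticky.add(rrrFleet.get(idx));
        }

        System.out.println("Stateless routing touched " + nodesHitRandom.size()
                + " of " + rrrFleet.size() + " nodes");          // almost certainly all 5
        System.out.println("Sticky routing touched " + nodesHitSticky.size() + " node"); // always 1
    }
}

If the intent is to keep RRRs fully stateless, it would help to understand whether the brokers or the Controller apply some deterministic partition-to-RRR mapping so that local caching remains effective.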
Thanks,
Tom & Henry

(1) https://lists.apache.org/thread/j9l1orx8x67lwcbo6f7qgn7xs3p5bjq0

On 2026/01/09 12:07:41 Manan Gupta wrote:
> Below are responses to the key concerns raised around RRRs in KIP-1248 and
> KIP-1254, organized by area:
>
> 1. Performance & Latency
> 1.1 Higher read latency
> Yes. Historical reads add a hop (remote storage → RRR → client). This is
> intentional: RRRs target cold and analytic workloads where throughput and
> cost efficiency matter more than tail latency.
> Mitigations include prefetching, local caching, larger sequential reads,
> and AZ-local RRRs. Hot-path consumers continue to read directly from
> leaders.
>
> 1.2 Increased internal bandwidth
> RRRs increase internal traffic, but they:
> - Reduce load on leader brokers
> - Centralize and optimize remote storage access
> - Improve cost control versus per-client object storage reads
>
> 2. Client & Protocol
> 2.1 Client complexity
> Client complexity is reduced, not eliminated. Brokers remain authoritative,
> clients stay storage-agnostic, and most complexity is encapsulated in
> shared libraries.
>
> 2.2 Redirect-based flow
> Redirects are lightweight and Kafka-native (similar to leader/coordinator
> discovery). Clients follow broker instructions without understanding
> storage layouts or tiering.
>
> 3. Semantics & Features
> 3.1 Transactional semantics
> Preserved. RRRs read canonical log segments, including transaction markers.
> read_committed semantics are supported.
>
> 3.2 Newer features
> RRRs initially support standard log consumption only. Features requiring
> coordination or state mutation remain on main brokers by design.
>
> 4. Metadata & Routing
> 4.1 Partition assignment
> RRRs are stateless: no partition ownership, ISR participation, or
> rebalancing. Routing is dynamic and broker/controller-driven.
>
> 4.2 AZ affinity
> Handled via existing rack/AZ metadata and broker-directed redirects.
>
> 4.3 Failure handling
> No state means no rebalancing. Clients retry against another RRR or fall
> back to brokers.
>
> 5. Operations & Scaling
> 5.1 Operational overhead
> RRRs add a fleet but are stateless: no replication, elections, writes, or
> durability responsibilities. They are easy to automate and replace.
>
> 5.2 Autoscaling
> A first-class goal. RRRs scale on load, start quickly, and scale down
> safely without state migration.
>
> 6. Architectural Trade-off
> Yes, complexity is shifted—but deliberately off the hot path. This isolates
> cold and bursty reads, protects real-time workloads, and cleanly separates
> durability, serving, and analytics concerns.
>
> On 2025/12/14 10:58:32 Manan Gupta wrote:
> > Hi all,
> >
> > This email starts the discussion thread for *KIP-1255: Remote Read
> > Replicas for Kafka Tiered Storage*. The proposal introduces a lightweight
> > broker role, *Remote Read Replica*, dedicated to serving historical reads
> > directly from remote storage.
> >
> > We’d appreciate your initial thoughts and feedback on the proposal.
> >
