[jira] [Comment Edited] (RATIS-2403) Improve linearizable follower read throughput instead of writes

Xinyu Tan (Jira) Wed, 11 Feb 2026 21:05:08 -0800


    [ https://issues.apache.org/jira/browse/RATIS-2403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18058029#comment-18058029 ]


Xinyu Tan edited comment on RATIS-2403 at 2/12/26 5:04 AM:
-----------------------------------------------------------

[~ivanandika]
I believe the core issue of balancing read/write performance needs to be 
examined from two distinct perspectives:
 * Effectiveness of Follower Read: We can define effectiveness through the 
following experimental model:
 ** Baseline: Both read and write loads are low, and the system has no resource 
bottlenecks.
 ** Introducing a Bottleneck: Increase the query load on the Leader N-fold 
(N = number of replicas), forcing the Leader into a bottlenecked state. At this 
point, write throughput will inevitably drop due to resource contention, and 
query throughput will also be capped.
 ** Enabling Follower Read: Distribute these additional query loads evenly 
across all replicas.
 ** Expected Outcome: If write performance returns to its baseline state (no 
longer interfered with by queries) while total query throughput increases 
significantly without hitting physical limits across the group, this shows 
that Ratis’s Follower Read is effective at offloading the Leader and 
decoupling resources.
 * Resource Zero-Sum Game under High Pressure:
In extreme stress-test scenarios, the total read/write capacity of the 
consensus group is capped by physical resources (CPU, I/O, network).
For example, even without Follower Read, if all reads and writes are pinned 
to the Leader until it bottlenecks, any further increase in write load will 
inevitably decrease query throughput. The same holds with Follower Read 
enabled: faster writes force Followers to spend more resources on log 
synchronization and application (Apply), which in turn cuts into the 
resources available for queries.
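
The zero-sum argument above can be illustrated with a toy capacity model. This 
is only a sketch under stated assumptions: every replica has the same fixed 
resource budget, each write costs a fixed amount on the Leader and on each 
Follower, each read costs a fixed amount, and read-index waiting is ignored. 
The class name, method names, and all numbers are hypothetical and are not 
measurements from Ratis.

```java
/** Toy model of read headroom under write load; all names and numbers are hypothetical. */
final class ReadHeadroomModel {

    /** Reads/s one replica can still serve after paying for writes (never negative). */
    static double readHeadroom(double capacity, double writeRate,
                               double costPerWrite, double costPerRead) {
        double left = capacity - writeRate * costPerWrite;
        return Math.max(0.0, left) / costPerRead;
    }

    /** All reads pinned to the Leader. */
    static double leaderOnlyReads(double capacity, double writeRate,
                                  double leaderWriteCost, double readCost) {
        return readHeadroom(capacity, writeRate, leaderWriteCost, readCost);
    }

    /** Reads spread over the Leader and (replicas - 1) Followers. */
    static double groupReads(int replicas, double capacity, double writeRate,
                             double leaderWriteCost, double followerWriteCost,
                             double readCost) {
        double leader = readHeadroom(capacity, writeRate, leaderWriteCost, readCost);
        double follower = readHeadroom(capacity, writeRate, followerWriteCost, readCost);
        return leader + (replicas - 1) * follower;
    }

    public static void main(String[] args) {
        // Assumed budget: 10,000 units/s per replica; a write costs 2 units on the
        // Leader and 1 unit on a Follower (log sync + apply); a read costs 1 unit.
        for (double writes : new double[] {0, 2_000, 4_000}) {
            System.out.printf("writes/s=%.0f leaderOnly=%.0f group=%.0f%n", writes,
                leaderOnlyReads(10_000, writes, 2.0, 1.0),
                groupReads(3, 10_000, writes, 2.0, 1.0, 1.0));
        }
    }
}
```

With these made-up numbers, raising the write rate from 0 to 4,000/s shrinks 
leader-only read headroom from 10,000/s to 2,000/s and group-wide headroom 
from 30,000/s to 14,000/s: spreading reads raises the ceiling, but writes 
still eat into it on every replica.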

Based on the above analysis, I suggest the following directions:
 * Introduce Admission Control (Rate Limiting): Simply optimizing the algorithm 
cannot change the total resource limit. To fundamentally address the mutual 
interference between reads and writes, we might need a rate-limiting mechanism 
for writes. This would allow us to explicitly define a resource ceiling for 
writes within this trade-off, leaving guaranteed headroom for query throughput.
 * Enhance Observability and Identify Optimization Opportunities: I recommend 
analyzing disk IOPS, network bandwidth, and CPU flame graphs during future 
stress tests. Quantitative data (such as WAL write latency, gRPC serialization 
overhead, or state machine lock contention) is critical for pinpointing 
bottlenecks. For instance, in a previous optimization where I simply batched 
the put operations for the WAL blocking queue, I saved nearly 20% of CPU usage 
for Apache IoTDB. I believe that profiling under the current stress-test 
scenarios can uncover many more optimization opportunities of this kind in 
Ratis.
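
As one possible shape for the admission-control direction above, here is a 
minimal token-bucket sketch. It is not part of the Ratis API: the class name 
WriteAdmissionLimiter, its parameters, and the injectable clock are all 
assumptions made for illustration, and where such a check would sit in the 
write path is left open.

```java
import java.util.function.LongSupplier;

/** Hypothetical write-admission limiter (token bucket); not a Ratis class. */
final class WriteAdmissionLimiter {
    private final double permitsPerSecond;  // sustained write ceiling
    private final double burst;             // maximum number of stored permits
    private final LongSupplier nanoClock;   // injectable so tests can control time
    private double available;
    private long lastRefillNanos;

    WriteAdmissionLimiter(double permitsPerSecond, double burst, LongSupplier nanoClock) {
        this.permitsPerSecond = permitsPerSecond;
        this.burst = burst;
        this.nanoClock = nanoClock;
        this.available = burst;             // start with a full bucket
        this.lastRefillNanos = nanoClock.getAsLong();
    }

    /** Returns true if one write may proceed now; false means the caller should reject or delay it. */
    synchronized boolean tryAcquire() {
        long now = nanoClock.getAsLong();
        // Refill in proportion to elapsed time, capped at the burst size.
        double refill = (now - lastRefillNanos) / 1e9 * permitsPerSecond;
        available = Math.min(burst, available + refill);
        lastRefillNanos = now;
        if (available >= 1.0) {
            available -= 1.0;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        long[] fakeNanos = {0L};            // manual clock keeps the demo deterministic
        WriteAdmissionLimiter limiter =
            new WriteAdmissionLimiter(2.0, 2.0, () -> fakeNanos[0]);
        System.out.println("1st write admitted: " + limiter.tryAcquire());
        System.out.println("2nd write admitted: " + limiter.tryAcquire());
        System.out.println("3rd write admitted: " + limiter.tryAcquire());
        fakeNanos[0] = 500_000_000L;        // advance the clock by 0.5 s
        System.out.println("after 0.5 s:        " + limiter.tryAcquire());
    }
}
```

In a real deployment the clock would be System::nanoTime, and the ceiling 
(permitsPerSecond) is exactly the tunable that trades write throughput for 
query headroom; picking its value per workload remains an open question.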


> Improve linearizable follower read throughput instead of writes
> ---------------------------------------------------------------
>
>                 Key: RATIS-2403
>                 URL: https://issues.apache.org/jira/browse/RATIS-2403
>             Project: Ratis
>          Issue Type: Improvement
>            Reporter: Ivan Andika
>            Priority: Major
>         Attachments: leader-backpressure.patch
>
>
> While benchmarking linearizable follower read, we observed that the more 
> requests go to the followers instead of the leader, the better write 
> throughput becomes: we saw around a 2-3x write throughput increase compared 
> to leader-only write and read (most likely due to less leader resource 
> contention). However, the read throughput becomes worse than leader-only 
> write and read (some results can be below 0.2x). Even with optimizations 
> such as RATIS-2392, RATIS-2382, [https://github.com/apache/ratis/pull/1334], 
> and RATIS-2379, the read throughput remains worse than leader-only write and 
> read (the optimizations even improve write performance instead of read 
> performance).
> I suspect that because write throughput increases, the read index advances 
> at a faster rate, which causes follower linearizable reads to wait longer.
> The target is to improve read throughput to 1.5x - 2x that of leader-only 
> write and read. Currently, pure-read (no writes) workloads improve read 
> throughput by up to 1.7x, but total follower read throughput is way below 
> this target.
> Currently my ideas are:
>  * Sacrificing writes for reads: Can we limit the write QPS so that read QPS 
> can increase?
>  ** From the benchmark, the read throughput only improves when write 
> throughput is lower.
>  ** We can try to use a backpressure mechanism so that writes do not advance 
> so quickly that read throughput suffers.
>  *** Follower gap mechanisms (RATIS-1411), but this might cause the leader to 
> stall if a follower is down for a while (e.g. restarted), which violates the 
> majority availability guarantee. It's also hard to know which value is 
> optimal for different workloads.
> Raising this ticket for ideas. [~szetszwo] [~tanxinyu]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
