[ https://issues.apache.org/jira/browse/RATIS-2403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18058029#comment-18058029 ]

Xinyu Tan commented on RATIS-2403:
----------------------------------

[~ivanandika]
I believe the core issue of balancing read/write performance needs to be 
examined from two distinct perspectives:
 # Effectiveness of Follower Read:
Under a fixed load where the Leader has not yet reached its physical 
bottleneck, we need to verify whether enabling Follower Read effectively 
balances the query load across all replicas. If we observe a query throughput 
increase nearly proportional to the number of replicas while write throughput 
remains stable, that would demonstrate that the current Follower Read mechanism 
in Ratis effectively offloads the Leader.
 # Resource Zero-Sum Game under High Pressure:
In extreme stress-test scenarios, the total read/write capacity of the 
consensus group is capped by physical resources (CPU, I/O, Network).
A clear example is: even without Follower Read, if we pin all reads and writes 
to the Leader until it bottlenecks, any further increase in write load will 
inevitably decrease query throughput. This phenomenon persists even with 
Follower Read enabled, as faster writes force Followers to consume more 
resources for log synchronization and application (Apply), which in turn 
encroaches on query resources.
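The first check can be reduced to two ratios: measured read throughput versus ideal linear scaling of the leader-only baseline, and the relative change in write throughput. The sketch below is a hypothetical helper (the class name, method names and the tolerance parameters `minEfficiency` and `maxWriteChange` are illustrative assumptions, not part of Ratis or any benchmark tool); its inputs would come from a leader-only run and a Follower Read run under the same fixed load.

```java
/**
 * Hypothetical helper (not a Ratis API) for the first check: does Follower Read
 * spread queries across replicas without hurting writes, under a fixed load?
 */
final class FollowerReadScalingCheck {
    private FollowerReadScalingCheck() {}

    /** Measured read QPS divided by ideal linear scaling (baseline read QPS x replicas). */
    static double readScalingEfficiency(double leaderOnlyReadQps, double followerReadQps, int replicas) {
        if (leaderOnlyReadQps <= 0 || replicas <= 0) {
            throw new IllegalArgumentException("baseline QPS and replica count must be positive");
        }
        return followerReadQps / (leaderOnlyReadQps * replicas);
    }

    /** Relative change in write QPS; a value near 0 means writes stayed stable. */
    static double writeThroughputChange(double leaderOnlyWriteQps, double followerReadWriteQps) {
        if (leaderOnlyWriteQps <= 0) {
            throw new IllegalArgumentException("baseline write QPS must be positive");
        }
        return (followerReadWriteQps - leaderOnlyWriteQps) / leaderOnlyWriteQps;
    }

    /**
     * True when reads scale close to linearly (efficiency >= minEfficiency) while
     * write QPS changes by at most maxWriteChange (as a fraction of the baseline).
     */
    static boolean offloadsLeader(double leaderOnlyReadQps, double followerReadQps,
                                  double leaderOnlyWriteQps, double followerReadWriteQps,
                                  int replicas, double minEfficiency, double maxWriteChange) {
        return readScalingEfficiency(leaderOnlyReadQps, followerReadQps, replicas) >= minEfficiency
                && Math.abs(writeThroughputChange(leaderOnlyWriteQps, followerReadWriteQps)) <= maxWriteChange;
    }
}
```

For example, a 3-replica group whose read throughput grows from a baseline R to about 3R while write throughput moves by only a few percent would pass with thresholds such as 0.8 and 0.1; those thresholds are arbitrary and would need to be chosen per workload.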

Based on the above analysis, I suggest the following directions:
 * Introduce Admission Control (Rate Limiting): Simply optimizing the algorithm 
cannot change the total resource limit. To fundamentally address the mutual 
interference between reads and writes, we might need a rate-limiting mechanism 
for writes. This would allow us to explicitly define a resource ceiling for 
writes within this trade-off, leaving guaranteed headroom for query throughput.
 * Enhance Observability and Identify Optimization Opportunities: I recommend 
analyzing disk IOPS, network bandwidth, and CPU flame graphs during future 
stress tests. Quantitative data (such as WAL write latency, gRPC serialization 
overhead, or state machine lock contention) is critical for pinpointing 
bottlenecks. For instance, in a previous optimization where I simply batched 
the put operations for the WAL blocking queue, I managed to save nearly 20% of 
CPU usage for Apache IoTDB. I believe that under current stress-test scenarios, 
we can uncover many more similar optimization opportunities in Ratis by using 
profiling tools.
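The admission-control direction could start from a plain token bucket in front of the write path. The sketch below is a minimal, hypothetical example (the class `WriteRateLimiter` and its parameters are assumptions for illustration, not an existing Ratis API): each write takes one permit, permits refill at a fixed rate up to a burst capacity, and writes that find no permit would be rejected or retried, which caps write throughput and leaves the remaining capacity for reads. Time is passed in explicitly so the behavior is deterministic; in a server it would come from `System.nanoTime()`.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical token-bucket admission controller for writes (not a Ratis API).
 * Each admitted write consumes one permit; permits refill at writesPerSecond,
 * capped at burstCapacity.
 */
final class WriteRateLimiter {
    private static final double NANOS_PER_SECOND = TimeUnit.SECONDS.toNanos(1);

    private final double writesPerSecond;
    private final double burstCapacity;
    private double availablePermits;
    private long lastRefillNanos;

    WriteRateLimiter(double writesPerSecond, double burstCapacity, long nowNanos) {
        if (writesPerSecond <= 0 || burstCapacity < 1) {
            throw new IllegalArgumentException("rate must be positive and burst capacity at least 1");
        }
        this.writesPerSecond = writesPerSecond;
        this.burstCapacity = burstCapacity;
        this.availablePermits = burstCapacity; // start with a full bucket
        this.lastRefillNanos = nowNanos;
    }

    /** Admits one write at nowNanos if a permit is available; otherwise returns false. */
    synchronized boolean tryAcquire(long nowNanos) {
        refill(nowNanos);
        if (availablePermits >= 1.0) {
            availablePermits -= 1.0;
            return true;
        }
        return false; // caller rejects the write or retries later
    }

    /** Adds the permits earned since the last refill, never exceeding the burst capacity. */
    private void refill(long nowNanos) {
        long elapsedNanos = nowNanos - lastRefillNanos;
        if (elapsedNanos <= 0) {
            return; // ignore clock readings that are not newer than the last refill
        }
        double earned = elapsedNanos * writesPerSecond / NANOS_PER_SECOND;
        availablePermits = Math.min(burstCapacity, availablePermits + earned);
        lastRefillNanos = nowNanos;
    }
}
```

A token bucket allows short bursts up to the capacity while bounding the long-run write rate. If a library dependency is acceptable, Guava's `RateLimiter` offers a similar rate limit; either way, the configured rate is the explicit "resource ceiling for writes" and would still have to be tuned per workload.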

> Improve linearizable follower read throughput instead of writes
> ---------------------------------------------------------------
>
>                 Key: RATIS-2403
>                 URL: https://issues.apache.org/jira/browse/RATIS-2403
>             Project: Ratis
>          Issue Type: Improvement
>            Reporter: Ivan Andika
>            Priority: Major
>         Attachments: leader-backpressure.patch
>
>
> While benchmarking linearizable follower read, we observed that the more 
> requests go to the followers instead of the leader, the better the write 
> throughput becomes: we saw around a 2-3x write throughput increase compared 
> to leader-only write and read (most likely due to less resource contention on 
> the leader). However, the read throughput becomes worse than with leader-only 
> write and read (in some cases below 0.2x). Even with optimizations such as 
> RATIS-2392, RATIS-2382, [https://github.com/apache/ratis/pull/1334] and 
> RATIS-2379, the read throughput remains worse than with leader-only write and 
> read (these optimizations even improve write performance rather than read 
> performance).
> I suspect that because write throughput increases, the read index advances at 
> a faster rate, which causes follower linearizable reads to wait longer.
> The target is to improve read throughput to 1.5x-2x that of leader-only write 
> and read. Currently, with pure reads (no writes), follower read improves read 
> throughput by up to 1.7x, but the total follower read throughput is far below 
> this target.
> Currently my ideas are:
>  * Sacrificing writes for reads: can we limit the write QPS so that read QPS 
> can increase?
>  ** From the benchmark, the read throughput only improves when write 
> throughput is lower.
>  ** We can try to use a backpressure mechanism so that writes do not advance 
> so quickly that read throughput suffers.
>  *** Follower gap mechanism (RATIS-1411), but this might cause the leader to 
> stall if a follower is down for a while (e.g. restarted), which violates the 
> majority availability guarantee. It is also hard to know which value is 
> optimal for different workloads.
> Raising this ticket for ideas. [~szetszwo] [~tanxinyu] 
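The suspicion in the issue description, that faster writes make follower linearizable reads wait longer, can be illustrated with a simplified model. In the sketch below (the class `AppliedIndexWaiter` is hypothetical and is not how Ratis implements read index), a read on a follower obtains a read index from the leader and is only served once the follower's applied index reaches it; the faster the leader's commit index moves ahead of the follower's apply progress, the more entries a read must wait for.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.CompletableFuture;

/**
 * Simplified, hypothetical model of a follower serving linearizable reads
 * (not the Ratis implementation): a read may only be served once the
 * follower's applied index has reached the read index obtained from the leader.
 */
final class AppliedIndexWaiter {
    private long appliedIndex;
    // Pending reads keyed by the read index they are waiting for.
    private final TreeMap<Long, List<CompletableFuture<Long>>> pending = new TreeMap<>();

    AppliedIndexWaiter(long initialAppliedIndex) {
        this.appliedIndex = initialAppliedIndex;
    }

    /** Completes (with the applied index) once appliedIndex >= readIndex. */
    synchronized CompletableFuture<Long> waitForApplied(long readIndex) {
        if (appliedIndex >= readIndex) {
            return CompletableFuture.completedFuture(appliedIndex); // no wait needed
        }
        CompletableFuture<Long> future = new CompletableFuture<>();
        pending.computeIfAbsent(readIndex, k -> new ArrayList<>()).add(future);
        return future;
    }

    /** Called by the apply loop after the entry at {@code index} has been applied. */
    void onApplied(long index) {
        List<CompletableFuture<Long>> ready = new ArrayList<>();
        synchronized (this) {
            if (index <= appliedIndex) {
                return;
            }
            appliedIndex = index;
            // All reads whose read index is <= index can now be served.
            NavigableMap<Long, List<CompletableFuture<Long>>> satisfied = pending.headMap(index, true);
            satisfied.values().forEach(ready::addAll);
            satisfied.clear(); // removes them from the backing map
        }
        // Complete outside the lock so read handlers do not run while holding it.
        ready.forEach(future -> future.complete(index));
    }

    /** Number of reads still waiting for the apply loop to catch up. */
    synchronized int pendingReads() {
        return pending.values().stream().mapToInt(List::size).sum();
    }
}
```

In a benchmark, sampling the number of pending reads together with the gap between the read index and the applied index would be one way to check whether follower read latency is dominated by this wait.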



--
This message was sent by Atlassian Jira
(v8.20.10#820010)