[ 
https://issues.apache.org/jira/browse/RATIS-2403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18060647#comment-18060647
 ] 

Ivan Andika edited comment on RATIS-2403 at 2/24/26 8:40 AM:
-------------------------------------------------------------

> Since it just delays the replies, the write throughput should not be 
> degraded. I think the problem is at the client side (i.e. the benchmark) – 
> the client may wait for the previous reply before sending another request. If 
> it is the case, using more clients should be able to keep the write 
> throughput.

Yeah, that makes sense; that should reduce the write throughput degradation. 
However, since in the Ozone case the handler thread that handles writes will be 
blocked waiting for RaftServerImpl#submitClientRequestAsync to complete, we 
might also need to scale the number of handler threads.

> BTW, the code should replace the reference but not copying the list. Also, 
> use LinkedList for avoiding ArrayList resizing.

Thanks for the suggestion. The patch is mostly LLM-generated since it is 
currently a PoC; I will try to refine it further. If you think this can be 
safely pushed upstream, I can raise a PR for it.
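
To illustrate the suggestion, the swap-the-reference pattern could look 
roughly like this (a minimal sketch with hypothetical names, not the actual 
patch):

```java
import java.util.LinkedList;
import java.util.List;

class PendingReplies {
    // LinkedList avoids ArrayList's internal array resizing on growth.
    private List<Runnable> pending = new LinkedList<>();

    synchronized void add(Runnable reply) {
        pending.add(reply);
    }

    // Drain by replacing the reference under the lock instead of copying
    // the elements, then deliver the replies outside the lock.
    void flush() {
        final List<Runnable> toSend;
        synchronized (this) {
            toSend = pending;
            pending = new LinkedList<>();
        }
        toSend.forEach(Runnable::run);
    }
}
```

The critical section then only swaps one reference, so its cost is constant 
regardless of how many replies are queued.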



> Improve linearizable follower read throughput instead of writes
> ---------------------------------------------------------------
>
>                 Key: RATIS-2403
>                 URL: https://issues.apache.org/jira/browse/RATIS-2403
>             Project: Ratis
>          Issue Type: Improvement
>            Reporter: Ivan Andika
>            Priority: Major
>         Attachments: leader-backpressure.patch, leader-batch-write.patch
>
>
> While benchmarking linearizable follower reads, we observed that the more 
> requests go to the followers instead of the leader, the better the write 
> throughput becomes: we saw around a 2-3x write throughput increase compared 
> to leader-only writes and reads (most likely due to less resource contention 
> on the leader). However, the read throughput becomes worse than with 
> leader-only writes and reads (some runs can be below 0.2x). Even with 
> optimizations such as RATIS-2392, RATIS-2382 
> [https://github.com/apache/ratis/pull/1334], and RATIS-2379, the read 
> throughput remains worse than with leader-only writes (these changes even 
> improve write performance instead of read performance).
> I suspect that because the write throughput increases, the read index 
> advances at a faster rate, which causes follower linearizable reads to wait 
> longer.
> The target is to improve read throughput to 1.5x - 2x of the leader-only 
> write-and-read baseline. Currently a pure-read (no writes) workload improves 
> read throughput by up to 1.7x, but the total follower read throughput is way 
> below this target.
> Currently my ideas are:
>  * Sacrificing writes for reads: can we limit the write QPS so that the read 
> QPS can increase?
>  ** From the benchmark, the read throughput only improves when the write 
> throughput is lower.
>  ** We can try to use a backpressure mechanism so that writes do not advance 
> so quickly that read throughput suffers.
>  *** Follower gap mechanisms (RATIS-1411), but this might cause the leader 
> to stall if a follower is down for a while (e.g. restarted), which violates 
> the majority availability guarantee. It is also hard to know which value is 
> optimal for different workloads.
> Raising this ticket for ideas. [~szetszwo] [~tanxinyu] 
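
The backpressure idea above could be sketched as a simple cap on in-flight 
writes (hypothetical names and tunable; this is not part of the attached 
patches or an existing Ratis config):

```java
import java.util.concurrent.Semaphore;

// Sketch: bound the number of outstanding writes so they cannot advance so
// quickly that linearizable follower reads are starved.
class WriteBackpressure {
    private final Semaphore inFlight;

    // maxInFlightWrites is an assumed tunable, workload-dependent.
    WriteBackpressure(int maxInFlightWrites) {
        this.inFlight = new Semaphore(maxInFlightWrites);
    }

    // Block the submitting thread while too many writes are outstanding.
    void beforeWrite() throws InterruptedException {
        inFlight.acquire();
    }

    // Release the permit once the write's reply has been sent.
    void afterWrite() {
        inFlight.release();
    }

    // Remaining write budget, mainly for observability.
    int available() {
        return inFlight.availablePermits();
    }
}
```

Unlike a follower-gap limit, a fixed in-flight cap does not stall the leader 
when one follower is down, though picking a good cap per workload remains the 
open question raised above.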



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
