[
https://issues.apache.org/jira/browse/RATIS-2403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18060444#comment-18060444
]
Tsz-wo Sze edited comment on RATIS-2403 at 2/23/26 7:16 PM:
------------------------------------------------------------
[~ivanandika], thanks for trying out the leader-batch-write!
bq. ... but the write throughput can be degraded to 0.7x (one case degrades to
0.3x), ...
Since it just delays the replies, the write throughput should not be degraded.
I think the problem is on the client side (i.e. the benchmark) -- a client may
wait for the previous reply before sending the next request. If that is the
case, using more clients should be able to sustain the write throughput.
It does make sense, though, that the latency is degraded.
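To illustrate the client-side effect, here is a minimal sketch (not the Ratis client API -- {{sendAsync}} and the 10 ms held-reply delay are made up for the example) showing that a single synchronous client caps throughput at one request per reply latency, while keeping many requests outstanding hides the batching delay:
{code}
import java.util.concurrent.*;

public class PipelineSketch {
  static final ScheduledExecutorService TIMER =
      Executors.newSingleThreadScheduledExecutor();

  // Hypothetical async client call: the "server" holds each reply
  // for 10 ms, mimicking a leader that batches replies.
  static CompletableFuture<Void> sendAsync() {
    final CompletableFuture<Void> f = new CompletableFuture<>();
    TIMER.schedule(() -> f.complete(null), 10, TimeUnit.MILLISECONDS);
    return f;
  }

  // One client waiting for each reply: ~10 requests x 10 ms.
  static long syncMillis() {
    final long t0 = System.nanoTime();
    for (int i = 0; i < 10; i++) {
      sendAsync().join();
    }
    return (System.nanoTime() - t0) / 1_000_000;
  }

  // Ten outstanding requests: the held replies overlap, ~10 ms total.
  static long pipelinedMillis() {
    final long t0 = System.nanoTime();
    final CompletableFuture<?>[] fs = new CompletableFuture<?>[10];
    for (int i = 0; i < 10; i++) {
      fs[i] = sendAsync();
    }
    CompletableFuture.allOf(fs).join();
    return (System.nanoTime() - t0) / 1_000_000;
  }

  public static void main(String[] args) {
    System.out.println("sync ~" + syncMillis()
        + " ms, pipelined ~" + pipelinedMillis() + " ms");
    TIMER.shutdown();
  }
}
{code}
So a benchmark with more clients (or an async client with many outstanding requests) should see the write throughput recover even though each individual reply is delayed.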
BTW, the code should replace the reference instead of copying the list. Also,
use a LinkedList to avoid ArrayList resizing.
{code}
// LeaderStateImpl
private final AtomicReference<List<HeldReply>> heldReplies = ...
...
private void flushReplies() {
  if (heldReplies.get().isEmpty()) {
    return;
  }
  final List<HeldReply> toFlush = heldReplies.getAndSet(new LinkedList<>());
  ...
}
{code}
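As a self-contained sketch of the swap-the-reference pattern (only {{HeldReply}} and {{heldReplies}} come from the patch; the rest of the class is made up for illustration), {{getAndSet}} hands the whole held list to the flusher in O(1) without copying any elements:
{code}
import java.util.LinkedList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

public class HeldRepliesSketch {
  // Placeholder for the patch's HeldReply type.
  static final class HeldReply {}

  // Start with an empty LinkedList so no ArrayList resizing occurs.
  private final AtomicReference<List<HeldReply>> heldReplies =
      new AtomicReference<>(new LinkedList<>());

  void hold(HeldReply reply) {
    heldReplies.get().add(reply);
  }

  List<HeldReply> flushReplies() {
    if (heldReplies.get().isEmpty()) {
      return List.of();
    }
    // Atomically swap in a fresh list; the old list is returned
    // as-is, so none of the held replies are copied.
    return heldReplies.getAndSet(new LinkedList<>());
  }

  public static void main(String[] args) {
    final HeldRepliesSketch s = new HeldRepliesSketch();
    s.hold(new HeldReply());
    s.hold(new HeldReply());
    System.out.println(s.flushReplies().size());  // 2
    System.out.println(s.flushReplies().size());  // 0 (empty after flush)
  }
}
{code}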
> Improve linearizable follower read throughput instead of writes
> ---------------------------------------------------------------
>
> Key: RATIS-2403
> URL: https://issues.apache.org/jira/browse/RATIS-2403
> Project: Ratis
> Issue Type: Improvement
> Reporter: Ivan Andika
> Priority: Major
> Attachments: leader-backpressure.patch, leader-batch-write.patch
>
>
> While benchmarking linearizable follower read, the observation is that the
> more requests go to the followers instead of the leader, the better the write
> throughput becomes; we saw around a 2-3x write throughput increase compared
> to leader-only write and read (most likely due to less leader resource
> contention). However, the read throughput becomes worse than leader-only
> write and read (some cases can be below 0.2x). Even with optimizations such
> as RATIS-2392, RATIS-2382, [https://github.com/apache/ratis/pull/1334] and
> RATIS-2379, the read throughput remains worse than leader-only write (they
> even improve the write performance instead of the read performance).
> I suspect that because the write throughput increases, the read index
> advances at a faster rate, which causes follower linearizable reads to wait
> longer.
> The target is to improve read throughput to 1.5x-2x of the leader-only write
> and read. Currently, pure reads (no writes) improve read throughput by up to
> 1.7x, but the total follower read throughput is far below this target.
> Currently my ideas are
> * Sacrificing writes for reads: can we limit the write QPS so that the read
> QPS can increase?
> ** From the benchmark, the read throughput only improves when the write
> throughput is lower.
> ** We can try to use a backpressure mechanism so that writes do not advance
> so quickly that read throughput suffers.
> *** Follower gap mechanisms (RATIS-1411), but this might cause the leader to
> stall if a follower is down for a while (e.g. restarted), which violates the
> majority availability guarantee. It is also hard to know which value is
> optimal for different workloads.
> Raising this ticket for ideas. [~szetszwo] [~tanxinyu]
--
This message was sent by Atlassian Jira
(v8.20.10#820010)