[
https://issues.apache.org/jira/browse/RATIS-2403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18057756#comment-18057756
]
Ivan Andika edited comment on RATIS-2403 at 2/11/26 8:23 AM:
-------------------------------------------------------------
[~szetszwo] Thanks for the batching idea, that sounds promising. Let me think
about how to approach it.
FYI, regarding the leader backpressure mechanism discussed earlier, I asked an
LLM to write a prototype that blocks log appends when the follower commit index
falls behind the leader's, and benchmarked it ([^leader-backpressure.patch])
with raft.server.write.follower.gap.ratio.max=1. The result is that read
throughput stays within 1.5x, but write throughput degrades to 0.2x-0.5x. This
validates that slowing down writes speeds up reads. However, this solution is
not flexible enough and should not be deployed as-is.
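Conceptually, the prototype is just a gate like the sketch below (the names
are illustrative; the actual change is in the attached patch):
{code:java}
// Sketch of a leader-side backpressure gate: reject (or delay) a write while
// the slowest follower's commit index lags the leader's by more than an
// allowed gap. Names are illustrative; the limit would be derived from
// raft.server.write.follower.gap.ratio.max.
import java.io.IOException;

final class FollowerGapBackpressure {
  private final long maxAllowedGap;

  FollowerGapBackpressure(long maxAllowedGap) {
    this.maxAllowedGap = maxAllowedGap;
  }

  void checkBeforeAppend(long leaderCommitIndex, long minFollowerCommitIndex)
      throws IOException {
    final long gap = leaderCommitIndex - minFollowerCommitIndex;
    if (gap > maxAllowedGap) {
      throw new IOException("Backpressure: follower commit gap " + gap
          + " exceeds limit " + maxAllowedGap + ", rejecting append");
    }
  }
}
{code}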
The LLM also tried implementing a SingleFlight pattern, making multiple
follower read requests share one ReadIndex RPC, so that later follower read
requests join the ongoing ReadIndex RPC. However, this seems to be invalid: it
would violate linearizability, since the shared ReadIndex may already be stale
for the later requests.
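Roughly, the single-flight shape was like the sketch below (names are
illustrative). The flaw is visible here: a request that arrives after the
in-flight RPC was issued gets an index obtained before it started, so that
index may miss writes the client has already observed.
{code:java}
// Sketch of the single-flight attempt: later follower reads join the
// in-flight ReadIndex RPC instead of issuing a new one. Names are
// illustrative; this sharing is what breaks linearizability.
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

final class ReadIndexSingleFlight {
  private CompletableFuture<Long> inFlight;

  synchronized CompletableFuture<Long> readIndex(
      Supplier<CompletableFuture<Long>> readIndexRpcToLeader) {
    if (inFlight == null || inFlight.isDone()) {
      inFlight = readIndexRpcToLeader.get(); // start a new ReadIndex RPC
    }
    return inFlight; // later callers join the ongoing RPC
  }
}
{code}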
> Improve linearizable follower read throughput instead of writes
> ---------------------------------------------------------------
>
> Key: RATIS-2403
> URL: https://issues.apache.org/jira/browse/RATIS-2403
> Project: Ratis
> Issue Type: Improvement
> Reporter: Ivan Andika
> Priority: Major
> Attachments: leader-backpressure.patch
>
>
> While benchmarking linearizable follower read, the observation is that the
> more read requests go to the followers instead of the leader, the better the
> write throughput becomes: we saw around a 2-3x write throughput increase
> compared to leader-only write and read (most likely due to less resource
> contention on the leader). However, the read throughput becomes worse than
> leader-only write and read (some runs are below 0.2x). Even with
> optimizations such as RATIS-2392, RATIS-2382
> ([https://github.com/apache/ratis/pull/1334]), and RATIS-2379, the read
> throughput remains worse than leader-only write (they even improve write
> performance instead of read performance).
> I suspect that because write throughput increases, the read index advances at
> a faster rate, which causes follower linearizable reads to wait longer.
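> As a rough illustration of that wait (all names below are assumptions, not
> the actual Ratis code):
> {code:java}
> // Sketch of why faster writes make follower linearizable reads wait longer.
> // All names here are illustrative, not the actual Ratis implementation.
> abstract class FollowerReadSketch {
>   abstract long requestReadIndexFromLeader(); // leader commit index at request time
>   abstract long getLastAppliedIndex();        // follower's applied index
>   abstract void awaitApplyProgress() throws InterruptedException;
>   abstract Object serveReadFromStateMachine();
>
>   Object linearizableRead() throws InterruptedException {
>     final long readIndex = requestReadIndexFromLeader();
>     // The faster writes advance the leader's commit index, the larger this
>     // gap tends to be, and the longer each follower read blocks here.
>     while (getLastAppliedIndex() < readIndex) {
>       awaitApplyProgress();
>     }
>     return serveReadFromStateMachine();
>   }
> }
> {code}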
> The target is to improve read throughput to 1.5x-2x of leader-only write and
> read. Currently, pure reads (no writes) improve read throughput up to 1.7x,
> but total follower read throughput is far below this target.
> Currently my ideas are:
> * Sacrificing writes for reads: can we limit the write QPS so that the read
> QPS can increase?
> ** From the benchmark, the read throughput only improves when the write
> throughput is lower.
> ** We can try to use a backpressure mechanism so that writes do not advance
> so quickly that read throughput suffers.
> *** Follower gap mechanisms (RATIS-1411), but this might cause the leader to
> stall if a follower is down for a while (e.g. restarted), which violates the
> majority availability guarantee. It is also hard to know which value is
> optimal for different workloads.
> Raising this ticket for ideas. [~szetszwo] [~tanxinyu]
--
This message was sent by Atlassian Jira
(v8.20.10#820010)