[ https://issues.apache.org/jira/browse/RATIS-2403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ivan Andika updated RATIS-2403:
-------------------------------
Description:
While benchmarking linearizable follower read, I observed that as more
requests go to the followers instead of the leader, the write throughput
improves dramatically, by around 2-3x compared to leader-only write and read
(most likely due to less leader resource contention). However, the read
throughput becomes worse than leader-only write and read (in some cases below
0.2x). Even with optimizations such as RATIS-2392, RATIS-2382,
[https://github.com/apache/ratis/pull/1334], and RATIS-2379, the read
throughput remains worse than leader-only write (the optimizations even
improve the write performance instead of the read performance). I suspect that
because write throughput increases, the read index advances at a faster rate,
which causes follower linearizable reads to wait longer.
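The suspected effect can be illustrated with a toy model. This is not Ratis
code: the class, method, and parameter names are hypothetical. It assumes that
a follower serves a linearizable read only once its applied index has reached
the read index obtained from the leader, and that the follower applies entries
at a constant rate.

```java
/** Toy model of follower read wait time; hypothetical, not part of Ratis. */
final class ReadIndexWaitModel {
  private ReadIndexWaitModel() {}

  /**
   * Estimates how long (in milliseconds) a follower read waits until
   * appliedIndex catches up with readIndex.
   *
   * @param readIndex       index the read must wait for (assumed to come from the leader)
   * @param appliedIndex    index the follower has applied so far
   * @param applyRatePerSec assumed constant apply rate in entries per second
   */
  static double estimatedWaitMillis(long readIndex, long appliedIndex,
      double applyRatePerSec) {
    if (applyRatePerSec <= 0) {
      throw new IllegalArgumentException("applyRatePerSec must be positive");
    }
    // No wait is needed if the follower has already applied the read index.
    final long gap = Math.max(0L, readIndex - appliedIndex);
    return gap * 1000.0 / applyRatePerSec;
  }
}
```

Under these assumptions, with an apply rate of 10,000 entries/s, a gap of 100
entries gives an estimated wait of 10 ms, while a gap of 4,100 entries gives
410 ms. A faster-advancing read index widens the gap each read has to close.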
The target is to improve read throughput to 1.5x - 2x that of leader-only
write and read. Currently, with pure reads (no writes), read throughput
improves up to 1.7x.
Currently my ideas are:
* Sacrificing writes for reads: can we limit the write QPS so that read QPS
can increase?
** From the benchmark, the read throughput only improves when write throughput
is lower.
** We can try to use a backpressure mechanism so that writes do not advance so
quickly that read throughput suffers.
*** Follower gap mechanisms (RATIS-1411) are one option, but this might cause
the leader to stall if a follower is down for a while (e.g. restarted), which
violates the majority availability guarantee. It's also hard to know which
value is optimal for different workloads.
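As one possible shape for the write-limiting idea, here is a minimal
token-bucket sketch. It is an assumption for discussion, not an existing Ratis
API: the class name, constructor parameters, and the choice of a token bucket
are all hypothetical. Unlike a follower-gap check, it does not depend on any
follower's progress, so a down follower would not stall the leader, but the
rate still has to be tuned per workload.

```java
/** Hypothetical token-bucket limiter for write requests; not a Ratis API. */
final class WriteRateLimiter {
  private final double permitsPerSec; // assumed tunable: sustained write QPS cap
  private final double capacity;      // assumed tunable: maximum burst size
  private double tokens;
  private long lastNanos;

  WriteRateLimiter(double permitsPerSec, double capacity, long nowNanos) {
    if (permitsPerSec <= 0 || capacity < 1) {
      throw new IllegalArgumentException("permitsPerSec must be > 0 and capacity >= 1");
    }
    this.permitsPerSec = permitsPerSec;
    this.capacity = capacity;
    this.tokens = capacity; // start with a full bucket
    this.lastNanos = nowNanos;
  }

  /**
   * Returns true if a write may proceed now. False means the caller should
   * delay or reject the write. The clock is passed in so the logic is testable;
   * a caller would typically pass System.nanoTime().
   */
  synchronized boolean tryAcquire(long nowNanos) {
    // Refill tokens for the time elapsed since the last call, up to capacity.
    final long elapsed = Math.max(0L, nowNanos - lastNanos);
    tokens = Math.min(capacity, tokens + elapsed / 1e9 * permitsPerSec);
    lastNanos = Math.max(lastNanos, nowNanos);
    if (tokens >= 1.0) {
      tokens -= 1.0;
      return true;
    }
    return false;
  }
}
```

A caller would invoke tryAcquire(System.nanoTime()) before submitting each
write. Whether rejected writes should be queued, delayed, or failed back to the
client is left open here.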
Raising this ticket for ideas. [~szetszwo] [~tanxinyu]
was:
While benchmarking linearizable follower read, I observed that as more
requests go to the followers instead of the leader, the write throughput
improves dramatically, by around 2-3x compared to leader-only write and read
(most likely due to less leader resource contention). However, the read
throughput becomes worse than leader-only write and read (in some cases below
0.2x). Even with optimizations such as RATIS-2392, RATIS-2382,
[https://github.com/apache/ratis/pull/1334], and RATIS-2379, the read
throughput remains worse than leader-only write (the optimizations even
improve the write performance instead of the read performance). I suspect that
because write throughput increases, the read index advances at a faster rate,
which causes follower linearizable reads to wait longer.
The target is to improve read throughput to 1.5x - 2x that of leader-only
write and read.
Currently my ideas are:
* Sacrificing writes for reads: can we limit the write QPS so that read QPS
can increase?
** From the benchmark, the read throughput only improves when write throughput
is lower.
** We can try to use a backpressure mechanism so that writes do not advance so
quickly that read throughput suffers.
*** Follower gap mechanisms (RATIS-1411) are one option, but this might cause
the leader to stall if a follower is down for a while (e.g. restarted), which
violates the majority availability guarantee. It's also hard to know which
value is optimal for different workloads.
Raising this ticket for ideas. [~szetszwo] [~tanxinyu]
> Improve linearizable follower read throughput instead of writes
> ---------------------------------------------------------------
>
> Key: RATIS-2403
> URL: https://issues.apache.org/jira/browse/RATIS-2403
> Project: Ratis
> Issue Type: Improvement
> Reporter: Ivan Andika
> Priority: Major