[
https://issues.apache.org/jira/browse/RATIS-2403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ivan Andika updated RATIS-2403:
-------------------------------
Description:
While benchmarking linearizable follower read, we observed that the more
requests go to the followers instead of the leader, the better write throughput
becomes: we saw around a 2-3x write throughput increase compared to
leader-only write and read (most likely due to less leader resource
contention). However, the read throughput becomes worse than leader-only write
and read (in some cases below 0.2x). Even with optimizations such as RATIS-2392,
RATIS-2382 [https://github.com/apache/ratis/pull/1334], and RATIS-2379, the read
throughput remains worse than leader-only write and read (the optimizations
even improve the write performance instead of the read performance).
I suspect that because write throughput increases, the read index advances at a
faster rate, which causes follower linearizable reads to wait longer.
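To make the suspected mechanism concrete, here is a minimal Java sketch of a follower-side wait: a read obtains a read index from the leader and completes only once the follower's applied index has reached it. ReadIndexWaiter, awaitApplied, and onApplied are hypothetical names used for illustration; they are not Ratis APIs, and the actual implementation may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.CompletableFuture;

// Toy model only; all names here are hypothetical, not Ratis APIs.
final class ReadIndexWaiter {
  // Pending reads, keyed by the read index each one is waiting for.
  private final NavigableMap<Long, CompletableFuture<Long>> pending = new TreeMap<>();
  private long appliedIndex = -1;

  // Called by a read after it has obtained readIndex from the leader.
  synchronized CompletableFuture<Long> awaitApplied(long readIndex) {
    if (appliedIndex >= readIndex) {
      return CompletableFuture.completedFuture(appliedIndex);
    }
    // Reads waiting for the same index share one future.
    return pending.computeIfAbsent(readIndex, i -> new CompletableFuture<>());
  }

  // Called whenever the follower's state machine applies more log entries.
  synchronized void onApplied(long newAppliedIndex) {
    appliedIndex = Math.max(appliedIndex, newAppliedIndex);
    // Every read whose index is <= appliedIndex can now be served.
    NavigableMap<Long, CompletableFuture<Long>> ready = pending.headMap(appliedIndex, true);
    List<CompletableFuture<Long>> done = new ArrayList<>(ready.values());
    ready.clear();
    for (CompletableFuture<Long> f : done) {
      f.complete(appliedIndex);
    }
  }
}
```

In this model, the further the read index handed out by the leader runs ahead of the follower's applied index, the more apply progress must happen before a read can complete, which is the effect suspected above.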
The target is to improve read throughput to 1.5x - 2x that of leader-only write
and read. Currently, with pure reads (no writes), follower read improves read
throughput by up to 1.7x, but total follower read throughput is well below this
target.
Currently my ideas are:
* Sacrificing writes for reads: can we limit the write QPS so that read QPS
can increase?
** From the benchmark, the read throughput only improves when write throughput
is lower
** We can try to use a backpressure mechanism so that writes do not advance so
quickly that read throughput suffers
*** Follower gap mechanisms (RATIS-1411), but this might cause the leader to
stall if a follower is down for a while (e.g. restarted), which violates the
majority availability guarantee. It's also hard to know which value is optimal
for different workloads.
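As one possible shape for the "limit the write QPS" idea, the sketch below is a plain token-bucket limiter that a write path could consult before admitting a request. It is not an existing Ratis feature: WriteRateLimiter, its constructor parameters, and tryAcquire are hypothetical names, and the caller passes the clock in explicitly so the behavior is easy to reason about.

```java
// Hypothetical token-bucket limiter for write admission; not a Ratis API.
final class WriteRateLimiter {
  private final double permitsPerSecond; // assumed target write QPS
  private final double burst;            // maximum permits that can accumulate
  private double available;
  private long lastNanos;

  WriteRateLimiter(double permitsPerSecond, double burst, long nowNanos) {
    this.permitsPerSecond = permitsPerSecond;
    this.burst = burst;
    this.available = burst;
    this.lastNanos = nowNanos;
  }

  // Returns true if one write may proceed at time nowNanos, false otherwise.
  synchronized boolean tryAcquire(long nowNanos) {
    // Refill in proportion to elapsed time, capped at the burst size.
    double elapsedSeconds = Math.max(0L, nowNanos - lastNanos) / 1e9;
    available = Math.min(burst, available + elapsedSeconds * permitsPerSecond);
    lastNanos = Math.max(lastNanos, nowNanos);
    if (available >= 1.0) {
      available -= 1.0;
      return true;
    }
    return false;
  }
}
```

A fixed rate like this shares the tuning problem noted for the follower gap: the right permitsPerSecond depends on the workload, so an adaptive variant would probably be needed in practice.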
Raising this ticket for ideas. [~szetszwo] [~tanxinyu]
was:
While benchmarking linearizable follower read, we observed that the more
requests go to the followers instead of the leader, the write throughput
improves dramatically by around 2-3x compared to leader-only write and read
(most likely due to less leader resource contention). However, the read
throughput becomes worse than leader-only write and read (in some cases below
0.2x). Even with optimizations such as RATIS-2392, RATIS-2382
[https://github.com/apache/ratis/pull/1334], and RATIS-2379, the read throughput
remains worse than leader-only write and read (the optimizations even improve
the write performance instead of the read performance). I suspect that because
write throughput increases, the read index advances at a faster rate, which
causes follower linearizable reads to wait longer.
The target is to improve read throughput to 1.5x - 2x that of leader-only write
and read. Currently, with pure reads (no writes), follower read improves read
throughput by up to 1.7x.
Currently my ideas are:
* Sacrificing writes for reads: can we limit the write QPS so that read QPS
can increase?
** From the benchmark, the read throughput only improves when write throughput
is lower
** We can try to use a backpressure mechanism so that writes do not advance so
quickly that read throughput suffers
*** Follower gap mechanisms (RATIS-1411), but this might cause the leader to
stall if a follower is down for a while (e.g. restarted), which violates the
majority availability guarantee. It's also hard to know which value is optimal
for different workloads.
Raising this ticket for ideas. [~szetszwo] [~tanxinyu]
> Improve linearizable follower read throughput instead of writes
> ---------------------------------------------------------------
>
> Key: RATIS-2403
> URL: https://issues.apache.org/jira/browse/RATIS-2403
> Project: Ratis
> Issue Type: Improvement
> Reporter: Ivan Andika
> Priority: Major
>
> While benchmarking linearizable follower read, we observed that the more
> requests go to the followers instead of the leader, the better write
> throughput becomes: we saw around a 2-3x write throughput increase compared
> to leader-only write and read (most likely due to less leader resource
> contention). However, the read throughput becomes worse than leader-only
> write and read (in some cases below 0.2x). Even with optimizations such as
> RATIS-2392, RATIS-2382 [https://github.com/apache/ratis/pull/1334], and
> RATIS-2379, the read throughput remains worse than leader-only write and read
> (the optimizations even improve the write performance instead of the read
> performance).
> I suspect that because write throughput increases, the read index advances at
> a faster rate, which causes follower linearizable reads to wait longer.
> The target is to improve read throughput to 1.5x - 2x that of leader-only
> write and read. Currently, with pure reads (no writes), follower read
> improves read throughput by up to 1.7x, but total follower read throughput is
> well below this target.
> Currently my ideas are:
> * Sacrificing writes for reads: can we limit the write QPS so that read QPS
> can increase?
> ** From the benchmark, the read throughput only improves when write
> throughput is lower
> ** We can try to use a backpressure mechanism so that writes do not advance
> so quickly that read throughput suffers
> *** Follower gap mechanisms (RATIS-1411), but this might cause the leader to
> stall if a follower is down for a while (e.g. restarted), which violates the
> majority availability guarantee. It's also hard to know which value is
> optimal for different workloads.
> Raising this ticket for ideas. [~szetszwo] [~tanxinyu]