[ 
https://issues.apache.org/jira/browse/RATIS-2403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Andika updated RATIS-2403:
-------------------------------
    Description: 
While benchmarking linearizable follower read, we observed that the more 
requests go to the followers instead of the leader, the better the write 
throughput becomes: around a 2-3x write throughput increase compared to 
leader-only write and read (most likely due to less leader resource 
contention). However, the read throughput becomes worse than leader-only write 
and read (in some cases below 0.2x). Even with optimizations such as 
RATIS-2392, RATIS-2382 [https://github.com/apache/ratis/pull/1334], and 
RATIS-2379, the read throughput remains worse than leader-only write (the 
optimizations even improve the write performance instead of the read 
performance).

I suspect that because write throughput increases, the read index advances at 
a faster rate, which causes follower linearizable reads to wait longer.
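
To make the suspicion above concrete, here is a toy model only, not Ratis 
code. It assumes a follower can answer a linearizable read once its applied 
index reaches the read index, and that it applies entries at a constant rate. 
The class name, method name, and all numbers are hypothetical and are not 
benchmark results.

```java
/** Toy model of how long a follower waits before serving a linearizable read. */
final class FollowerReadWaitModel {
    private FollowerReadWaitModel() {}

    /**
     * Estimates the wait for the follower to catch up to the read index.
     *
     * @param readIndex       index the follower must have applied before answering
     * @param appliedIndex    follower's last applied index when the read arrives
     * @param applyRatePerSec entries per second the follower applies (assumed constant)
     * @return estimated wait in milliseconds; 0 if the follower is already caught up
     */
    static double estimateWaitMillis(long readIndex, long appliedIndex, double applyRatePerSec) {
        if (applyRatePerSec <= 0) {
            throw new IllegalArgumentException("applyRatePerSec must be positive");
        }
        long gap = Math.max(0L, readIndex - appliedIndex);
        return gap * 1000.0 / applyRatePerSec;
    }

    public static void main(String[] args) {
        double applyRate = 10_000; // hypothetical follower apply rate, entries/sec
        // A larger gap between read index and applied index means a longer wait.
        for (long gap : new long[] {0, 100, 1_000, 5_000}) {
            System.out.printf("gap=%d entries -> wait ~%.1f ms%n",
                gap, estimateWaitMillis(gap, 0, applyRate));
        }
    }
}
```

In this model the wait grows linearly with the gap between the read index and 
the follower's applied index, so anything that widens that gap under heavier 
write load would slow follower reads.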

The target is to improve read throughput to 1.5x - 2x of the leader-only write 
and read. Currently, a pure-read workload (no writes) improves read throughput 
by up to 1.7x, but total follower read throughput is way below this target.

Currently my ideas are:
 * Sacrificing writes for reads: can we limit the write QPS so that read QPS 
can increase?
 ** From the benchmark, the read throughput only improves when write throughput 
is lower.
 ** We can try to use a backpressure mechanism so that writes do not advance so 
quickly that read throughput suffers.
 *** Follower gap mechanisms (RATIS-1411), but this might cause the leader to 
stall if a follower is down for a while (e.g. restarted), which violates the 
majority availability guarantee. It's also hard to know which value is optimal 
for different workloads.
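
As a starting point for the "limit the write QPS" idea, here is a minimal 
token-bucket sketch. It is not an existing Ratis API: the class 
`WriteTokenBucket`, its parameters, and where it would be called from are all 
assumptions for illustration. Because it caps the write rate rather than the 
follower gap, it does not depend on any follower being alive, but the right 
rate would still have to be tuned per workload.

```java
import java.util.function.LongSupplier;

/** Minimal token bucket for capping write admissions; hypothetical, not part of Ratis. */
final class WriteTokenBucket {
    private final double permitsPerSecond;
    private final double burst;
    private final LongSupplier nanoClock;
    private double available;
    private long lastRefillNanos;

    /**
     * @param permitsPerSecond sustained write rate allowed (assumed tuning knob)
     * @param burst            maximum permits that may accumulate while idle
     * @param nanoClock        monotonic clock in nanoseconds, e.g. System::nanoTime
     */
    WriteTokenBucket(double permitsPerSecond, double burst, LongSupplier nanoClock) {
        if (permitsPerSecond <= 0 || burst < 1) {
            throw new IllegalArgumentException("permitsPerSecond must be > 0 and burst >= 1");
        }
        this.permitsPerSecond = permitsPerSecond;
        this.burst = burst;
        this.nanoClock = nanoClock;
        this.available = burst; // start with a full bucket
        this.lastRefillNanos = nanoClock.getAsLong();
    }

    /** Returns true if one write may proceed now; false means the caller should back off. */
    synchronized boolean tryAcquire() {
        long now = nanoClock.getAsLong();
        double refill = (now - lastRefillNanos) / 1e9 * permitsPerSecond;
        available = Math.min(burst, available + refill);
        lastRefillNanos = now;
        if (available >= 1.0) {
            available -= 1.0;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        WriteTokenBucket bucket = new WriteTokenBucket(1000.0, 100.0, System::nanoTime);
        int admitted = 0;
        for (int i = 0; i < 10_000; i++) {
            if (bucket.tryAcquire()) {
                admitted++;
            }
        }
        System.out.println("admitted " + admitted + " of 10000 back-to-back writes");
    }
}
```

A caller would check `tryAcquire()` before submitting a write and reject or 
delay the request when it returns false, leaving more follower capacity for 
reads at the cost of write throughput.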

Raising this ticket for ideas. [~szetszwo] [~tanxinyu] 

  was:
While benchmarking linearizable follower read, the observation is that the more 
requests go to the followers instead of the leader, the write throughput 
improves dramatically by around 2-3x compared to the leader-only write and read 
(most likely due to less leader resource contention). However, the read 
throughput becomes worse than leader-only write and read (some can be below 
0.2x). Even with optimizations such as RATIS-2392 RATIS-2382 
[https://github.com/apache/ratis/pull/1334] RATIS-2379, the read throughput 
remains worse than leader-only write (it even improves the write performance 
instead of the read performance). I suspect that because write throughput 
increases, the read index increases at a faster rate which causes follower 
linearizable read to wait longer.

The target is to improve read throughput by 1.5x - 2x of the leader-only write 
and reads. Currently pure reads (no writes) performance improves read 
throughput up to 1.7x.

Currently my ideas are
 * Sacrificing writes for reads: Can we limit the write QPS so that read QPS 
can increase
 ** From the benchmark, the read throughput only improves when write throughput 
is lower
 ** We can try to use backpressure mechanism so that writes do not advance so 
quickly that read throughput suffers
 *** Follower gap mechanisms (RATIS-1411), but this might cause leader to stall 
if a follower is down for a while (e.g. restarted), which violates the majority 
availability guarantee. It's also hard to know which value is optimal for 
different workloads.

Raising this ticket for ideas. [~szetszwo] [~tanxinyu] 


> Improve linearizable follower read throughput instead of writes
> ---------------------------------------------------------------
>
>                 Key: RATIS-2403
>                 URL: https://issues.apache.org/jira/browse/RATIS-2403
>             Project: Ratis
>          Issue Type: Improvement
>            Reporter: Ivan Andika
>            Priority: Major
>
> While benchmarking linearizable follower read, we observed that the more 
> requests go to the followers instead of the leader, the better the write 
> throughput becomes: around a 2-3x write throughput increase compared to 
> leader-only write and read (most likely due to less leader resource 
> contention). However, the read throughput becomes worse than leader-only 
> write and read (in some cases below 0.2x). Even with optimizations such as 
> RATIS-2392, RATIS-2382 [https://github.com/apache/ratis/pull/1334], and 
> RATIS-2379, the read throughput remains worse than leader-only write (the 
> optimizations even improve the write performance instead of the read 
> performance).
> I suspect that because write throughput increases, the read index advances 
> at a faster rate, which causes follower linearizable reads to wait longer.
> The target is to improve read throughput to 1.5x - 2x of the leader-only 
> write and read. Currently, a pure-read workload (no writes) improves read 
> throughput by up to 1.7x, but total follower read throughput is way below 
> this target.
> Currently my ideas are:
>  * Sacrificing writes for reads: can we limit the write QPS so that read QPS 
> can increase?
>  ** From the benchmark, the read throughput only improves when write 
> throughput is lower.
>  ** We can try to use a backpressure mechanism so that writes do not advance 
> so quickly that read throughput suffers.
>  *** Follower gap mechanisms (RATIS-1411), but this might cause the leader to 
> stall if a follower is down for a while (e.g. restarted), which violates the 
> majority availability guarantee. It's also hard to know which value is 
> optimal for different workloads.
> Raising this ticket for ideas. [~szetszwo] [~tanxinyu] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
