ninsmiracle commented on PR #2184:
URL: https://github.com/apache/incubator-pegasus/pull/2184#issuecomment-2705680140
### Add some information about dup sending delay
We conducted multiple control experiments on the test cluster with
`duplicate_log_batch_bytes` set to 0, 4096, and 8192. The results clearly show
that a larger `duplicate_log_batch_bytes` increases the dup consumption
capacity of the cluster. In the table below, with `duplicate_log_batch_bytes`
set to 8192 the cluster can still keep up with 40K write QPS, whereas with it
set to 0 the cluster can no longer keep up at 20K write QPS.
However, as long as dup can keep up with incoming writes, the larger
`duplicate_log_batch_bytes` is, the longer it takes to duplicate a piece of
data from the master cluster to the slave cluster.
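To make this trade-off concrete, here is a toy model of size-based batching. It assumes, without checking the Pegasus source, that mutations are shipped once the accumulated batch reaches `duplicate_log_batch_bytes` bytes, so a value of 0 ships every mutation on its own. `batch_mutations` is a hypothetical helper for illustration, not Pegasus code.

```python
from typing import List


def batch_mutations(sizes: List[int], batch_bytes: int) -> List[List[int]]:
    """Group mutation sizes (in bytes) into batches for shipping.

    Toy model (assumption): a batch is shipped as soon as its total size is
    >= batch_bytes, so batch_bytes = 0 ships every mutation individually.
    """
    batches: List[List[int]] = []
    current: List[int] = []
    current_bytes = 0
    for size in sizes:
        current.append(size)
        current_bytes += size
        if current_bytes >= batch_bytes:
            # Batch is "full" under this model: ship it and start a new one.
            batches.append(current)
            current, current_bytes = [], 0
    if current:
        # Trailing partial batch: mutations that would otherwise wait for more data.
        batches.append(current)
    return batches
```

With ten 1000-byte mutations, 0 yields ten sends, 4096 yields two batches of five, and 8192 yields one batch of nine plus a trailing partial one. Fewer, larger sends are consistent with the higher consumption capacity observed above; in this toy model the price is that mutations sit in an unfilled batch until enough data arrives, which is one plausible source of the extra delay.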
The 4th and 5th columns of the following table need some explanation. When the
delay between the master and slave clusters is small, the delay shown by
monitoring is inaccurate because of the granularity at which counters are
reported. So we wrote a program that reads and writes the corresponding keys on
both clusters to determine the precise delay. However, when the delay between
the master and slave clusters is large, reading and writing the keys for each
shard takes too long, and the delay is sometimes difficult to compute.
Therefore, in the large-delay scenarios we mainly rely on monitoring data to
compare the experimental results.
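The probe program itself is not included in this comment. A minimal sketch of the idea, assuming hypothetical `write_master` and `read_slave` callables that wrap whatever client talks to each cluster (no real Pegasus client API is used here): write a unique value on the master, poll the slave until that value appears, and summarize many probes as p95/p99 with a nearest-rank percentile.

```python
import math
import time
from typing import Callable, List, Optional


def measure_dup_delay(
    write_master: Callable[[str, str], None],  # hypothetical: writes key/value to the master
    read_slave: Callable[[str], Optional[str]],  # hypothetical: reads key from the slave
    key: str,
    timeout_s: float = 60.0,
    poll_interval_s: float = 0.001,
) -> Optional[float]:
    """Return seconds until a value written on the master is visible on the slave.

    Returns None if the value does not show up within timeout_s.
    """
    value = f"probe-{time.time_ns()}"  # unique value so stale data is not mistaken for a hit
    start = time.monotonic()
    write_master(key, value)
    while time.monotonic() - start < timeout_s:
        if read_slave(key) == value:
            return time.monotonic() - start
        time.sleep(poll_interval_s)
    return None


def percentile(samples: List[float], p: int) -> float:
    """Nearest-rank percentile, e.g. p=95 or p=99."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]
```

The measured time includes the master write latency, and the poll interval bounds the resolution. The timeout also reflects the limitation noted above: when the dup delay is tens of seconds, each probe blocks for that long, which makes per-shard measurement slow.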
| QPS | plog max backlog | duplicate_log_batch_bytes | Master/slave dup delay p99 (monitoring delay avg) | Master/slave dup delay (program test) |
| -- | -- | -- | -- | -- |
| 0 | 3 | 0 | | p95 105ms / p99 108ms |
| 0 | 3 | 4096 | | p95 106ms / p99 108ms |
| 0 | 3 | 8192 | | p95 127ms / p99 150ms |
| 8K | 13K | 0 | 120ms | p95 106ms / p99 137ms |
| 8K | 17K | 4096 | 3.7s | p95 119ms / p99 1673ms |
| 8K | 17.2K | 8192 | 6s | p95 138ms / p99 20s |
| 20K | Keeps increasing | 0 | Keeps increasing | Difficult to observe |
| 20K | 75K | 4096 | 25s | Difficult to observe |
| 20K | 70K | 8192 | 25s | Difficult to observe |
| 30K | 120K | 8192 | 26s | Difficult to observe |
| 40K | 24K | 8192 | 28s | Difficult to observe |
| 45K | Keeps increasing | 8192 | Keeps increasing | Difficult to observe |


---
And here is the effect of adjusting this parameter on one of our online
clusters:
| Cluster name | duplicate_log_batch_bytes = 4096 | duplicate_log_batch_bytes = 0 |
| -- | -- | -- |
| c3srv-online | p95 1008ms / p99 1327ms | p95 100ms / p99 108ms |

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]