ninsmiracle commented on PR #2184: URL: https://github.com/apache/incubator-pegasus/pull/2184#issuecomment-2705682711
> ### Add some information about dup sending delay
>
> We ran several controlled experiments on the test cluster with `duplicate_log_batch_bytes` set to 0, 4096, and 8192. The results clearly show that a larger `duplicate_log_batch_bytes` improves the cluster's dup consumption capacity. In the table below, with `duplicate_log_batch_bytes` set to 8192 the cluster can still keep up with 40k write QPS, whereas with `duplicate_log_batch_bytes` set to 0 it can no longer keep up at 20k write QPS. However, as long as dup can keep up with the incoming writes, a larger `duplicate_log_batch_bytes` means a longer delay in duplicating a piece of data from the master cluster to the slave cluster.
>
> I should also explain the 4th and 5th columns of the table. When the delay between the master and slave clusters is very small, the delay shown by monitoring is inaccurate because of the counter reporting granularity. So we wrote a program that writes and reads the corresponding keys on both sides to measure the delay precisely. However, when the delay between the master and slave clusters is very large, reading and writing each shard takes too long and the delay is sometimes difficult to calculate. Therefore, in the large-delay scenario we mainly rely on monitoring data to compare the experimental results.
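The write-then-read probe described above might look roughly like the following sketch. It is only an illustration: `write_master` and `read_slave` are hypothetical callables standing in for client calls against the two clusters (they are not Pegasus APIs), and the program actually used for these measurements is not shown in the comment. The percentile helper uses the nearest-rank method, which is an assumption about how p95/p99 were derived.

```python
import math
import time
from typing import Callable, List, Optional


def measure_dup_delay(write_master: Callable[[str, str], None],
                      read_slave: Callable[[str], Optional[str]],
                      key: str,
                      value: str,
                      timeout_s: float = 30.0,
                      poll_interval_s: float = 0.001) -> Optional[float]:
    """Write to the master, poll the slave until the value shows up.

    Returns the observed delay in seconds, or None if the value did not
    appear on the slave within timeout_s.
    """
    start = time.monotonic()
    write_master(key, value)
    while time.monotonic() - start < timeout_s:
        if read_slave(key) == value:
            return time.monotonic() - start
        time.sleep(poll_interval_s)
    return None


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100.0))
    return ordered[rank - 1]
```

Collecting one delay per probed key and calling `percentile(delays, 95)` and `percentile(delays, 99)` would give figures comparable in shape to the "program test" column; probes that time out return `None` and would need to be reported separately, which matches the "difficult to observe" cases at high QPS.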
>
> | qps | plog maximum backlog | duplicate_log_batch_bytes | master/slave dup delay p99 (monitoring delay avg) | master/slave dup delay (program test) |
> | --- | --- | --- | --- | --- |
> | 0 | 3 | 0 | | p95 105ms/p99 108ms |
> | 0 | 3 | 4096 | | p95 106ms/p99 108ms |
> | 0 | 3 | 8192 | | p95 127ms/p99 150ms |
> | 8k | 13K | 0 | 120ms | p95 106ms/p99 137ms |
> | 8K | 17k | 4096 | 3.7s | p95 119ms/p99 1673ms |
> | 8K | 17.2k | 8192 | 6s | p95 138ms/p99 20s |
> | 20K | Continues to increase | 0 | Continues to increase | Difficult to observe |
> | 20K | 75k | 4096 | 25s | Difficult to observe |
> | 20K | 70k | 8192 | 25s | Difficult to observe |
> | 30K | 120k | 8192 | 26s | Difficult to observe |
> | 40K | 24k | 8192 | 28s | Difficult to observe |
> | 45K | Continues to increase | 8192 | Continues to increase | Difficult to observe |
>
> ==================================================
>
> And here is the effect of adjusting this parameter on one of our online clusters:
>
> | Cluster name | duplicate_log_batch_bytes = 4096 | duplicate_log_batch_bytes = 0 |
> | --- | --- | --- |
> | c3srv-online | p95 1008ms/p99 1327ms | p95 100ms/p99 108ms |

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
