ninsmiracle commented on PR #2184:
URL: 
https://github.com/apache/incubator-pegasus/pull/2184#issuecomment-2705680140

   ### Add some information about dup sending delay
     We ran several controlled experiments on the test cluster with 
`duplicate_log_batch_bytes` set to 0, 4096, and 8192. The results clearly show 
that a larger `duplicate_log_batch_bytes` improves the dup consumption capacity 
of the cluster. In the table below, with `duplicate_log_batch_bytes` set to 
8192 the cluster can still keep up with 40k write QPS, but with 
`duplicate_log_batch_bytes` set to 0 the cluster can no longer keep up at 20k 
write QPS (the plog backlog keeps increasing). However, as long as dup can keep 
up with the incoming writes, the larger `duplicate_log_batch_bytes` is, the 
longer it takes to duplicate a piece of data from the master cluster to the 
slave cluster.
     A note on the 4th and 5th columns of the following table. When the delay 
between the master and slave clusters is very small, the delay reported by 
monitoring is inaccurate because of the counter reporting granularity. So we 
wrote a program that reads and writes the corresponding keys on both sides to 
measure the delay precisely. However, when the delay between the master and 
slave clusters is very large, reading and writing each shard takes too long and 
the delay is sometimes difficult to calculate. Therefore, in the large-delay 
scenarios we mainly rely on monitoring data to compare the experimental 
results.
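
The probing approach described above can be sketched roughly as follows. This 
is only an illustration, not the program we actually ran: `write_master` and 
`read_slave` are hypothetical callables standing in for whatever client is used 
against each cluster, and the sketch assumes the probe writes on the master 
side and polls the slave side until the value appears.

```python
import math
import time
from typing import Callable, List, Optional


def measure_dup_delay(
    write_master: Callable[[str, str], None],    # hypothetical: writes key/value to the master cluster
    read_slave: Callable[[str], Optional[str]],  # hypothetical: reads key from the slave cluster
    key: str,
    timeout_s: float = 30.0,
    poll_interval_s: float = 0.001,
) -> Optional[float]:
    """Return the seconds until a value written on the master is readable on the slave.

    Returns None if the value does not show up within `timeout_s`.
    """
    # A unique value distinguishes this probe from older data stored under the same key.
    value = f"probe-{time.time_ns()}"
    start = time.monotonic()
    write_master(key, value)
    while time.monotonic() - start < timeout_s:
        if read_slave(key) == value:
            return time.monotonic() - start
        time.sleep(poll_interval_s)
    return None


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list, with pct in (0, 100]."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100.0))
    return ordered[rank - 1]
```

Collecting the non-None results of many probes and passing them to 
`percentile(samples, 95)` and `percentile(samples, 99)` gives p95/p99 figures of 
the kind shown in the last column. Note that a delay measured this way also 
includes the write latency, the read round trip, and up to one polling 
interval, and that probes which time out have to be handled separately.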
   
   Write QPS | Max plog backlog | duplicate_log_batch_bytes | Master/slave dup delay p99 (monitoring delay avg) | Master/slave dup delay (test program)
   -- | -- | -- | -- | --
   0 | 3 | 0 |   | p95 105ms / p99 108ms
   0 | 3 | 4096 |   | p95 106ms / p99 108ms
   0 | 3 | 8192 |   | p95 127ms / p99 150ms
   8k | 13k | 0 | 120ms | p95 106ms / p99 137ms
   8k | 17k | 4096 | 3.7s | p95 119ms / p99 1673ms
   8k | 17.2k | 8192 | 6s | p95 138ms / p99 20s
   20k | Keeps increasing | 0 | Keeps increasing | Difficult to observe
   20k | 75k | 4096 | 25s | Difficult to observe
   20k | 70k | 8192 | 25s | Difficult to observe
   30k | 120k | 8192 | 26s | Difficult to observe
   40k | 24k | 8192 | 28s | Difficult to observe
   45k | Keeps increasing | 8192 | Keeps increasing | Difficult to observe
   
   ==================================================
   
   And here is the effect of adjusting this parameter on one of our online 
clusters:
   
   Cluster name | duplicate_log_batch_bytes = 4096 | duplicate_log_batch_bytes = 0
   -- | -- | --
   c3srv-online | p95 1008ms / p99 1327ms | p95 100ms / p99 108ms
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

