[ 
https://issues.apache.org/jira/browse/KUDU-3447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xixu Wang updated KUDU-3447:
----------------------------
    Attachment: image-2023-02-13-17-32-11-650.png

> Limit the usage of network bandwidth of tablet copying 
> -------------------------------------------------------
>
>                 Key: KUDU-3447
>                 URL: https://issues.apache.org/jira/browse/KUDU-3447
>             Project: Kudu
>          Issue Type: Improvement
>            Reporter: Xixu Wang
>            Priority: Minor
>         Attachments: image-2023-02-09-10-38-50-512.png, 
> image-2023-02-09-10-47-58-370.png, image-2023-02-13-17-08-37-256.png, 
> image-2023-02-13-17-16-50-491.png, image-2023-02-13-17-22-25-368.png, 
> image-2023-02-13-17-25-15-997.png, image-2023-02-13-17-32-11-650.png
>
>
> Copying tablets from an old cluster to a new cluster with the command 
> kudu local_replica copy_from_remote is a resource-intensive operation. 
> As the following picture shows, memory usage reaches 75%, the network is 
> almost fully saturated (the overall network bandwidth is 2Gb/s), and the 
> disk read rate is very high (the overall disk bandwidth is 200MB/s). 
> !image-2023-02-09-10-47-58-370.png|width=996,height=369!
> If the data size is very large, the copy will run for a long time, and 
> other services may be impacted and become unavailable. It is therefore 
> better to limit the tablet copying speed to keep the system stable. 
> The goal is to balance the tablet copying speed against the impact on 
> other services.
> Because copy_from_remote mainly downloads data from the remote cluster and 
> writes it to the local file system, limiting the download speed is a 
> natural way to control resource consumption. There are several algorithms 
> for implementing a rate limiter. This patch will use the token bucket 
> algorithm implemented by the Facebook Folly library: 
> [https://github.com/facebook/folly/blob/main/folly/TokenBucket.h]
>  
> *Performance Tests*
> 1. Data size:
> TABLE test_1
> on disk size: 13263880213
> live row count: 66433035
> 2. Test Case:
> case 1:
> kudu local_replica copy_from_remote xxx_tablet_ids src_tserver_addr:7050 
> -fs_data_dirs=/test/data_dir -fs_wal_dir=/test/wal_dir 
> -tablet_copy_download_threads_nums_per_session=4 -num_threads=4
> case 2:
> kudu local_replica copy_from_remote xxx_tablet_ids src_tserver_addr:7050 
> -fs_data_dirs=/test/data_dir -fs_wal_dir=/test/wal_dir 
> -tablet_copy_download_threads_nums_per_session=4 -num_threads=4 
> -enable_network_speed_limit=true -limit_network_speed=25
> 3. Results:
> 3.1 CPU usage
> The left image is test case 1 and the right is test case 2. As we can see, 
> enabling the speed limit reduces CPU consumption.
> !image-2023-02-13-17-08-37-256.png|width=418,height=559!!image-2023-02-13-17-16-50-491.png|width=794,height=369!
> 3.2 CPU load
> !image-2023-02-13-17-22-25-368.png|width=631,height=480!
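
The token bucket idea mentioned in the description can be sketched as follows. This is only an illustration of the general algorithm, not the Folly implementation and not the code in the patch. The names TokenBucket, consume and throttled are hypothetical, and treating one token as one byte is an assumption (the issue does not state the unit of -limit_network_speed). This variant lets the balance go negative and tells the caller how long to wait, which keeps the download loop simple.

```python
import time


class TokenBucket:
    """Hypothetical token bucket: tokens accrue at `rate` per second, capped at `burst`."""

    def __init__(self, rate, burst, now=0.0):
        if rate <= 0 or burst <= 0:
            raise ValueError("rate and burst must be positive")
        self.rate = float(rate)      # tokens added per second (assumed: bytes/s)
        self.burst = float(burst)    # maximum number of stored tokens
        self.tokens = float(burst)   # the bucket starts full
        self.last = float(now)       # time of the last refill

    def consume(self, amount, now):
        """Deduct `amount` tokens and return how many seconds the caller should wait.

        The balance may go negative; the returned delay is the time needed
        for the refill rate to bring it back to zero.
        """
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        self.tokens -= amount
        return max(0.0, -self.tokens / self.rate)


def throttled(chunks, bucket, clock=time.monotonic, sleep=time.sleep):
    """Yield downloaded chunks, sleeping so the average rate stays near bucket.rate."""
    for chunk in chunks:
        delay = bucket.consume(len(chunk), clock())
        if delay > 0:
            sleep(delay)
        yield chunk


if __name__ == "__main__":
    # Simulated download: 8 chunks of 1024 bytes limited to 4096 bytes/s.
    start = time.monotonic()
    bucket = TokenBucket(rate=4096, burst=4096, now=start)
    total = sum(len(c) for c in throttled([b"x" * 1024] * 8, bucket))
    print(total, "bytes copied in", round(time.monotonic() - start, 2), "s")
```

The burst size bounds how much data can be transferred at full speed before the limit takes effect, so a smaller burst gives smoother traffic at the cost of more frequent short sleeps.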



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
