This is an automated email from the ASF dual-hosted git repository.

wangdan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-pegasus-website.git


The following commit(s) were added to refs/heads/master by this push:
     new 2cc67a7b Update resource managment docs (#72)
2cc67a7b is described below

commit 2cc67a7b724bab305d3ad3122b98db05b5dcae85
Author: Yingchun Lai <[email protected]>
AuthorDate: Thu Feb 1 14:12:10 2024 +0800

    Update resource managment docs (#72)
---
 _docs/en/administration/resource-management.md | 93 +++++++++++++++++++++++++-
 _docs/zh/administration/resource-management.md | 28 ++++----
 2 files changed, 106 insertions(+), 15 deletions(-)

diff --git a/_docs/en/administration/resource-management.md 
b/_docs/en/administration/resource-management.md
index b136c86a..56976c60 100644
--- a/_docs/en/administration/resource-management.md
+++ b/_docs/en/administration/resource-management.md
@@ -2,4 +2,95 @@
 permalink: administration/resource-management
 ---
 
-TRANSLATING
+# Background Introduction
+
+The main resources used by Pegasus include CPU, disk, memory, network, etc. 
The load on these system resources should not be too high, otherwise the 
Pegasus service may become unstable or even crash. It is recommended that:
+* The storage usage of a single disk should not exceed 80%.
+* Memory usage should not exceed 80% of each node.
+* The number of network connections should not exceed the system's limit, and 
it is recommended to keep the number of connections below 50,000.
+
+Disk storage usage can be reduced by adjusting the following configurations:
+* Set `max_replicas_in_group = 3`, refer to [Replica 
management](#replica-management).
+* Set `gc_disk_error_replica_interval_seconds = 3600` and 
`gc_disk_garbage_replica_interval_seconds = 3600`, refer to [Garbage directory 
management](#garbage-directory-management).
+* Set `checkpoint_reserve_min_count = 2` and `checkpoint_reserve_time_seconds 
= 1200`, refer to [RocksDB checkpoints 
management](#rocksdb-checkpoints-management).
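
Taken together, the recommendations above could be collected into a single 
config fragment (a sketch; section names follow the examples later in this 
document, and the values are the suggested ones, not the defaults):
```
[meta_server]
    max_replicas_in_group = 3

[replication]
    gc_disk_error_replica_interval_seconds = 3600
    gc_disk_garbage_replica_interval_seconds = 3600

[pegasus.server]
    checkpoint_reserve_min_count = 2
    checkpoint_reserve_time_seconds = 1200
```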
+
+# Replica management
+
+Pegasus recommends using 3 replicas (1 primary + 2 secondaries) and setting 
the `-r` parameter to 3 when creating tables.
+
+However, the actual number of replicas in the cluster may exceed 3, which is 
determined by the following configuration:
+```
+[meta_server]
+    max_replicas_in_group = 4
+```
+
+This configuration specifies the maximum number of replicas (including primary 
and secondary replicas) allowed for a partition. The default value is 4, which 
means 1 inactive replica may be retained. Although 3 active replicas (1 primary 
+ 2 secondaries) are serving, during downtime recovery or load balancing a 
replica may migrate from server A to server B. After the migration, the data on 
server A is actually no longer needed. However, with sufficien [...]
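
The arithmetic behind this setting can be sketched as follows (a hypothetical 
helper for illustration, not part of Pegasus):

```python
def max_inactive_replicas(max_replicas_in_group: int,
                          active_replicas: int = 3) -> int:
    """Number of inactive replicas a partition may retain, given that
    3 active replicas (1 primary + 2 secondaries) are always serving."""
    return max(0, max_replicas_in_group - active_replicas)
```

With the default `max_replicas_in_group = 4`, one inactive replica may linger 
on server A; with a value of 3, none may.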
+
+To reduce disk storage usage and delete useless replica data in time, you can 
set `max_replicas_in_group = 3`, restart the Meta Server to make the 
configuration take effect, and then set the [Load Rebalance](rebalance) level 
to `lively`, allowing the Meta Server to delete useless replica data.
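
For example, assuming the cluster's admin shell, the rebalance level can be 
raised with the `set_meta_level` command (see [Load Rebalance](rebalance)):
```
>>> set_meta_level lively
```
Remember to restore the level (e.g., to `steady`) once the useless replicas 
have been removed, since `lively` keeps the balancer active.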
+
+# Garbage directory management
+
+If a replica directory on a Replica Server is no longer needed or is damaged, 
it becomes a garbage directory: unneeded directories get a `.gar` suffix, and 
damaged directories get a `.err` suffix. These directories are not deleted 
immediately, because they may still be valuable in certain extreme situations 
(such as recovering data from them in the event of a cluster crash).
+
+There are two configurations that determine the actual deletion time for these 
directories:
+```
+[replication]
+    gc_disk_error_replica_interval_seconds = 604800
+    gc_disk_garbage_replica_interval_seconds = 86400
+```
+For these two types of directories, the last modification time is checked 
(before Pegasus 2.6, the directory's last modification time; since Pegasus 2.6, 
the timestamp field in the directory name), and the directory is deleted when 
the gap between that time and the current time exceeds the corresponding 
configuration.
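
The deletion rule can be sketched in Python (hypothetical names; the real 
check lives inside the Replica Server):

```python
import time

def should_gc(path, last_modified,
              gc_error_secs=604800, gc_garbage_secs=86400, now=None):
    """Return True when a garbage directory is old enough to delete."""
    now = time.time() if now is None else now
    age = now - last_modified
    if path.endswith(".err"):
        return age > gc_error_secs
    if path.endswith(".gar"):
        return age > gc_garbage_secs
    return False  # not a garbage directory
```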
+
+To reduce disk storage usage by deleting these garbage directories in time, 
reduce the values of these two configurations:
+```
+[replication]
+    gc_disk_error_replica_interval_seconds = 3600
+    gc_disk_garbage_replica_interval_seconds = 3600
+```
+* If the Pegasus version is earlier than 1.11.3, you need to restart the 
Replica Server for the configurations to take effect.
+* If the Pegasus version is between 1.11.3 and 2.1, these two configurations 
can be modified at runtime through the 
`useless-dir-reserve-seconds` command in [Remote commands](remote-commands), 
without restarting the Replica Server process. For example, set both 
configurations to 0 for emergency cleaning of the garbage directories:
+```
+>>> remote_command -t replica-server useless-dir-reserve-seconds 0
+```
+After confirming that the cleaning is complete, restore the configurations to 
their defaults:
+```
+>>> remote_command -t replica-server useless-dir-reserve-seconds DEFAULT
+```
+* Starting from version 2.2, the configurations can be modified and take 
effect at runtime through the [HTTP API](/api/http) without restarting the 
Replica Server process.
+
+# RocksDB checkpoints management
+
+The storage engine of Replica Server is RocksDB, which regularly generates 
[checkpoints](https://github.com/facebook/rocksdb/wiki/Checkpoints). The 
checkpoints are placed in the data directory of the replica and are suffixed 
with the `last_durable_decree`.
+
+As shown in the figure below, the data directory of the replica contains the 
`rdb` directory currently in use and several checkpoint directories:
+![checkpoint_dirs.png](/assets/images/checkpoint_dirs.png){:class="img-responsive"}
+
+When a checkpoint is generated, its files are created through hard links 
rather than copies. A given sstable file may be held by `rdb` or by one or 
more checkpoints. As long as any one of them holds it, the file's data exists 
on disk and consumes storage space. The file can be deleted only when neither 
`rdb` nor any checkpoint holds it.
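
The holder rule can be illustrated with sets (an illustration with 
hypothetical names, not Pegasus code):

```python
def deletable_files(all_files, rdb_files, checkpoints):
    """Files on disk that neither `rdb` nor any checkpoint still holds."""
    held = set(rdb_files).union(*checkpoints) if checkpoints else set(rdb_files)
    return set(all_files) - held
```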
+
+RocksDB continuously performs background compactions, so an sstable held by a 
checkpoint may no longer be held by `rdb` (call such a file expired). If 
checkpoints are retained too long, these expired sstables cannot be deleted in 
time, which consumes extra disk storage space. Especially for tables with high 
write throughput, compaction occurs more frequently and the lifecycle of a 
single sstable file is very short. If the number of checkpoints 
is kept relatively high, [...]
+
+The following configurations determine the strategy for deleting checkpoints:
+```
+[pegasus.server]
+    checkpoint_reserve_min_count = 2
+    checkpoint_reserve_time_seconds = 1800
+```
+* `checkpoint_reserve_min_count`: the minimum number of checkpoints to 
retain. The oldest checkpoint may be deleted only when the number of 
checkpoints exceeds this limit.
+* `checkpoint_reserve_time_seconds`: the minimum retention time of a 
checkpoint. The oldest checkpoint may be deleted only when the time since its 
generation exceeds this value.
+* A checkpoint is deleted only when both conditions are met simultaneously.
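
The two conditions combine as follows (a sketch with hypothetical names; ages 
are seconds since each checkpoint was generated):

```python
def can_delete_oldest(checkpoint_ages,
                      checkpoint_reserve_min_count=2,
                      checkpoint_reserve_time_seconds=1800):
    """The oldest checkpoint is deletable only when both limits are exceeded."""
    if len(checkpoint_ages) <= checkpoint_reserve_min_count:
        return False  # too few checkpoints to touch
    return max(checkpoint_ages) > checkpoint_reserve_time_seconds
```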
+
+To reduce disk storage usage by deleting old checkpoint directories in time, 
you can lower these two configurations. For example:
+```
+[pegasus.server]
+    checkpoint_reserve_min_count = 1
+    checkpoint_reserve_time_seconds = 1200
+```
+Note: It is not recommended to set `checkpoint_reserve_time_seconds` too low. 
Considering the impact on learning, it should be larger than 
`replica_assign_delay_ms_for_dropouts` (default is 5 minutes).
+
+## Set table level configuration
+
+Since Pegasus 1.11.3, these two configurations can be modified at runtime for 
a specified table through the [Table environment variable](table-env), without 
restarting the Replica Server process. For example:
+```
+>>> use <table_name>
+>>> set_app_envs rocksdb.checkpoint.reserve_min_count 1
+>>> set_app_envs rocksdb.checkpoint.reserve_time_seconds 600
+```
diff --git a/_docs/zh/administration/resource-management.md 
b/_docs/zh/administration/resource-management.md
index 2071ef30..3f4ef438 100644
--- a/_docs/zh/administration/resource-management.md
+++ b/_docs/zh/administration/resource-management.md
@@ -4,7 +4,7 @@ permalink: administration/resource-management
 
 # 背景介绍
 
-Pegasus 主要用到的资源包括 CPU、磁盘、内存、网络连接等。对这些系统资源的使用负载不要太高,否则 Pegasus 服务可能会不稳定甚至崩溃。建议:
+Pegasus 主要用到的资源包括 CPU、磁盘、内存、网络等。对这些系统资源的使用负载不要太高,否则 Pegasus 服务可能会不稳定甚至崩溃。建议:
 * 单块磁盘的存储使用不要超过 80%。
 * 内存使用不要超过每个节点的 80%。
 * 网络连接数不要超过系统配置,建议连接数控制在 5 万以内。
@@ -26,11 +26,11 @@ Pegasus 推荐使用 3 副本(1 primary + 2 secondaries),在创建表的
 
 该参数的意义是:允许一个 partition 中最多存在的副本数(包括活跃和不活跃的),默认为 4(表示允许保留 1 
个不活跃的副本)。虽然正在提供服务的活跃副本是 3 个(1 primary + 2 secondary),但是在宕机恢复或者负载均衡过程中,replica 
可能从 A 节点迁移到 B 节点,迁移完成后 A 节点上的数据实际上不需要了,但是在存储充足的情况下,可以继续将 A 节点的数据保留在磁盘上,如果将来 
replica 重新迁移到 A 节点,这些数据还有可能被重用,避免重新传输数据。
 
-如果想要节省磁盘存储使用量,及时删除无用的副本数据,就可以设置 `max_replicas_in_group = 3`,并重启 MetaServer 
使配置生效,然后设置 [负载均衡](rebalance) 状态为 `lively`,让 MetaServer 控制删除无用的副本数据。
+如果想要节省磁盘存储使用量,及时删除无用的副本数据,就可以设置 `max_replicas_in_group = 3`,并重启 Meta Server 
使配置生效,然后设置 [负载均衡](rebalance) 状态为 `lively`,让 Meta Server 允许删除无用的副本数据。
 
 # 垃圾目录管理
 
-ReplicaServer 中的 replica 目录如果不需要了或者损坏了,都会变成垃圾目录:不需要的目录会加 `.gar` 后缀,出错的目录会加 
`.err` 后缀。这些目录不会被立即删除,因为考虑到某些极端情况下可能还有价值(例如系统崩溃时通过他们来找回数据)。
+Replica Server 中的 replica 目录如果不需要了或者损坏了,都会变成垃圾目录:不需要的目录会加 `.gar` 后缀,出错的目录会加 
`.err` 后缀。这些目录不会被立即删除,因为考虑到某些极端情况下可能还有价值(例如系统崩溃时通过它们来找回数据)。
 
 有两个配置参数决定这些目录的真正删除时机:
 ```
@@ -38,16 +38,16 @@ ReplicaServer 中的 replica 目录如果不需要了或者损坏了,都会变
     gc_disk_error_replica_interval_seconds = 604800
     gc_disk_garbage_replica_interval_seconds = 86400
 ```
-参数的意义是:对于这两种目录,会检查目录的最后修改时间(2.6 版本以前是目录的最后修改时间,2.6 
开始是目录名中的时间戳字段),只有当最后修改时间与当前时间的差距超过了对应的参数时,才会执行删除。
+对于这两种目录,会检查目录的最后修改时间(2.6 版本以前是目录的最后修改时间,2.6 
开始是目录名中的时间戳字段),只有当最后修改时间与当前时间的差距超过了对应的参数时,才会执行删除。
 
-如果想要节省磁盘存储使用量,及时删除这些垃圾目录,可以减小这两个参数的值。
+如果想通过及时删除这些垃圾目录来节省磁盘存储使用量,可以减小这两个参数的值。
 ```
 [replication]
     gc_disk_error_replica_interval_seconds = 3600
     gc_disk_garbage_replica_interval_seconds = 3600
 ```
-* 如果版本小于 1.11.3,需要重启 ReplicaServer 使配置生效。
-* 如果版本在 1.11.3 到 2.1 之间,可以通过 [远程命令](remote-commands) 的 
`useless-dir-reserve-seconds` 命令来动态地同时修改这两个参数,不用重启 ReplicaServer 
进程使其生效。例如将这两个参数修改为 0,用于紧急清理垃圾目录:
+* 如果版本小于 1.11.3,需要重启 Replica Server 使配置生效。
+* 如果版本在 1.11.3 到 2.1 之间,可以通过 [远程命令](remote-commands) 的 
`useless-dir-reserve-seconds` 命令来动态地同时修改这两个参数,不用重启 Replica Server 
进程使其生效。例如将这两个参数修改为 0,用于紧急清理垃圾目录:
 ```
 >>> remote_command -t replica-server useless-dir-reserve-seconds 0
 ```
@@ -55,18 +55,18 @@ ReplicaServer 中的 replica 目录如果不需要了或者损坏了,都会变
 ```
 >>> remote_command -t replica-server useless-dir-reserve-seconds DEFAULT
 ```
-* 从版本 2.2 开始,可以通过 [HTTP 接口](/api/http) 动态修改这两个参数的值,不用重启 ReplicaServer 进程使其生效。
+* 从版本 2.2 开始,可以通过 [HTTP 接口](/api/http) 动态修改这两个参数的值,不用重启 Replica Server 进程使其生效。
 
-# Rocksdb checkpoints 管理
+# RocksDB checkpoints 管理
 
-ReplicaServer 底层使用 RocksDB 存储数据,会定期生成 
[checkpoint](https://github.com/facebook/rocksdb/wiki/Checkpoints)。Checkpoint 
目录会放在 replica 的 data 目录下,并以生成时的 `last_durable_decree` 作为作为后缀。
+Replica Server 底层使用 RocksDB 存储数据,会定期生成 
[checkpoint](https://github.com/facebook/rocksdb/wiki/Checkpoints)。Checkpoint 
目录会放在 replica 的 data 目录下,并以生成时的 `last_durable_decree` 作为后缀。
 
 如下图,replica 的 data 目录下包含当前正在使用的 rdb 目录和若干个 checkpoint 目录:
 
![checkpoint_dirs.png](/assets/images/checkpoint_dirs.png){:class="img-responsive"}
 
-生成 checkpoint 时,checkpoint 中的文件都是通过硬链接方式生成的,而不是通过拷贝的方式。其中的一个 sstable 文件可能被 rdb 
持有,也可能被一个或者多个 checkpoint 持有。只要任意一个在持有,该文件的数据就存在于磁盘盘上,占据存储空间。只有 rdb 和所有的 
checkpoint 都不持有该文件,他才会被删除。
+生成 checkpoint 时,checkpoint 中的文件都是通过硬链接方式生成的,而不是通过拷贝的方式。其中的一个 sstable 文件可能被 rdb 
持有,也可能被一个或者多个 checkpoint 持有。只要任意一个在持有,该文件的数据就存在于磁盘上,消耗存储空间。只有 rdb 和所有的 
checkpoint 都不持有该文件,它才会被删除。
 
-RocksDB 后台在持续进行 compaction 操作,所以 checkpoint 中持有的 sstable 可能已经不被 rdb 所持有了。如果 
checkpoint 的保留时间太长,这些过期的 sstable 不能被及时删除,就会占用额外的磁盘存储空间。尤其对于写入量大的表,compaction 
也会进行得更频繁,单个 sstable 文件的生命周期很短,如果 checkpoint 数保留得比较多的话,占用的存储空间很可能几倍于当前实际的数据大小。
+RocksDB 后台在持续进行 compaction 操作,所以 checkpoint 中持有的 sstable 可能已经不被 rdb 
所持有了(称其为过期)。如果 checkpoint 的保留时间太长,这些过期的 sstable 
不能被及时删除,就会占用额外的磁盘存储空间。尤其对于写入量大的表,compaction 也会进行得更频繁,单个 sstable 文件的生命周期很短,如果 
checkpoint 数保留得比较多的话,占用的存储空间很可能几倍于当前实际的数据大小。
 
 以下配置参数决定了 checkpoint 删除的策略:
 ```
@@ -85,11 +85,11 @@ RocksDB 后台在持续进行 compaction 操作,所以 checkpoint 中持有的
     checkpoint_reserve_min_count = 1
     checkpoint_reserve_time_seconds = 1200
 ```
-注意:不建议将 `checkpoint_reserve_time_seconds` 设得太小。考虑到对 learn 的影响,要尽量大于 
`replica_assign_delay_ms_for_dropouts` 的值(默认是 5 分钟),所以建议至少在 5 分钟以上。
+注意:不建议将 `checkpoint_reserve_time_seconds` 设得太小。考虑到对 learn 的影响,要大于 
`replica_assign_delay_ms_for_dropouts` 的值(默认是 5 分钟)。
 
 ## 设置表级配置
 
-从 1.11.3 版本开始,支持通过 [Table 环境变量](table-env) 动态修改指定表的这两项配置,可不重启 ReplicaServer 
进程。例如:
+从 1.11.3 版本开始,支持通过 [Table 环境变量](table-env) 动态修改指定表的这两项配置,可不重启 Replica Server 
进程。例如:
 ```
 >>> use <table_name>
 >>> set_app_envs rocksdb.checkpoint.reserve_min_count 1


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
