This is an automated email from the ASF dual-hosted git repository.
laiyingchun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-pegasus-website.git
The following commit(s) were added to refs/heads/master by this push:
new 3d1b9801 Update ttl docs (#53)
3d1b9801 is described below
commit 3d1b9801945d29e6aa7fbe699b209a867a1b702f
Author: Yingchun Lai <[email protected]>
AuthorDate: Wed Jan 3 20:42:50 2024 +0800
Update ttl docs (#53)
---
_docs/en/api/ttl.md | 91 +++++++++++++++++++++++++++++++++++++++++-
_docs/zh/api/ttl.md | 54 +++++++++++++------------
assets/images/pegasus-ttl.png | Bin 244149 -> 93941 bytes
3 files changed, 119 insertions(+), 26 deletions(-)
diff --git a/_docs/en/api/ttl.md b/_docs/en/api/ttl.md
index 77e2c2eb..6e9525c0 100755
--- a/_docs/en/api/ttl.md
+++ b/_docs/en/api/ttl.md
@@ -2,4 +2,93 @@
permalink: api/ttl
---
-TRANSLATING
+# Principle
+
+Pegasus supports a TTL (Time To Live) feature: the expiration time of data can
be specified when the data is written. Once data has expired, it is invisible
to the user and can no longer be accessed through interfaces such as
get/multiGet.
+
+Users set the TTL via the `ttl_seconds` parameter, which is the number of
seconds, counted from the current time, after which the data expires. Zero
means that no TTL is set, that is, the data never expires.
+
+How is TTL implemented? Is expired data deleted from disk immediately? The
rest of this section explains the implementation principle of TTL.
+
+In short, Pegasus implements TTL by recording the expiration time of data
when it is written and checking that expiration time on every query, as shown
in the following figure:
+
+{:class="img-responsive"}
+
+**Writing process**
+
+* When writing data, the user passes the `ttl_seconds` parameter on the client
side as the TTL. The client first calculates the expiration time of the data as
`ExpireTime = CurrentTime + ttl_seconds`, and then passes the data together
with `ExpireTime` to the ReplicaServer through RPC.
+* After receiving a write request, the ReplicaServer runs it through several
stages (including writing the WAL, replication, etc.) and finally stores the
data in RocksDB. When the value is stored, `ExpireTime` is placed in the value
header.
+
+**Reading process**
+
+* The user queries the value corresponding to a specified key through the
client.
+* After receiving a read request, the ReplicaServer first retrieves the value
corresponding to the key from RocksDB, and then extracts `ExpireTime` from the
value header:
+  * If ExpireTime == 0, no TTL has been set for the data, so it is always
valid.
+  * If ExpireTime > 0, a TTL has been set for the data, and a further
comparison is made:
+    * If ExpireTime > now, the data has not expired and the user data in the
value is returned.
+    * If ExpireTime <= now, the data has expired and `NotFound` is returned.
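
The write and read paths above can be illustrated with a minimal Python sketch. It is a toy model, not Pegasus code: the 4-byte big-endian header layout and the names `encode_value` and `read_value` are assumptions made for illustration, and `None` stands in for `NotFound`.

```python
import struct

# Hypothetical header layout: a 4-byte big-endian ExpireTime in front of the user data.
HEADER = struct.Struct(">I")


def encode_value(user_data: bytes, ttl_seconds: int, now: int) -> bytes:
    """Write path: compute ExpireTime and place it in the value header."""
    # ttl_seconds == 0 means "no TTL", which is stored as ExpireTime == 0.
    expire_time = now + ttl_seconds if ttl_seconds > 0 else 0
    return HEADER.pack(expire_time) + user_data


def read_value(stored: bytes, now: int):
    """Read path: return the user data, or None (NotFound) if it has expired."""
    (expire_time,) = HEADER.unpack_from(stored)
    if expire_time == 0 or expire_time > now:
        return stored[HEADER.size:]
    return None


now = 1_700_000_000
stored = encode_value(b"hello", 60, now)
```

With these definitions, `read_value(stored, now + 59)` still returns the user data, while `read_value(stored, now + 60)` returns `None` because `ExpireTime <= now`.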
+
+**Data deletion**
+
+* After data expires, it is not removed from RocksDB immediately. Instead, it
is garbage collected through
[compaction](https://github.com/facebook/rocksdb/wiki/Compaction).
+* Pegasus uses a custom RocksDB
[CompactionFilter](https://github.com/facebook/rocksdb/wiki/Compaction-Filter)
that checks the `ExpireTime` in the value header of the data during compaction.
If the data has expired, it is discarded and does not appear in the newly
generated file.
+* Because garbage collection of expired data is asynchronous and depends on
when and how often compaction runs, expiration and deletion usually do not
happen at the same time. The only guarantee is that deletion happens after
expiration.
+* Expired but undeleted data will still occupy disk space.
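
As a rough illustration of the filtering step, the sketch below models a compaction pass over an in-memory table. The function name `compact` and the `(expire_time, user_data)` tuple layout are hypothetical; the real filter is a RocksDB CompactionFilter running inside the ReplicaServer.

```python
def compact(entries: dict, now: int) -> dict:
    """Return the compaction output: every entry that has not expired.

    `entries` maps key -> (expire_time, user_data); expire_time == 0 means no TTL.
    """
    return {
        key: (expire_time, data)
        for key, (expire_time, data) in entries.items()
        if expire_time == 0 or expire_time > now
    }


now = 1_700_000_000
before = {
    "no_ttl": (0, b"a"),
    "alive": (now + 10, b"b"),
    "expired": (now - 10, b"c"),
}
after = compact(before, now)
```

Until `compact` runs, the `expired` entry is still present in `before` and still occupies space, even though a read would already report it as not found.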
+
+# Interface
+
+Interfaces for setting and querying TTL are provided in both the client
drivers and the shell tools.
+
+Taking the Pegasus Java Client as an example, the interfaces for obtaining TTL
include:
+* [ttl](/clients/java-client#ttl)
+
+The interfaces for setting TTL include:
+* [set](/clients/java-client#set)
+* [batchSet](/clients/java-client#batchset)
+* [multiSet](/clients/java-client#multiset)
+* [batchMultiSet](/clients/java-client#batchmultiset)
+* [incr](/clients/java-client#incr) (Since Pegasus v1.11.1)
+* [checkAndSet](/clients/java-client#checkandset)
+
+The following commands in Shell tools can query/set TTL:
+* [ttl](/docs/tools/shell/#ttl)
+* [set](/docs/tools/shell/#set)
+* [multi_set](/docs/tools/shell/#multi_set)
+
+# Table level TTL
+
+Table level TTL is supported since Pegasus v1.11.2.
+
+## Implementation principle
+
+* The user sets the `default_ttl` environment variable in the [Table
environment variable](/administration/table-env)
+* MetaServer asynchronously synchronizes the environment variables to each
ReplicaServer, so that every replica of the table obtains the environment
variable
+* Once a replica obtains the environment variable, it parses out the
`default_ttl` parameter, which takes effect immediately. Afterward:
+  * If newly written data has `ExpireTime` = 0, the actual `ExpireTime` of
the data is set to `default_ttl`
+  * When RocksDB performs compaction, if the original data in the compaction
input file **does not have** an `ExpireTime`, the `ExpireTime` of the new data
in the compaction output file is set to `default_ttl`
+  * Because the timing of background compaction is not deterministic, the
moment at which data without a TTL receives `default_ttl` as its TTL is not
deterministic either
+  * To set the TTL for all data quickly, run a [Manual
Compact](/administration/manual-compact). All data is then processed by
compaction, and data without a TTL has its TTL set to `default_ttl`
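
The effect of `default_ttl` on writes and on compaction can be sketched as follows. This is a toy model under a stated assumption: `default_ttl` is treated as a duration in seconds, so the stored `ExpireTime` becomes `now + default_ttl`. The function names and the tuple layout are hypothetical and do not reflect Pegasus internals.

```python
def effective_expire_time(expire_time: int, default_ttl: int, now: int) -> int:
    """Return the ExpireTime to store, applying the table-level default."""
    if expire_time == 0 and default_ttl > 0:
        # Assumption: default_ttl is a duration in seconds counted from now.
        return now + default_ttl
    return expire_time


def compact_with_default_ttl(entries: dict, default_ttl: int, now: int) -> dict:
    """Compaction pass: stamp data that has no TTL, then drop expired data."""
    output = {}
    for key, (expire_time, data) in entries.items():
        expire_time = effective_expire_time(expire_time, default_ttl, now)
        if expire_time == 0 or expire_time > now:
            output[key] = (expire_time, data)
    return output


now = 1_700_000_000
entries = {"a": (0, b"1"), "b": (now + 10, b"2"), "c": (now - 1, b"3")}
result = compact_with_default_ttl(entries, 86_400, now)
```

In this model, entry `a` leaves the compaction with an `ExpireTime` one day in the future, `b` keeps its own TTL, and `c` is dropped as expired.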
+
+## Application scenarios
+
+* The disk space occupied by the table keeps growing. The user wants to reduce
disk space usage, or to improve query performance and reduce disk and CPU
consumption by garbage-collecting data
+* All or part of the data in the table has no TTL set
+* The validity of the data without TTL depends on its write time. For example,
data written more than a month ago is no longer queried and can be discarded
+When all three conditions are met, disk space can be cleaned up and resources
released by combining table level TTL with Manual Compact.
+
+# Calculate data write time through TTL
+
+If a TTL was set when the data was written, the write time can be calculated
from the TTL.
+
+Since:
+```
+ExpireTime = InsertTime + TTLSeconds = now + TTLRemainingSeconds
+```
+Therefore:
+```
+InsertTime = now + TTLRemainingSeconds - TTLSeconds
+```
+Where:
+* now: The time at which the Shell ttl command is executed.
+* TTLRemainingSeconds: Obtained through [Shell's ttl
command](/overview/shell#ttl).
+* TTLSeconds: The TTL set by the user when writing the data.
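
A small worked example of the formula, written in Python. The function name `insert_time` and the sample numbers are illustrative only; all times are Unix timestamps in seconds.

```python
def insert_time(now: int, ttl_remaining_seconds: int, ttl_seconds: int) -> int:
    """InsertTime = now + TTLRemainingSeconds - TTLSeconds."""
    return now + ttl_remaining_seconds - ttl_seconds


# The data was written with a TTL of 3600 seconds. At now = 1_700_001_000
# the remaining TTL is reported as 2600 seconds.
written_at = insert_time(now=1_700_001_000, ttl_remaining_seconds=2600, ttl_seconds=3600)
```

Here `written_at` evaluates to `1_700_000_000`, that is, 1000 seconds before the query, which matches the 1000 seconds of TTL already consumed.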
diff --git a/_docs/zh/api/ttl.md b/_docs/zh/api/ttl.md
index a49327bc..837ed0d8 100755
--- a/_docs/zh/api/ttl.md
+++ b/_docs/zh/api/ttl.md
@@ -3,53 +3,57 @@ permalink: api/ttl
---
# 原理
-Pegasus支持TTL(Time-To-Live)功能,即在写入数据的时候,可以指定数据的过期时间。一旦过期,数据对用户就是不可见的,通过get/multiGet等接口都不再能访问到数据。
+Pegasus支持TTL(Time-To-Live)功能,即在写入数据的时候,可以指定数据的过期时间。数据一旦过期,便对用户不可见了,通过get/multiGet等接口都不再能访问到数据。
-设置的时候,用户通常都是提供`ttl_seconds`参数,表示从当前时间开始计算,多少秒之后数据过期。如果为0,则表示不设置TTL,即数据永不过期。
+用户通过`ttl_seconds`参数来设置TTL,表示从当前时间开始计算,多少秒之后数据过期。如果为0,则表示不设置TTL,即数据永不过期。
-用户通常有疑问,数据过期后对用户不可见是怎么实现的呢?数据会被立即删除吗?下面来讲讲TTL的实现原理。
+这是怎么实现的呢?数据会被立即从磁盘上删除吗?下面来讲讲TTL的实现原理。
简单来说,Pegasus的TTL是通过在写数据时记录数据的过期时间,在查询时对过期时间进行检查来实现的。如下图:
{:class="img-responsive"}
-**写入过程:**
-* 在写入数据时,用户在客户端通过`ttl_seconds`参数设置TTL时间,客户端先计算数据的过期时间`ExpireTime = CurrentTime
+ ttl_seconds`,然后通过RPC将数据和`ExpireTime`一起传给ReplicaServer端执行。
+**写入过程**
+* 在写入数据时,用户在客户端通过`ttl_seconds`参数设置TTL时间,客户端先计算数据的过期时间`ExpireTime = CurrentTime
+ ttl_seconds`,然后通过RPC将数据和`ExpireTime`一起传到ReplicaServer端。
*
ReplicaServer收到写请求后,经过各种处理(包括写WAL、replication复制等),最后将数据存储到RocksDB中。在存储value的时候,会将`ExpireTime`放在value头部。
-**读取过程:**
+**读取过程**
* 用户通过客户端查询指定key对应的value数据
* ReplicaServer收到读请求后,先从RocksDB获取到key对应的value,然后从value头部提取出`ExpireTime`:
* 如果ExpireTime == 0,表示数据没有设置TTL,是有效的。
* 如果ExpireTime > 0,表示数据设置了TTL,则进一步比较:
- * 如果ExpireTime > CurrentTime,则数据没有过期,返回value中的用户数据
- * 如果ExpireTime <= CurrentTime,则数据已经过期,返回`NotFound`
+ * 如果ExpireTime > now,则数据没有过期,返回value中的用户数据
+ * 如果ExpireTime <= now,则数据已经过期,返回`NotFound`
-**数据删除:**
-*
数据过期后,并不是立即从RocksDB中消失,而是通过[compaction](https://github.com/facebook/rocksdb/wiki/Compaction)来进行过期数据清理的。
-*
Pegasus使用了自定义的RocksDB[CompactionFilter](https://github.com/facebook/rocksdb/wiki/Compaction-Filter),使其在compaction过程中检查数据value头部的`ExpireTime`,如果已经过期,则将数据丢弃,它将不会出现在新生成的文件中。
-*
因为过期数据的删除过程是异步的,与compaction的执行时机和频率有关,所以数据过期与数据删除通常不是同时发生的,唯一能保证的是数据删除肯定发生在数据过期之后。
+**数据删除**
+*
数据过期后,并不是立即从RocksDB中删除,而是通过[compaction](https://github.com/facebook/rocksdb/wiki/Compaction)来进行垃圾回收的。
+* Pegasus使用了自定义的RocksDB
[CompactionFilter](https://github.com/facebook/rocksdb/wiki/Compaction-Filter),使其在compaction过程中检查数据value头部的`ExpireTime`,如果已经过期,则将数据丢弃,它将不会出现在新生成的文件中。
+*
因为过期数据的垃圾回收过程是异步的,与compaction的执行时机和频率有关,所以数据过期与数据删除通常不是同时发生的,唯一能保证的是数据删除肯定发生在数据过期之后。
* 已过期但未删除的数据依然会占用据磁盘空间。
# 接口
我们在客户端和Shell工具都提供了设置和查询TTL的接口。
-Pegasus Java Client中以下接口可以查询和设置TTL:
-* [ttl](/clients/java-client#ttl):获取指定数据的TTL信息。
-*
[set](/clients/java-client#set)和[batchSet](/clients/java-client#batchset):都提供了设置TTL的参数,其中batchSet是在SetItem中设置的。
-*
[multiSet](/clients/java-client#multiset)和[batchMultiSet](/clients/java-client#batchmultiset):都提供了设置TTL的参数。
-* [incr](/clients/java-client#batchmultiset):从v1.11.1版本开始,incr接口也提供了修改TTL的功能。
-*
[checkAndSet](/clients/java-client#checkandset):在CheckAndSetOptions中提供了设置TTL的参数。
+以Pegasus Java Client为例,获取TTL的接口有:
+* [ttl](/clients/java-client#ttl)
+设置TTL的接口有:
+* [set](/clients/java-client#set)
+* [batchSet](/clients/java-client#batchset)
+* [multiSet](/clients/java-client#multiset)
+* [batchMultiSet](/clients/java-client#batchmultiset)
+* [incr](/clients/java-client#incr)(从v1.11.1版本开始)
+* [checkAndSet](/clients/java-client#checkandset)
-Shell工具中以下命令可以查询和设置TTL:
-* [ttl](/docs/tools/shell/#ttl)命令:获取指定数据的TTL信息。
-*
[set](/docs/tools/shell/#set)和[multi_set](/docs/tools/shell/#multi_set)命令:都提供了设置TTL的参数。
+Shell工具中以下命令可以查询/设置TTL:
+* [ttl](/docs/tools/shell/#ttl)
+* [set](/docs/tools/shell/#set)
+* [multi_set](/docs/tools/shell/#multi_set)
# 表级TTL
从v1.11.2版本开始,Pegasus支持表级TTL功能。
## 实现原理
-* 用户在[Table环境变量](/administration/table-env)中设置`default_ttl`环境变量。
+* 用户在[Table环境变量](/administration/table-env)中设置`default_ttl`环境变量
* MetaServer将环境变量异步地同步到到各个ReplicaServer,使该表的每个replica都获取到该环境变量
* replica获得环境变量后,解析获得`default_ttl`配置,并立即生效。此后:
* 用户新写入的数据,如果`ExpireTime` = 0,则将数据的实际`ExpireTime`设置为`default_ttl`
@@ -58,9 +62,9 @@ Shell工具中以下命令可以查询和设置TTL:
* 如果想快速设置所有数据的TTL,则可以执行[Manual
Compact](/administration/manual-compact)。那么所有数据都会被compaction处理,未设置TTL的数据都会被被设置TTL为`default_ttl`
## 应用场景
-- 数据表占用的磁盘空间越来越大。想降低磁盘空间占用,或通过清理数据来提升查询速度,降低磁盘、CPU等资源消耗
-- 数据表中的所有数据或部分数据没有设置TTL
-- 未设置TTL的数据的有效性跟写入时间相关,比如写入时间超过一个月的数据就不再会有查询需求了,可以丢弃
+* 数据表占用的磁盘空间越来越大。用户想降低磁盘空间占用,或通过清理数据来提升查询性能,降低磁盘、CPU等资源消耗
+* 数据表中的所有数据或部分数据没有设置TTL
+* 未设置TTL的数据的有效性跟写入时间相关,比如写入时间超过一个月的数据就不再会有查询需求了,可以丢弃
同时满足以上3个条件的场景,就可以通过`表级TTL`和`Manual Compact`的功能实现清理磁盘释放资源的目的。
# 通过TTL计算数据写入时间
diff --git a/assets/images/pegasus-ttl.png b/assets/images/pegasus-ttl.png
index 340686d8..fa80f501 100644
Binary files a/assets/images/pegasus-ttl.png and
b/assets/images/pegasus-ttl.png differ