This is an automated email from the ASF dual-hosted git repository.
lzljs3620320 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/paimon.git
The following commit(s) were added to refs/heads/master by this push:
new 59fb7134b [doc] Add table mode page for primary key table
59fb7134b is described below
commit 59fb7134bd01417d255c73cc5a6adcfee57ae775
Author: Jingsong
AuthorDate: Mon Jul 1 18:31:05 2024 +0800
[doc] Add table mode page for primary key table
---
.../primary-key-table/changelog-producer.md | 2 +-
docs/content/primary-key-table/deletion-vectors.md | 49 ---------
docs/content/primary-key-table/merge-engine.md | 2 +-
docs/content/primary-key-table/read-optimized.md | 47 --------
docs/content/primary-key-table/sequence-rowkind.md | 2 +-
docs/content/primary-key-table/table-mode.md | 119 +++++++++++++++++++++
docs/static/img/cow.png | Bin 0 -> 1600997 bytes
docs/static/img/lsm-inside-bucket.png | Bin 0 -> 2214166 bytes
docs/static/img/mor.png | Bin 0 -> 1595856 bytes
docs/static/img/mow-example.png | Bin 0 -> 1366847 bytes
docs/static/img/mow.png | Bin 0 -> 1302647 bytes
11 files changed, 122 insertions(+), 99 deletions(-)
diff --git a/docs/content/primary-key-table/changelog-producer.md b/docs/content/primary-key-table/changelog-producer.md
index 45723da1f..88ae5817e 100644
--- a/docs/content/primary-key-table/changelog-producer.md
+++ b/docs/content/primary-key-table/changelog-producer.md
@@ -1,6 +1,6 @@
---
title: "Changelog Producer"
-weight: 4
+weight: 5
type: docs
aliases:
- /primary-key-table/changelog-producer.html
diff --git a/docs/content/primary-key-table/deletion-vectors.md b/docs/content/primary-key-table/deletion-vectors.md
deleted file mode 100644
index 3eb6f293f..000000000
--- a/docs/content/primary-key-table/deletion-vectors.md
+++ /dev/null
@@ -1,49 +0,0 @@
----
-title: "Deletion Vectors"
-weight: 6
-type: docs
-aliases:
-- /primary-key-table/deletion-vectors.html
----
-<!--
-Licensed to the Apache Software Foundation (ASF) under one
-or more contributor license agreements. See the NOTICE file
-distributed with this work for additional information
-regarding copyright ownership. The ASF licenses this file
-to you under the Apache License, Version 2.0 (the
-"License"); you may not use this file except in compliance
-with the License. You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing,
-software distributed under the License is distributed on an
-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-KIND, either express or implied. See the License for the
-specific language governing permissions and limitations
-under the License.
--->
-
-# Deletion Vectors
-
-## Overview
-
-The Deletion Vectors mode is designed to takes into account both data reading and writing efficiency.
-
-In this mode, additional overhead (looking up LSM Tree and generating the corresponding Deletion File) will be introduced during writing,
-but during reading, data can be directly retrieved by employing data with deletion vectors, avoiding additional merge costs between different files.
-
-Furthermore, data reading concurrency is no longer limited, and non-primary key columns can also be used for filter push down.
-Generally speaking, in this mode, we can get a huge improvement in read performance without losing too much write performance.
-
-{{< img src="/img/deletion-vectors-overview.png">}}
-
-## Usage
-
-By specifying `'deletion-vectors.enabled' = 'true'`, the Deletion Vectors mode can be enabled.
-
-## Limitation
-
-- `changelog-producer` needs to be `none` or `lookup`.
-- `merge-engine` can't be `first-row`, because the read of first-row is already no merging, deletion vectors are not needed.
-- This mode will filter the data in level-0, so when using time travel to read `APPEND` snapshot, there will be data delay.
diff --git a/docs/content/primary-key-table/merge-engine.md b/docs/content/primary-key-table/merge-engine.md
index f4daa7bf7..32b897a00 100644
--- a/docs/content/primary-key-table/merge-engine.md
+++ b/docs/content/primary-key-table/merge-engine.md
@@ -1,6 +1,6 @@
---
title: "Merge Engine"
-weight: 3
+weight: 4
type: docs
aliases:
- /primary-key-table/merge-engine.html
diff --git a/docs/content/primary-key-table/read-optimized.md b/docs/content/primary-key-table/read-optimized.md
deleted file mode 100644
index 1a5a72334..000000000
--- a/docs/content/primary-key-table/read-optimized.md
+++ /dev/null
@@ -1,47 +0,0 @@
----
-title: "Read Optimized"
-weight: 7
-type: docs
-aliases:
-- /primary-key-table/read-optimized.html
----
-<!--
-Licensed to the Apache Software Foundation (ASF) under one
-or more contributor license agreements. See the NOTICE file
-distributed with this work for additional information
-regarding copyright ownership. The ASF licenses this file
-to you under the Apache License, Version 2.0 (the
-"License"); you may not use this file except in compliance
-with the License. You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing,
-software distributed under the License is distributed on an
-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-KIND, either express or implied. See the License for the
-specific language governing permissions and limitations
-under the License.
--->
-
-# Read Optimized
-
-## Overview
-
-For Primary Key Table, it's a 'MergeOnRead' technology. When reading data, multiple layers of LSM data are merged,
-and the number of parallelism will be limited by the number of buckets. Although Paimon's merge performance is efficient,
-it still cannot catch up with the ordinary AppendOnly table.
-
-We recommend that you use [Deletion Vectors]({{< ref "primary-key-table/deletion-vectors" >}}) mode.
-
-If you don't want to use Deletion Vectors mode, you want to query fast enough in certain scenarios, but can only find
-older data, you can also:
-
-1. Configure 'compaction.optimization-interval' when writing data. For streaming jobs, optimized compaction will then
-   be performed periodically; For batch jobs, optimized compaction will be carried out when the job ends. (Or configure
-   `'full-compaction.delta-commits'`, its disadvantage is that it can only perform compaction synchronously, which will
-   affect writing efficiency)
-2. Query from [read-optimized system table]({{< ref "maintenance/system-tables#read-optimized-table" >}}). Reading from
-   results of optimized files avoids merging records with the same key, thus improving reading performance.
-
-You can flexibly balance query performance and data latency when reading.
diff --git a/docs/content/primary-key-table/sequence-rowkind.md b/docs/content/primary-key-table/sequence-rowkind.md
index 876348b62..61e2c01c8 100644
--- a/docs/content/primary-key-table/sequence-rowkind.md
+++ b/docs/content/primary-key-table/sequence-rowkind.md
@@ -1,6 +1,6 @@
---
title: "Sequence & Rowkind"
-weight: 5
+weight: 6
type: docs
aliases:
- /primary-key-table/sequence-rowkind.html
diff --git a/docs/content/primary-key-table/table-mode.md b/docs/content/primary-key-table/table-mode.md
new file mode 100644
index 000000000..15261cabc
--- /dev/null
+++ b/docs/content/primary-key-table/table-mode.md
@@ -0,0 +1,119 @@
+---
+title: "Table Mode"
+weight: 3
+type: docs
+aliases:
+- /primary-key-table/read-optimized.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Table Mode
+
+{{< img src="/img/lsm-inside-bucket.png">}}
+
+The figure above roughly shows the file structure of a primary key table. A table or partition contains
+multiple buckets, and each bucket is a separate LSM tree that contains multiple files.
+
+Writing to the LSM tree works roughly as follows: each Flink checkpoint flushes L0 files and triggers a compaction as
+needed to merge the data. Depending on how data is processed during writing, there are three modes:
+
+1. MOR (Merge On Read): The default mode. Only minor compactions are performed, so merging is required when reading.
+2. COW (Copy On Write): Set `'full-compaction.delta-commits' = '1'` so that a full compaction runs synchronously on
+   every commit, which means the merge is completed on write.
+3. MOW (Merge On Write): Set `'deletion-vectors.enabled' = 'true'`. During writing, the LSM tree is queried to generate
+   deletion vector files for the data files, so unnecessary rows are filtered out directly during reading.
+
+Merge On Write mode is recommended for general primary key tables (i.e., tables using the default `deduplicate`
+merge engine).
+
+## Merge On Read
+
+MOR is the default mode of primary key tables.
+
+{{< img src="/img/mor.png">}}
+
+In MOR mode, all files must be merged when reading: the files are sorted by primary key and read through a multi-way
+merge, which involves comparing primary keys.
+
+An obvious issue is that a single LSM tree can only be read by a single thread, so read parallelism is limited by the
+number of buckets. If a bucket holds too much data, read performance suffers. For better read performance, it is
+recommended to analyze the table's query requirements and size the buckets so that each holds between 200MB and 1GB of
+data. However, if buckets are too small, there will be many small-file reads and writes, putting pressure on the file
+system.
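+
+As a rough sizing sketch, assume a hypothetical `orders` table whose partitions (or the whole table, if unpartitioned)
+are each expected to hold about 10 GB. Targeting roughly 500MB per bucket, within the recommended range, gives
+10 GB / 500MB, or about 20 buckets. The schema, data volume, and bucket count below are illustrative assumptions, not
+recommendations; MOR needs no extra mode option, so only a fixed bucket count is set here:
+
+```sql
+CREATE TABLE orders (
+    order_id BIGINT,
+    status STRING,
+    PRIMARY KEY (order_id) NOT ENFORCED
+) WITH (
+    -- Hypothetical: about 10 GB per partition / about 500MB per bucket = about 20 buckets.
+    'bucket' = '20'
+);
+```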
+
+In addition, because of the merging process, filter-based data skipping cannot be performed on non-primary-key columns;
+otherwise newer records could be filtered out, causing stale older records to be returned instead.
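+
+The following hypothetical sequence sketches this problem. It assumes an initially empty `orders` table with primary key
+`order_id`, a non-primary-key column `status`, and the default `deduplicate` merge engine, and it assumes the two writes
+for key 1 end up in different files of the same bucket (the older one already compacted to a higher level):
+
+```sql
+-- Older version of key 1, assumed to be in a higher-level file.
+INSERT INTO orders VALUES (1, 'PAID');
+-- Newer version of key 1, assumed to still be in a level-0 file.
+INSERT INTO orders VALUES (1, 'CANCELLED');
+
+-- Correct result: no rows, because the latest version of key 1 is 'CANCELLED'.
+SELECT * FROM orders WHERE status = 'PAID';
+
+-- If files were skipped using `status = 'PAID'` before merging, the level-0 file would
+-- be dropped, only the old 'PAID' row would reach the merge, and it would be returned.
+```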
+
+- Write performance: very good.
+- Read performance: not so good.
+
+## Copy On Write
+
+```sql
+ALTER TABLE orders SET ('full-compaction.delta-commits' = '1');
+```
+
+Setting `full-compaction.delta-commits` to 1 means that every write triggers a full compaction that merges all data
+into the highest level. Reads then need no merging, so read performance is the highest. However, every write requires
+a full merge, so write amplification is very severe.
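+
+The same option can also be set when the table is created. A minimal sketch, in which only
+`'full-compaction.delta-commits'` comes from this page and the `orders` schema is a hypothetical example:
+
+```sql
+CREATE TABLE orders (
+    order_id BIGINT,
+    status STRING,
+    PRIMARY KEY (order_id) NOT ENFORCED
+) WITH (
+    -- Fully compact on every commit, so reads never need to merge (COW).
+    'full-compaction.delta-commits' = '1'
+);
+```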
+
+{{< img src="/img/cow.png">}}
+
+- Write performance: very bad.
+- Read performance: very good.
+
+## Merge On Write
+
+```sql
+ALTER TABLE orders SET ('deletion-vectors.enabled' = 'true');
+```
+
+Thanks to its LSM structure, Paimon can look up data by primary key. This allows it to generate deletion vector files
+during writing, recording which rows in each data file have been deleted. Readers then filter out those rows directly,
+which is equivalent to merging but does not hurt read performance.
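+
+Deletion vectors can likewise be enabled when the table is created instead of with `ALTER TABLE`. A minimal sketch with
+a hypothetical `orders` schema; only the `'deletion-vectors.enabled'` option comes from this page:
+
+```sql
+CREATE TABLE orders (
+    order_id BIGINT,
+    status STRING,
+    PRIMARY KEY (order_id) NOT ENFORCED
+) WITH (
+    -- Generate deletion vectors while writing so reads can skip deleted rows (MOW).
+    'deletion-vectors.enabled' = 'true'
+);
+```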
+
+{{< img src="/img/mow.png">}}
+
+A simple example:
+
+{{< img src="/img/mow-example.png">}}
+
+Data is updated by first marking the old record as deleted and then adding the new one.
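+
+The update can be sketched in SQL. This is an illustration under stated assumptions, not a trace of Paimon's internals:
+it assumes a hypothetical, initially empty `orders` table (`order_id` primary key, `status` column) with
+`'deletion-vectors.enabled' = 'true'`, and it describes the deletion vector step only roughly:
+
+```sql
+INSERT INTO orders VALUES (1, 'PAID');
+
+-- Update key 1. Roughly: when the new row is compacted out of level 0, the LSM tree is
+-- looked up for key 1, and the old row's position is recorded in a deletion vector
+-- for the data file that contains it. The new row is kept in a new file.
+INSERT INTO orders VALUES (1, 'CANCELLED');
+
+-- After compaction, readers skip rows marked in deletion vectors instead of merging
+-- files, so this is expected to return only (1, 'CANCELLED').
+SELECT * FROM orders;
+```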
+
+- Write performance: good.
+- Read performance: good.
+
+{{< hint info >}}
+Visibility guarantee: for tables in deletion vectors mode, level-0 files only become visible after compaction. Therefore
+compaction is synchronous by default; if asynchronous compaction is enabled, newly written data may become visible with
+a delay.
+{{< /hint >}}
+
+## MOR Read Optimized
+
+If you don't want to use Deletion Vectors mode but still want fast queries in MOR mode, and can accept reading only
+older data, you can also:
+
+1. Configure `'compaction.optimization-interval'` when writing data. For streaming jobs, optimized compaction is then
+   performed periodically; for batch jobs, it is carried out when the job ends. (Alternatively, configure
+   `'full-compaction.delta-commits'`; its disadvantage is that compaction can only be performed synchronously, which
+   affects write efficiency.)
+2. Query the [read-optimized system table]({{< ref "maintenance/system-tables#read-optimized-table" >}}). Reading
+   only the results of optimized compaction avoids merging records with the same key, thus improving read performance.
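+
+Both options can be sketched as follows. The `orders` table name and the `'1 h'` interval are hypothetical values, and
+the `$ro` suffix for the read-optimized system table is an assumption based on the system tables page linked above:
+
+```sql
+-- Option 1: run optimized compaction periodically (hypothetical interval).
+ALTER TABLE orders SET ('compaction.optimization-interval' = '1 h');
+
+-- Option 2: read only the results of optimized compaction. No merging is needed,
+-- but recently written data may not be visible yet.
+SELECT * FROM `orders$ro`;
+```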
+
+You can flexibly balance query performance and data latency when reading.
diff --git a/docs/static/img/cow.png b/docs/static/img/cow.png
new file mode 100644
index 000000000..caa9787a9
Binary files /dev/null and b/docs/static/img/cow.png differ
diff --git a/docs/static/img/lsm-inside-bucket.png b/docs/static/img/lsm-inside-bucket.png
new file mode 100644
index 000000000..5ee5981eb
Binary files /dev/null and b/docs/static/img/lsm-inside-bucket.png differ
diff --git a/docs/static/img/mor.png b/docs/static/img/mor.png
new file mode 100644
index 000000000..0027f70bf
Binary files /dev/null and b/docs/static/img/mor.png differ
diff --git a/docs/static/img/mow-example.png b/docs/static/img/mow-example.png
new file mode 100644
index 000000000..9e944fc0d
Binary files /dev/null and b/docs/static/img/mow-example.png differ
diff --git a/docs/static/img/mow.png b/docs/static/img/mow.png
new file mode 100644
index 000000000..e9b392fcf
Binary files /dev/null and b/docs/static/img/mow.png differ