This is an automated email from the ASF dual-hosted git repository.
lzljs3620320 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/paimon.git
The following commit(s) were added to refs/heads/master by this push:
new b530c83d00 [doc] Add 'Data Skipping By File Index' for primary key table
b530c83d00 is described below
commit b530c83d00e39948c579c8e608e3045aefad072b
Author: JingsongLi <[email protected]>
AuthorDate: Fri Aug 22 13:26:02 2025 +0800
[doc] Add 'Data Skipping By File Index' for primary key table
---
docs/content/append-table/query-performance.md | 9 +--------
.../content/primary-key-table/query-performance.md | 23 ++++++++++++++++++++++
2 files changed, 24 insertions(+), 8 deletions(-)
diff --git a/docs/content/append-table/query-performance.md b/docs/content/append-table/query-performance.md
index 101970e643..dbc80a1d35 100644
--- a/docs/content/append-table/query-performance.md
+++ b/docs/content/append-table/query-performance.md
@@ -60,14 +60,7 @@ You can take a look at [Flink COMPACT Action]({{< ref "maintenance/dedicated-com
You can use file index too, it filters files by indexing on the reading side.
-```sql
-CREATE TABLE <PAIMON_TABLE> (<COLUMN> <COLUMN_TYPE> , ...) WITH (
- 'file-index.bloom-filter.columns' = 'c1,c2',
- 'file-index.bloom-filter.c1.items' = '200'
-);
-```
-
-Define `file-index.bloom-filter.columns`, Data file index is an external index file and Paimon will create its
+Define `file-index.bitmap.columns`, Data file index is an external index file and Paimon will create its
corresponding index file for each file. If the index file is too small, it will be stored directly in the manifest,
otherwise in the directory of the data file. Each data file corresponds to an index file, which has a separate file
definition and can contain different types of indexes with multiple columns.
diff --git a/docs/content/primary-key-table/query-performance.md b/docs/content/primary-key-table/query-performance.md
index 2ba19b0d3d..7310103307 100644
--- a/docs/content/primary-key-table/query-performance.md
+++ b/docs/content/primary-key-table/query-performance.md
@@ -59,6 +59,29 @@ Min max query can be also accelerated during compilation and returns very quickl
For a regular bucketed table (For example, bucket = 5), the filtering conditions of the primary key will greatly
accelerate queries and reduce the reading of a large number of files.
+## Data Skipping By File Index
+
+For full-compacted file, or for primary-key table with `'deletion-vectors.enabled'`, you can use file index, it filters
+files by indexing on the reading side.
+
+Define `file-index.bitmap.columns`, Data file index is an external index file and Paimon will create its
+corresponding index file for each file. If the index file is too small, it will be stored directly in the manifest,
+otherwise in the directory of the data file. Each data file corresponds to an index file, which has a separate file
+definition and can contain different types of indexes with multiple columns.
+
+Different file indexes may be efficient in different scenarios. For example bloom filter may speed up query in point lookup
+scenario. Using a bitmap may consume more space but can result in greater accuracy.
+
+* [BloomFilter]({{< ref "concepts/spec/fileindex#index-bloomfilter" >}}): `file-index.bloom-filter.columns`.
+* [Bitmap]({{< ref "concepts/spec/fileindex#index-bitmap" >}}): `file-index.bitmap.columns`.
+* [Range Bitmap]({{< ref "concepts/spec/fileindex#index-range-bitmap" >}}): `file-index.range-bitmap.columns`.
+
+If you want to add file index to existing table, without any rewrite, you can use `rewrite_file_index` procedure. Before
+we use the procedure, you should config appropriate configurations in target table. You can use ALTER clause to config
+`file-index.<filter-type>.columns` to the table.
+
+How to invoke: see [flink procedures]({{< ref "flink/procedures#procedures" >}})
+
## Bucketed Join
Fixed Bucketed table (e.g. bucket = 10) can be used to avoid shuffle if necessary in batch query, for example, you can