(fluss) 03/11: [Docs] consistency & syntax fixes (#1243)

jark Thu, 31 Jul 2025 05:58:06 -0700

This is an automated email from the ASF dual-hosted git repository.

jark pushed a commit to branch release-0.7
in repository https://gitbox.apache.org/repos/asf/fluss.git


commit b080905a3f5dca0ea6b1c50a1bd3d84a367e1f06
Author: Giannis Polyzos <[email protected]>
AuthorDate: Thu Jul 17 10:04:39 2025 +0300

    [Docs] consistency & syntax fixes (#1243)
    
    * change PrimaryKey table to Primary Key Table across pages
    
    * syntactic fixes
    
    * make some more minor fixes
    
    * fix broken link
    
    * address yuxia's comments
---
 website/docs/engine-flink/ddl.md                   | 36 ++++++++++------------
 website/docs/intro.md                              |  4 +--
 website/docs/table-design/overview.md              | 16 +++++-----
 website/docs/table-design/table-types/log-table.md |  6 ++--
 .../table-types/pk-table/_category_.json           |  2 +-
 .../table-design/table-types/pk-table/index.md     | 34 ++++++++++----------
 .../pk-table/merge-engines/first-row.md            |  4 +--
 .../table-types/pk-table/merge-engines/index.md    |  6 ++--
 8 files changed, 52 insertions(+), 56 deletions(-)

diff --git a/website/docs/engine-flink/ddl.md b/website/docs/engine-flink/ddl.md
index 7affc06bf..a7cf22dc3 100644
--- a/website/docs/engine-flink/ddl.md
+++ b/website/docs/engine-flink/ddl.md
@@ -39,17 +39,17 @@ The following properties can be set if using the Fluss catalog:
 
 | Option                         | Required | Default   | Description |
 |--------------------------------|----------|-----------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| type                           | required | (none)    | Catalog type, must to be 'fluss' here. |
+| type                           | required | (none)    | Catalog type, must be 'fluss' here. |
 | bootstrap.servers              | required | (none)    | Comma separated list of Fluss servers. |
 | default-database               | optional | fluss     | The default database to use when switching to this catalog. |
 | client.security.protocol       | optional | PLAINTEXT | The security protocol used to communicate with brokers. Currently, only `PLAINTEXT` and `SASL` are supported, the configuration value is case insensitive. |
-| `client.security.{protocol}.*` | optional | (none)    | Client-side configuration properties for a specific authentication protocol. E.g., client.security.sasl.jaas.config. More Details in [authentication](../security/authentication.md) | (none)        |
+| `client.security.{protocol}.*` | optional | (none)    | Client-side configuration properties for a specific authentication protocol. E.g., client.security.sasl.jaas.config. More Details in [authentication](../security/authentication.md) |
 
-The following introduced statements assuming the current catalog is switched 
to the Fluss catalog using `USE CATALOG <catalog_name>` statement.
+The following statements assume that the current catalog has been switched to 
the Fluss catalog using the `USE CATALOG <catalog_name>` statement.
 
 ## Create Database
 
-By default, FlussCatalog will use the `fluss` database in Flink. Using the 
following example to create a separate database in order to avoid creating 
tables under the default `fluss` database:
+By default, FlussCatalog will use the `fluss` database in Flink. You can use 
the following example to create a separate database to avoid creating tables 
under the default `fluss` database:
 
 ```sql title="Flink SQL"
 CREATE DATABASE my_db;
@@ -75,9 +75,9 @@ DROP DATABASE my_db;
 
 ## Create Table
 
-### PrimaryKey Table
+### Primary Key Table
 
-The following SQL statement will create a [PrimaryKey 
Table](table-design/table-types/pk-table/index.md) with a primary key 
consisting of shop_id and user_id.
+The following SQL statement will create a [Primary Key 
Table](table-design/table-types/pk-table/index.md) with a primary key 
consisting of shop_id and user_id.
 ```sql title="Flink SQL"
 CREATE TABLE my_pk_table (
   shop_id BIGINT,
@@ -105,14 +105,14 @@ CREATE TABLE my_log_table (
 );
 ```
 
-### Partitioned (PrimaryKey/Log) Table
+### Partitioned (Primary Key/Log) Table
 
 :::note
 1. Currently, Fluss only supports partitioned field with `STRING` type
-2. For the Partitioned PrimaryKey Table, the partitioned field (`dt` in this 
case) must be a subset of the primary key (`dt, shop_id, user_id` in this case)
+2. For the Partitioned Primary Key Table, the partitioned field (`dt` in this 
case) must be a subset of the primary key (`dt, shop_id, user_id` in this case)
 :::
 
-The following SQL statement creates a Partitioned PrimaryKey Table in Fluss.
+The following SQL statement creates a Partitioned Primary Key Table in Fluss.
 
 ```sql title="Flink SQL"
 CREATE TABLE my_part_pk_table (
@@ -145,7 +145,7 @@ But you can still use the [Add Partition](engine-flink/ddl.md#add-partition) sta
 
 #### Multi-Fields Partitioned Table
 
-Fluss also support [Multi-Fields 
Partitioning](table-design/data-distribution/partitioning.md#multi-field-partitioned-tables),
 the following SQL statement creates a Multi-Fields Partitioned Log Table in 
Fluss:
+Fluss also supports [Multi-Fields 
Partitioning](table-design/data-distribution/partitioning.md#multi-field-partitioned-tables),
 the following SQL statement creates a Multi-Fields Partitioned Log Table in 
Fluss:
 
 ```sql title="Flink SQL"
 CREATE TABLE my_multi_fields_part_log_table (
@@ -158,9 +158,9 @@ CREATE TABLE my_multi_fields_part_log_table (
 ) PARTITIONED BY (dt, nation);
 ```
 
-#### Auto partitioned (PrimaryKey/Log) table
+#### Auto Partitioned (Primary Key/Log) Table
 
-Fluss also support creat Auto Partitioned (PrimaryKey/Log) Table. The 
following SQL statement creates an Auto Partitioned PrimaryKey Table in Fluss.
+Fluss also supports creating Auto Partitioned (Primary Key/Log) Table. The 
following SQL statement creates an Auto Partitioned Primary Key Table in Fluss.
 
 ```sql title="Flink SQL"
 CREATE TABLE my_auto_part_pk_table (
@@ -193,7 +193,7 @@ CREATE TABLE my_auto_part_log_table (
 );
 ```
 
-For more details about Auto Partitioned (PrimaryKey/Log) Table, refer to [Auto 
Partitioning](table-design/data-distribution/partitioning.md#auto-partitioning).
+For more details about Auto Partitioned (Primary Key/Log) Table, refer to 
[Auto 
Partitioning](table-design/data-distribution/partitioning.md#auto-partitioning).
 
 
 ### Options
@@ -238,8 +238,8 @@ This will entirely remove all the data of the table in the Fluss cluster.
 
 ## Add Partition
 
-Fluss support manually add partitions to an exists partitioned table by Fluss 
Catalog. If the specified partition 
-not exists, Fluss will create the partition. If the specified partition 
already exists, Fluss will ignore the request 
+Fluss supports manually adding partitions to an existing partitioned table 
through the Fluss Catalog. If the specified partition 
+does not exist, Fluss will create the partition. If the specified partition 
already exists, Fluss will ignore the request 
 or throw an exception.
 
 To add partitions, run:
@@ -275,8 +275,8 @@ For more details, refer to the [Flink SHOW PARTITIONS](https://nightlies.apache.
 
 ## Drop Partition
 
-Fluss also support manually drop partitions from an exists partitioned table 
by Fluss Catalog. If the specified partition 
-not exists, Fluss will ignore the request or throw an exception.
+Fluss also supports manually dropping partitions from an existing partitioned 
table through the Fluss Catalog. If the specified partition 
+does not exist, Fluss will ignore the request or throw an exception.
 
 
 To drop partitions, run:
@@ -289,5 +289,3 @@ ALTER TABLE my_multi_fields_part_log_table DROP PARTITION (dt = '2025-03-05', na
 ```
 
 For more details, refer to the [Flink ALTER 
TABLE(DROP)](https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/dev/table/sql/alter/#drop)
 documentation.
-
-
diff --git a/website/docs/intro.md b/website/docs/intro.md
index d13fe1221..cbf85db4e 100644
--- a/website/docs/intro.md
+++ b/website/docs/intro.md
@@ -26,7 +26,7 @@ Fluss is a streaming storage built for real-time analytics which can serve as th
 
 ![arch](/img/fluss.png)
 
-It bridges the gap between **streaming data** and the data **Lakehouse** by 
enabling low-latency, high-throughput data ingestion and processing while 
seamlessly integrating with popular compute engines like **Apache Flink**, 
while **Apache Spark**, and **StarRocks** are coming soon.
+It bridges the gap between **streaming data** and the data **Lakehouse** by 
enabling low-latency, high-throughput data ingestion and processing while 
seamlessly integrating with popular compute engines like **Apache Flink**, with 
**Apache Spark** and **StarRocks** coming soon.
 
 Fluss supports `streaming reads` and `writes` with sub-second latency and 
stores data in a columnar format, enhancing query performance and reducing 
storage costs. 
 It offers flexible table types, including append-only **Log Tables** and 
updatable **PrimaryKey Tables**, to accommodate diverse real-time analytics and 
processing needs.
@@ -44,7 +44,7 @@ The following is a list of (but not limited to) use-cases that Fluss shines ✨:
 * **📡 Real-time IoT Pipelines**
 * **🚓 Real-time Fraud Detection**
 * **🚨 Real-time Alerting Systems**
-* **💫 Real-tim ETL/Data Warehouses**
+* **💫 Real-time ETL/Data Warehouses**
 * **🌐 Real-time Geolocation Services**
 * **🚚 Real-time Shipment Update Tracking**
 
diff --git a/website/docs/table-design/overview.md b/website/docs/table-design/overview.md
index 602c517ee..6e4f397b8 100644
--- a/website/docs/table-design/overview.md
+++ b/website/docs/table-design/overview.md
@@ -32,13 +32,13 @@ Tables are classified into two types based on the presence of a primary key:
 - **Log Tables:**
   - Designed for append-only scenarios.
   - Support only INSERT operations.
-- **PrimaryKey Tables:**
+- **Primary Key Tables:**
   - Used for updating and managing data in business databases.
   - Support INSERT, UPDATE, and DELETE operations based on the defined primary 
key.
 
-A Table becomes a [Partitioned 
Table](table-design/data-distribution/partitioning.md) when a partition column 
is defined. Data with the same partition value is stored in the same partition. 
Partition columns can be applied to both Log Tables and PrimaryKey Tables, but 
with specific considerations:
+A Table becomes a [Partitioned Table](data-distribution/partitioning.md) when 
a partition column is defined. Data with the same partition value is stored in 
the same partition. Partition columns can be applied to both Log Tables and 
Primary Key Tables, but with specific considerations:
 - **For Log Tables**, partitioning is commonly used for log data, typically 
based on date columns, to facilitate data separation and cleaning.
-- **For PrimaryKey Tables**, the partition column must be a subset of the 
primary key to ensure uniqueness.
+- **For Primary Key Tables**, the partition column must be a subset of the 
primary key to ensure uniqueness.
 
 This design ensures efficient data organization, flexibility in handling 
different use cases, and adherence to data integrity constraints.
 
@@ -58,14 +58,12 @@ The number of buckets `N` can be configured per table. A bucket is the smallest
 The data of a bucket consists of a LogTablet and a (optional) KvTablet.
 
 ### LogTablet
-A **LogTablet** needs to be generated for each bucket of Log and PrimaryKey 
tables.
-For Log Tables, the LogTablet is both the primary table data and the log data. 
For PrimaryKey tables, the LogTablet acts
+A **LogTablet** needs to be generated for each bucket of Log and Primary Key 
Tables.
+For Log Tables, the LogTablet is both the primary table data and the log data. 
For Primary Key Tables, the LogTablet acts
 as the log data for the primary table data.
 - **Segment:** The smallest unit of log storage in the **LogTablet**. A 
segment consists of an **.index** file and a **.log** data file.
-- **.index:** An `offset sparse index` that stores the mappings between the 
physical byte address in the message relative offset -> .log file.
+- **.index:** An `offset sparse index` that maps message relative offsets to 
their corresponding physical byte addresses in the .log file.
 - **.log:** Compact arrangement of log data.
 
 ### KvTablet
-Each bucket of the PrimaryKey table needs to generate a KvTablet. Underlying, 
each KvTablet corresponds to an embedded RocksDB instance. RocksDB is an LSM 
(log structured merge) engine which helps KvTablet supports high-performance 
updates and lookup query.
-
-
+Each bucket of the Primary Key Table needs to generate a KvTablet. Underlying, 
each KvTablet corresponds to an embedded RocksDB instance. RocksDB is an LSM 
(log structured merge) engine which helps KvTablet support high-performance 
updates and lookup queries.
diff --git a/website/docs/table-design/table-types/log-table.md b/website/docs/table-design/table-types/log-table.md
index b9c18102d..b9c264d97 100644
--- a/website/docs/table-design/table-types/log-table.md
+++ b/website/docs/table-design/table-types/log-table.md
@@ -60,7 +60,7 @@ Log Tables in Fluss allow real-time data consumption, preserving the order of da
 ## Column Pruning
 
 Column pruning is a technique used to reduce the amount of data that needs to 
be read from storage by eliminating unnecessary columns from the query.
-Fluss supports column pruning for Log Tables and the changelog of PrimaryKey 
Tables, which can significantly improve query performance by reducing the 
amount of data that needs to be read from storage and lowering networking costs.
+Fluss supports column pruning for Log Tables and the changelog of Primary Key 
Tables, which can significantly improve query performance by reducing the 
amount of data that needs to be read from storage and lowering networking costs.
 
 What sets Fluss apart is its ability to apply **column pruning during 
streaming reads**, a capability that is both unique and industry-leading. This 
ensures that even in real-time streaming scenarios, only the required columns 
are processed, minimizing resource usage and maximizing efficiency.
 
@@ -88,7 +88,7 @@ Additionally, compression is applied to each column independently, preserving th
 
 When compression is enabled:
 - For **Log Tables**, data is compressed by the writer on the client side, 
written in a compressed format, and decompressed by the log scanner on the 
client side.
-- For **PrimaryKey Table changelogs**, compression is performed server-side 
since the changelog is generated on the server.
+- For **Primary Key Table changelogs**, compression is performed server-side 
since the changelog is generated on the server.
 
 Log compression significantly reduces networking and storage costs. Benchmark 
results demonstrate that using the ZSTD compression with level 3 achieves a 
compression ratio of approximately **5x** (e.g., reducing 5GB of data to 1GB).
 Furthermore, read/write throughput improves substantially due to reduced 
networking overhead.
@@ -131,4 +131,4 @@ In the above example, we set the compression codec to `LZ4_FRAME` and the compre
 :::
 
 ## Log Tiering
-Log Table supports tiering data to different storage tiers. See more details 
about [Remote Log](maintenance/tiered-storage/remote-storage.md).
\ No newline at end of file
+Log Table supports tiering data to different storage tiers. See more details 
about [Remote Log](maintenance/tiered-storage/remote-storage.md).
diff --git a/website/docs/table-design/table-types/pk-table/_category_.json b/website/docs/table-design/table-types/pk-table/_category_.json
index 2aade25db..7374558c6 100644
--- a/website/docs/table-design/table-types/pk-table/_category_.json
+++ b/website/docs/table-design/table-types/pk-table/_category_.json
@@ -1,4 +1,4 @@
 {
-  "label": "PrimaryKey Table",
+  "label": "Primary Key Table",
   "position": 1
 }
diff --git a/website/docs/table-design/table-types/pk-table/index.md b/website/docs/table-design/table-types/pk-table/index.md
index 81ab05e06..1dc6e3393 100644
--- a/website/docs/table-design/table-types/pk-table/index.md
+++ b/website/docs/table-design/table-types/pk-table/index.md
@@ -1,5 +1,5 @@
 ---
-title: PrimaryKey Table
+title: Primary Key Table
 sidebar_position: 1
 ---
 
@@ -19,15 +19,15 @@ sidebar_position: 1
  limitations under the License.
 -->
 
-# PrimaryKey Table
+# Primary Key Table
 
 ## Basic Concept
 
-PrimaryKey Table in Fluss ensure the uniqueness of the specified primary key 
and supports `INSERT`, `UPDATE`,
+Primary Key Table in Fluss ensures the uniqueness of the specified primary key 
and supports `INSERT`, `UPDATE`,
 and `DELETE` operations.
 
-A PrimaryKey Table is created by specifying a `PRIMARY KEY` clause in the 
`CREATE TABLE` statement. For example, the
-following Flink SQL statement creates a PrimaryKey Table with `shop_id` and 
`user_id` as the primary key and distributes
+A Primary Key Table is created by specifying a `PRIMARY KEY` clause in the 
`CREATE TABLE` statement. For example, the
+following Flink SQL statement creates a Primary Key Table with `shop_id` and 
`user_id` as the primary key and distributes
 the data into 4 buckets:
 
 ```sql title="Flink SQL"
@@ -47,13 +47,13 @@ In Fluss primary key table, each row of data has a unique primary key.
 If multiple entries with the same primary key are written to the Fluss primary 
key table, only the last entry will be
 retained.
 
-For [Partitioned PrimaryKey 
Table](table-design/data-distribution/partitioning.md), the primary key must 
contain the
+For [Partitioned Primary Key 
Table](table-design/data-distribution/partitioning.md), the primary key must 
contain the
 partition key.
 
 ## Bucket Assigning
 
 For primary key tables, Fluss always determines which bucket the data belongs 
to based on the hash value of the bucket
-key (It must be a subset of the primary keys excluding partition keys of the 
primary key table) for each record. If the bucket key is not specified, the 
bucket key will used as the primary key (excluding the partition key).
+key (It must be a subset of the primary keys excluding partition keys of the 
primary key table) for each record. If the bucket key is not specified, the 
bucket key will be used as the primary key (excluding the partition key).
 Data with the same hash value will be distributed to the same bucket.
 
 ## Partial Update
@@ -92,20 +92,20 @@ follows:
 
 ## Merge Engines
 
-The **Merge Engine** in Fluss is a core component designed to efficiently 
handle and consolidate data updates for PrimaryKey Tables.
+The **Merge Engine** in Fluss is a core component designed to efficiently 
handle and consolidate data updates for Primary Key Tables.
 It offers users the flexibility to define how incoming data records are merged 
with existing records sharing the same primary key.
-However, users can specify a different merge engine to customize the merging 
behavior according to their specific use cases
+However, users can specify a different merge engine to customize the merging 
behavior according to their specific use cases.
 
 The following merge engines are supported:
 
-1. [Default Merge Engine 
(LastRow)](table-design/table-types/pk-table/merge-engines/default.md)
-2. [FirstRow Merge 
Engine](table-design/table-types/pk-table/merge-engines/first-row.md)
-3. [Versioned Merge 
Engine](table-design/table-types/pk-table/merge-engines/versioned.md)
+1. [Default Merge Engine (LastRow)](merge-engines/default.md)
+2. [FirstRow Merge Engine](merge-engines/first-row.md)
+3. [Versioned Merge Engine](merge-engines/versioned.md)
 
 
 ## Changelog Generation
 
-Fluss will capture the changes when inserting, updating, deleting records on 
the primary-key table, which is known as
+Fluss will capture the changes when inserting, updating, deleting records on 
the Primary Key Table, which is known as
 the changelog. Downstream consumers can directly consume the changelog to 
obtain the changes in the table. For example,
 consider the following primary key table in Fluss:
 
@@ -119,7 +119,7 @@ CREATE TABLE T
 );
 ```
 
-If the data written to the primary-key table is
+If the data written to the Primary Key Table is
 sequentially `+I(1, 2.0, 'apple')`, `+I(1, 4.0, 'banana')`, `-D(1, 4.0, 
'banana')`, then the following change data will
 be generated. For example, the following Flink SQL statements illustrate this 
behavior:
 
@@ -162,13 +162,13 @@ For primary key tables, Fluss supports various kinds of querying abilities.
 For a primary key table, the default read method is a full snapshot followed 
by incremental data. First, the
 snapshot data of the table is consumed, followed by the changelog data of the 
table.
 
-It is also possible to only consume the changelog data of the table. For more 
details, please refer to the [Flink Reads](engine-flink/reads.md)
+It is also possible to only consume the changelog data of the table. For more 
details, please refer to the [Flink Reads](../../../engine-flink/reads.md)
 
 ### Lookup
 
-Fluss primary key table can lookup data by the primary keys. If the key exists 
in Fluss, lookup will return a unique row. it always used in [Flink Lookup 
Join](engine-flink/lookups.md#lookup).
+Fluss primary key table can lookup data by the primary keys. If the key exists 
in Fluss, lookup will return a unique row. It is always used in [Flink Lookup 
Join](../../../engine-flink/lookups.md#lookup).
 
 ### Prefix Lookup
 
 Fluss primary key table can also do prefix lookup by the prefix subset primary 
keys. Unlike lookup, prefix lookup
-will scan data based on the prefix of primary keys and may return multiple 
rows. It always used in [Flink Prefix Lookup 
Join](engine-flink/lookups.md#prefix-lookup).
\ No newline at end of file
+will scan data based on the prefix of primary keys and may return multiple 
rows. It is always used in [Flink Prefix Lookup 
Join](../../../engine-flink/lookups.md#prefix-lookup).
diff --git a/website/docs/table-design/table-types/pk-table/merge-engines/first-row.md b/website/docs/table-design/table-types/pk-table/merge-engines/first-row.md
index c72831070..096988af9 100644
--- a/website/docs/table-design/table-types/pk-table/merge-engines/first-row.md
+++ b/website/docs/table-design/table-types/pk-table/merge-engines/first-row.md
@@ -23,7 +23,7 @@ sidebar_position: 3
 # FirstRow Merge Engine
 
 By setting `'table.merge-engine' = 'first_row'` in the table properties, users 
can retain the first record for each primary key.
-This configuration generates an insert-only changelog, allowing downstream 
Flink jobs to treat the PrimaryKey Table as an append-only Log Table.
+This configuration generates an insert-only changelog, allowing downstream 
Flink jobs to treat the Primary Key Table as an append-only Log Table.
 As a result, downstream transformations that do not support 
retractions/changelogs, such as [Window 
Aggregations](https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/dev/table/sql/queries/window-agg/)
 and [Interval 
Joins](https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/dev/table/sql/queries/joins/#interval-joins),
 can be applied seamlessly.
 
@@ -60,4 +60,4 @@ SELECT * FROM T WHERE k = 1;
 -- +---+-----+------+
 -- | 1 | 2.0 | t1   |
 -- +---+-----+------+
-```
\ No newline at end of file
+```
diff --git a/website/docs/table-design/table-types/pk-table/merge-engines/index.md b/website/docs/table-design/table-types/pk-table/merge-engines/index.md
index 7eeb98336..d67ac404e 100644
--- a/website/docs/table-design/table-types/pk-table/merge-engines/index.md
+++ b/website/docs/table-design/table-types/pk-table/merge-engines/index.md
@@ -21,12 +21,12 @@ sidebar_position: 1
 
 # Merge Engines
 
-The **Merge Engine** in Fluss is a core component designed to efficiently 
handle and consolidate data updates for PrimaryKey Tables.
+The **Merge Engine** in Fluss is a core component designed to efficiently 
handle and consolidate data updates for Primary Key Tables.
 It offers users the flexibility to define how incoming data records are merged 
with existing records sharing the same primary key.
-However, users can specify a different merge engine to customize the merging 
behavior according to their specific use cases
+However, users can specify a different merge engine to customize the merging 
behavior according to their specific use cases.
 
 The following merge engines are supported:
 
 1. [Default Merge Engine 
(LastRow)](table-design/table-types/pk-table/merge-engines/default.md)
 2. [FirstRow Merge 
Engine](table-design/table-types/pk-table/merge-engines/first-row.md)
-3. [Versioned Merge 
Engine](table-design/table-types/pk-table/merge-engines/versioned.md)
\ No newline at end of file
+3. [Versioned Merge 
Engine](table-design/table-types/pk-table/merge-engines/versioned.md)
