This is an automated email from the ASF dual-hosted git repository.

yuzelin pushed a commit to branch pypaimon_0.2.0
in repository https://gitbox.apache.org/repos/asf/paimon-website.git

commit b861db30697bf3e19953e4a1fc4eae216ab95ecd
Author: yuzelin <[email protected]>
AuthorDate: Tue Jan 7 11:46:25 2025 +0800

    try fix
---
 .../docs/releases/{ => paimon}/release-0.4.md      |  58 +--
 .../docs/releases/{ => paimon}/release-0.5.md      | 400 ++++++++++----------
 .../docs/releases/{ => paimon}/release-0.6.md      | 408 ++++++++++-----------
 .../docs/releases/{ => paimon}/release-0.7.md      |   0
 .../docs/releases/{ => paimon}/release-0.8.1.md    |   0
 .../docs/releases/{ => paimon}/release-0.8.2.md    |   0
 .../docs/releases/{ => paimon}/release-0.8.md      |   0
 .../docs/releases/{ => paimon}/release-0.9.md      |   0
 .../release-pypaimon-0.2.0.md}                     |   0
 9 files changed, 433 insertions(+), 433 deletions(-)

diff --git a/community/docs/releases/release-0.4.md 
b/community/docs/releases/paimon/release-0.4.md
similarity index 98%
rename from community/docs/releases/release-0.4.md
rename to community/docs/releases/paimon/release-0.4.md
index 4ca3658f40..bdb68763cb 100644
--- a/community/docs/releases/release-0.4.md
+++ b/community/docs/releases/paimon/release-0.4.md
@@ -1,29 +1,29 @@
----
-title: "Release 0.4"
-type: release
-version: 0.4.0
----
-
-# Apache Paimon 0.4 Available
-
-June 07, 2023
-
-We are happy to announce the availability of Paimon 0.4. This is the first release of the system inside the Apache Incubator and under the name Paimon. Releases up to 0.3 were under the name Flink Table Store, a sub-project of Flink from which Paimon originates.
-
-## What is Paimon?
-
-Apache Paimon (incubating) is a streaming data lake platform that supports high-speed data ingestion, change data tracking, and efficient real-time analytics.
-
-Paimon offers the following core capabilities:
-
-- **Unified Batch & Streaming**: Paimon supports batch writes and batch reads, as well as streaming writes of changes and streaming reads of table changelogs.
-- **Data Lake**: As a data lake storage, Paimon has the following advantages: low cost, high reliability, and scalable metadata.
-- **Merge Engines**: Paimon supports rich Merge Engines. By default, the last entry for a primary key is retained. You can also use the "partial-update" or "aggregation" engine.
-- **Changelog producer**: Paimon supports rich Changelog producers, such as "lookup" and "full-compaction". A correct changelog can simplify the construction of a streaming pipeline.
-- **Append Only Tables**: Paimon supports Append Only tables, automatically compacts small files, and provides ordered stream reading. You can use this to replace message queues.
-
-## Release 0.4
-
-Paimon 0.4 includes many bug fixes and improvements that make the system more 
stable and robust.
-
-Download the release 
[here](https://paimon.apache.org/docs/0.4/project/download/).
+---
+title: "Release 0.4"
+type: release
+version: 0.4.0
+---
+
+# Apache Paimon 0.4 Available
+
+June 07, 2023
+
+We are happy to announce the availability of Paimon 0.4. This is the first release of the system inside the Apache Incubator and under the name Paimon. Releases up to 0.3 were under the name Flink Table Store, a sub-project of Flink from which Paimon originates.
+
+## What is Paimon?
+
+Apache Paimon (incubating) is a streaming data lake platform that supports high-speed data ingestion, change data tracking, and efficient real-time analytics.
+
+Paimon offers the following core capabilities:
+
+- **Unified Batch & Streaming**: Paimon supports batch writes and batch reads, as well as streaming writes of changes and streaming reads of table changelogs.
+- **Data Lake**: As a data lake storage, Paimon has the following advantages: low cost, high reliability, and scalable metadata.
+- **Merge Engines**: Paimon supports rich Merge Engines. By default, the last entry for a primary key is retained. You can also use the "partial-update" or "aggregation" engine.
+- **Changelog producer**: Paimon supports rich Changelog producers, such as "lookup" and "full-compaction". A correct changelog can simplify the construction of a streaming pipeline.
+- **Append Only Tables**: Paimon supports Append Only tables, automatically compacts small files, and provides ordered stream reading. You can use this to replace message queues.
+
+## Release 0.4
+
+Paimon 0.4 includes many bug fixes and improvements that make the system more 
stable and robust.
+
+Download the release 
[here](https://paimon.apache.org/docs/0.4/project/download/).
diff --git a/community/docs/releases/release-0.5.md 
b/community/docs/releases/paimon/release-0.5.md
similarity index 98%
rename from community/docs/releases/release-0.5.md
rename to community/docs/releases/paimon/release-0.5.md
index 8c387de32b..c0482567e2 100644
--- a/community/docs/releases/release-0.5.md
+++ b/community/docs/releases/paimon/release-0.5.md
@@ -1,200 +1,200 @@
----
-title: "Release 0.5"
-type: release
-version: 0.5.0
----
-
-# Apache Paimon 0.5 Available
-
-September 06, 2023 - Jingsong Lee ([email protected])
-
-We are happy to announce the availability of Paimon 
[0.5.0-incubating](https://paimon.apache.org/docs/0.5/).
-
-Nearly 100 contributors contributed to release 0.5, and together we created 500+ commits, bringing many exciting
-new features and improvements to the community. Thank you all for your joint efforts!
-
-Highlights:
-
-- CDC data ingestion into the lake has reached maturity.
-- Introduced Tags to provide an immutable view for offline data warehouses.
-- Dynamic Bucket mode for the Primary Key Table is available in production.
-- Introduced the Append Only Scalable Table to replace Hive tables.
-
-## CDC Ingestion
-
-Paimon supports a variety of ways to [ingest data into 
Paimon](https://paimon.apache.org/docs/0.5/how-to/cdc-ingestion/)
-tables with schema evolution. In release 0.5, a large number of new features 
have been added:
-
-- MySQL Synchronizing Table
-  - support synchronizing shards into one Paimon table
-  - support type mapping to convert all fields to string
-- MySQL Synchronizing Database
-  - support merging multiple shards from multiple databases
-  - support `--mode combined` with a unified sink to sync all tables, and to sync newly added tables without restarting the job
-- Kafka Synchronizing Table
-  - synchronize one Kafka topic’s table into one Paimon table.
-  - support Canal and OGG
-- Kafka Synchronizing Database
-  - synchronize one Kafka topic containing multiple tables or multiple topics 
containing one table each into one Paimon database.
-  - support Canal and OGG
-- MongoDB Synchronizing Collection
-  - synchronize one Collection from MongoDB into one Paimon table.
-- MongoDB Synchronizing Database 
-  - synchronize the whole MongoDB database into one Paimon database.
-
-## Primary Key Table
-
-By specifying a primary key in the CREATE TABLE DDL, you can get a [Primary Key Table](https://paimon.apache.org/docs/0.5/concepts/primary-key-table/),
-which accepts insert, update, and delete records.
-
-### Dynamic Bucket
-
-Configure `'bucket' = '-1'`, and Paimon dynamically maintains the index and automatically expands the number of buckets.
-
-- Option 1: `'dynamic-bucket.target-row-num'`: controls the target row number for one bucket.
-- Option 2: `'dynamic-bucket.assigner-parallelism'`: parallelism of the assigner operator, controls the number of initialized buckets.
-
-Dynamic Bucket mode uses a HASH index to maintain the mapping from key to bucket; it requires more memory than fixed bucket mode.
-For performance:
-
-1. Generally speaking, there is no performance loss, but there will be some additional memory consumption: **100 million**
-   entries in a partition take up **1 GB** more memory, and partitions that are no longer active do not take up memory.
-2. For tables with low update rates, this mode is recommended to significantly 
improve performance.
-
-### Partial-Update: Sequence Group
-
-A sequence-field may not solve the disorder problem of partial-update tables with multiple stream updates, because
-the sequence-field may be overwritten by the latest data of another stream during a multi-stream update. So we introduce the
-sequence group mechanism for partial-update tables. It can solve:
-
-1. Disorder during multi-stream update. Each stream defines its own 
sequence-groups.
-2. A true partial-update, not just a non-null update.
-3. Accept delete records to retract partial columns.
-
-### First Row Merge Engine
-
-By specifying `'merge-engine' = 'first-row'`, users can keep the first row for the same primary key. It differs from the
-`deduplicate` merge engine in that the `first-row` merge engine generates an insert-only changelog.
-
-This is of great help in replacing log deduplication in streaming computation.
-
-### Lookup Changelog-Producer
-
-The Lookup Changelog-Producer is available in production; it can greatly reduce the delay for tables that need to
-generate changelogs.
-
-(Note: please increase the `'execution.checkpointing.max-concurrent-checkpoints'` Flink configuration; this is very
-important for performance.)
-
-### Sequence Auto Padding
-
-When a record is updated or deleted, the `sequence.field` must become larger and cannot remain unchanged.
-For -U and +U, their sequence-fields must be different. If you cannot meet this requirement, Paimon provides an
-option to automatically pad the sequence field for you.
-
-Configure `'sequence.auto-padding' = 'row-kind-flag'` if you are using the same value for -U and +U, just like "`op_ts`"
-(the time that the change was made in the database) in the MySQL binlog. It is recommended to use the automatic
-padding for the row kind flag, which will automatically distinguish between -U (-D) and +U (+I).
-
-### Asynchronous Compaction
-
-Compaction is inherently asynchronous, but if you want it to be completely asynchronous and never block writing,
-expecting a mode with maximum write throughput where compaction is done slowly and without hurry,
-you can use the following strategy for your table:
-
-```shell
-num-sorted-run.stop-trigger = 2147483647
-sort-spill-threshold = 10
-```
-
-This configuration will generate more files during peak write periods and 
gradually merge into optimal read
-performance during low write periods.
-
-### Avro File Format
-
-If you want to achieve ultimate compaction performance, you can consider using the row-oriented file format Avro.
-- The advantage is that you can achieve high write throughput and compaction performance.
-- The disadvantage is that your analysis queries will be slow, and the biggest problem with row storage is that it
-  does not support query projection. For example, if the table has 100 columns but you only query a few of them, the
-  IO of row storage cannot be ignored. Additionally, compression efficiency will decrease and storage costs will
-  increase.
-
-```shell
-file.format = avro
-metadata.stats-mode = none
-```
-
-If you don't want to change all files to Avro format, you can at least consider changing the files in the first
-levels to Avro format. You can use `'file.format.per.level' = '0:avro,1:avro'` to specify that the files in the first two
-levels are in Avro format.
-
-## Append Only Table
-
-### Append Only Scalable Table
-
-By setting `'bucket' = '-1'` on a non-primary-key table, you get an [Append Only Scalable Table](https://paimon.apache.org/docs/0.5/concepts/append-only-table/#append-for-scalable-table).
-In this mode, the table no longer has the concept of buckets, and reads and writes are concurrent. We regard this table
-as a batch offline table (although we can still stream read and write it).
-
-Using this mode, you can replace your Hive tables with lake tables.
-
-We have automatic small-file compaction for this mode by default. And you can use the `Sort Compact` action to sort a whole partition
-with the z-order sorter, which can greatly speed up data skipping when querying.
-
-## Manage Tags
-
-Paimon's snapshots provide an easy way to query historical data. But in most scenarios, a job will generate too many
-snapshots, and the table will expire old snapshots according to the table configuration. Snapshot expiration will also delete old
-data files, and the historical data of expired snapshots cannot be queried anymore.
-
-To solve this problem, you can create a 
[Tag](https://paimon.apache.org/docs/0.5/maintenance/manage-tags/) based on a
-snapshot. The tag will maintain the manifests and data files of the snapshot. 
A typical usage is creating tags daily,
-then you can maintain the historical data of each day for batch reading.
-
-Paimon supports automatic creation of tags in the writing job. You can use `'tag.automatic-creation'` to create tags automatically.
-
-You can also query the incremental data between Tags (or snapshots); both Flink and Spark support incremental queries.
-
-## Engines
-
-### Flink
-
-After Flink released 
[1.17](https://flink.apache.org/2023/03/23/announcing-the-release-of-apache-flink-1.17/),
 Paimon
-underwent very in-depth integration.
-
-- [ALTER TABLE](https://paimon.apache.org/docs/0.5/how-to/altering-tables/) 
syntax is enhanced by including the
-  ability to ADD/MODIFY/DROP columns, making it easier for users to maintain 
their table schema.
-- 
[FlinkGenericCatalog](https://paimon.apache.org/docs/0.5/engines/flink/#quick-start),
 you need to use Hive metastore. 
-  Then, you can use all the tables from Paimon, Hive, and Flink Generic Tables 
(Kafka and other tables)!
-- [Dynamic Partition Overwrite](https://paimon.apache.org/docs/0.5/how-to/writing-tables/#dynamic-overwrite): Flink’s
-  default overwrite mode is dynamic partition overwrite (that means Paimon only deletes the partitions that appear in the
-  overwritten data). You can configure dynamic-partition-overwrite to change it to static overwrite.
-- [Sync Partitions into Hive 
Metastore](https://paimon.apache.org/docs/0.5/how-to/creating-catalogs/#synchronizing-partitions-into-hive-metastore)
-  By default, Paimon does not synchronize newly created partitions into Hive 
metastore. If you want to see a partitioned
-  table in Hive and also synchronize newly created partitions into Hive 
metastore, please set the table property `metastore.partitioned-table` to true.
-- [Retry Lookup Join](https://paimon.apache.org/docs/0.5/how-to/lookup-joins/): supports Retry Lookup and Async Retry Lookup.
-
-### Spark
-
-Spark is another computing engine with which Paimon has in-depth integration; it has taken a big step forward in 0.5, including the following features:
-
-- [INSERT OVERWRITE](https://paimon.apache.org/docs/0.5/how-to/writing-tables/#overwriting-the-whole-table): insert overwrite
-  partition; Spark’s default overwrite mode is static partition overwrite, and you can enable dynamic overwrite too.
-- Partition Management: Support `DROP PARTITION`, `SHOW PARTITIONS`.
-- Supports saving a DataFrame to a paimon location.
-- Schema merging write: You can set `write.merge-schema` to true to write with 
schema merging.
-- Streaming sink: You can use the Spark streaming `foreachBatch` API as a streaming sink to Paimon.
-
-## Download
-
-Download the release 
[here](https://paimon.apache.org/docs/0.5/project/download/).
-
-## What's next?
-
-Paimon will be committed to solving the following scenarios for a long time:
-
-1. Acceleration of CDC data into the lake: real-time writing, real-time query, 
and offline immutable partition view by using Tags.
-2. Enrich Merge Engines to improve streaming computation: Partial-Update 
table, Aggregation table, First Row table.
-3. Changelog Streaming read, build incremental stream processing based on lake 
storage.
-4. Append mode accelerates Hive offline tables, writes in real time and brings 
query acceleration after sorting.
-5. Append mode replaces some message queue scenarios, stream reads in input 
order, and without data TTL.
+---
+title: "Release 0.5"
+type: release
+version: 0.5.0
+---
+
+# Apache Paimon 0.5 Available
+
+September 06, 2023 - Jingsong Lee ([email protected])
+
+We are happy to announce the availability of Paimon 
[0.5.0-incubating](https://paimon.apache.org/docs/0.5/).
+
+Nearly 100 contributors contributed to release 0.5, and together we created 500+ commits, bringing many exciting
+new features and improvements to the community. Thank you all for your joint efforts!
+
+Highlights:
+
+- CDC data ingestion into the lake has reached maturity.
+- Introduced Tags to provide an immutable view for offline data warehouses.
+- Dynamic Bucket mode for the Primary Key Table is available in production.
+- Introduced the Append Only Scalable Table to replace Hive tables.
+
+## CDC Ingestion
+
+Paimon supports a variety of ways to [ingest data into 
Paimon](https://paimon.apache.org/docs/0.5/how-to/cdc-ingestion/)
+tables with schema evolution. In release 0.5, a large number of new features 
have been added:
+
+- MySQL Synchronizing Table
+  - support synchronizing shards into one Paimon table
+  - support type mapping to convert all fields to string
+- MySQL Synchronizing Database
+  - support merging multiple shards from multiple databases
+  - support `--mode combined` with a unified sink to sync all tables, and to sync newly added tables without restarting the job
+- Kafka Synchronizing Table
+  - synchronize one Kafka topic’s table into one Paimon table.
+  - support Canal and OGG
+- Kafka Synchronizing Database
+  - synchronize one Kafka topic containing multiple tables or multiple topics 
containing one table each into one Paimon database.
+  - support Canal and OGG
+- MongoDB Synchronizing Collection
+  - synchronize one Collection from MongoDB into one Paimon table.
+- MongoDB Synchronizing Database 
+  - synchronize the whole MongoDB database into one Paimon database.
+
+## Primary Key Table
+
+By specifying a primary key in the CREATE TABLE DDL, you can get a [Primary Key Table](https://paimon.apache.org/docs/0.5/concepts/primary-key-table/),
+which accepts insert, update, and delete records.
+
+### Dynamic Bucket
+
+Configure `'bucket' = '-1'`, and Paimon dynamically maintains the index and automatically expands the number of buckets.
+
+- Option 1: `'dynamic-bucket.target-row-num'`: controls the target row number for one bucket.
+- Option 2: `'dynamic-bucket.assigner-parallelism'`: parallelism of the assigner operator, controls the number of initialized buckets.
+
+Dynamic Bucket mode uses a HASH index to maintain the mapping from key to bucket; it requires more memory than fixed bucket mode.
+For performance:
+
+1. Generally speaking, there is no performance loss, but there will be some additional memory consumption: **100 million**
+   entries in a partition take up **1 GB** more memory, and partitions that are no longer active do not take up memory.
+2. For tables with low update rates, this mode is recommended to significantly 
improve performance.
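+
+For illustration, a minimal Flink SQL sketch of a dynamic-bucket table (the table and column names are hypothetical, and the target-row-num value is only illustrative):
+
+```sql
+CREATE TABLE orders (
+    order_id BIGINT,
+    amount   DECIMAL(10, 2),
+    PRIMARY KEY (order_id) NOT ENFORCED
+) WITH (
+    'bucket' = '-1',                              -- dynamic bucket mode
+    'dynamic-bucket.target-row-num' = '2000000'   -- target rows per bucket
+);
+```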
+
+### Partial-Update: Sequence Group
+
+A sequence-field may not solve the disorder problem of partial-update tables with multiple stream updates, because
+the sequence-field may be overwritten by the latest data of another stream during a multi-stream update. So we introduce the
+sequence group mechanism for partial-update tables. It can solve:
+
+1. Disorder during multi-stream update. Each stream defines its own 
sequence-groups.
+2. A true partial-update, not just a non-null update.
+3. Accept delete records to retract partial columns.
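+
+As a sketch, a sequence group can be declared per stream through table options (the table and column names below are hypothetical; the option key follows the `fields.<field>.sequence-group` pattern described in the Paimon documentation):
+
+```sql
+CREATE TABLE t (
+    k   INT,
+    a   INT,
+    b   INT,
+    g_1 INT,   -- sequence field for the stream that updates a and b
+    c   INT,
+    g_2 INT,   -- sequence field for the stream that updates c
+    PRIMARY KEY (k) NOT ENFORCED
+) WITH (
+    'merge-engine' = 'partial-update',
+    'fields.g_1.sequence-group' = 'a,b',
+    'fields.g_2.sequence-group' = 'c'
+);
+```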
+
+### First Row Merge Engine
+
+By specifying `'merge-engine' = 'first-row'`, users can keep the first row for the same primary key. It differs from the
+`deduplicate` merge engine in that the `first-row` merge engine generates an insert-only changelog.
+
+This is of great help in replacing log deduplication in streaming computation.
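+
+A minimal sketch of enabling this engine (the table and column names are hypothetical):
+
+```sql
+CREATE TABLE dedup_events (
+    event_id STRING,
+    payload  STRING,
+    PRIMARY KEY (event_id) NOT ENFORCED
+) WITH (
+    'merge-engine' = 'first-row'   -- keep the first row per primary key, emit an insert-only changelog
+);
+```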
+
+### Lookup Changelog-Producer
+
+The Lookup Changelog-Producer is available in production; it can greatly reduce the delay for tables that need to
+generate changelogs.
+
+(Note: please increase the `'execution.checkpointing.max-concurrent-checkpoints'` Flink configuration; this is very
+important for performance.)
+
+### Sequence Auto Padding
+
+When a record is updated or deleted, the `sequence.field` must become larger and cannot remain unchanged.
+For -U and +U, their sequence-fields must be different. If you cannot meet this requirement, Paimon provides an
+option to automatically pad the sequence field for you.
+
+Configure `'sequence.auto-padding' = 'row-kind-flag'` if you are using the same value for -U and +U, just like "`op_ts`"
+(the time that the change was made in the database) in the MySQL binlog. It is recommended to use the automatic
+padding for the row kind flag, which will automatically distinguish between -U (-D) and +U (+I).
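+
+As a sketch, the two options described above might be combined like this (`op_ts` is the hypothetical event-time column from the example, and the table name is hypothetical):
+
+```sql
+ALTER TABLE t SET (
+    'sequence.field' = 'op_ts',
+    'sequence.auto-padding' = 'row-kind-flag'   -- distinguish -U/+U rows that share the same op_ts
+);
+```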
+
+### Asynchronous Compaction
+
+Compaction is inherently asynchronous, but if you want it to be completely asynchronous and never block writing,
+expecting a mode with maximum write throughput where compaction is done slowly and without hurry,
+you can use the following strategy for your table:
+
+```shell
+num-sorted-run.stop-trigger = 2147483647
+sort-spill-threshold = 10
+```
+
+This configuration will generate more files during peak write periods and 
gradually merge into optimal read
+performance during low write periods.
+
+### Avro File Format
+
+If you want to achieve ultimate compaction performance, you can consider using the row-oriented file format Avro.
+- The advantage is that you can achieve high write throughput and compaction performance.
+- The disadvantage is that your analysis queries will be slow, and the biggest problem with row storage is that it
+  does not support query projection. For example, if the table has 100 columns but you only query a few of them, the
+  IO of row storage cannot be ignored. Additionally, compression efficiency will decrease and storage costs will
+  increase.
+
+```shell
+file.format = avro
+metadata.stats-mode = none
+```
+
+If you don't want to change all files to Avro format, you can at least consider changing the files in the first
+levels to Avro format. You can use `'file.format.per.level' = '0:avro,1:avro'` to specify that the files in the first two
+levels are in Avro format.
+
+## Append Only Table
+
+### Append Only Scalable Table
+
+By setting `'bucket' = '-1'` on a non-primary-key table, you get an [Append Only Scalable Table](https://paimon.apache.org/docs/0.5/concepts/append-only-table/#append-for-scalable-table).
+In this mode, the table no longer has the concept of buckets, and reads and writes are concurrent. We regard this table
+as a batch offline table (although we can still stream read and write it).
+
+Using this mode, you can replace your Hive tables with lake tables.
+
+We have automatic small-file compaction for this mode by default. And you can use the `Sort Compact` action to sort a whole partition
+with the z-order sorter, which can greatly speed up data skipping when querying.
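+
+For illustration, a minimal sketch of such a table (hypothetical names): no primary key and `'bucket' = '-1'`, so the table behaves as described above:
+
+```sql
+CREATE TABLE web_logs (
+    log_time TIMESTAMP(3),
+    url      STRING,
+    status   INT
+) WITH (
+    'bucket' = '-1'   -- append only scalable table, no bucket concept
+);
+```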
+
+## Manage Tags
+
+Paimon's snapshots provide an easy way to query historical data. But in most scenarios, a job will generate too many
+snapshots, and the table will expire old snapshots according to the table configuration. Snapshot expiration will also delete old
+data files, and the historical data of expired snapshots cannot be queried anymore.
+
+To solve this problem, you can create a 
[Tag](https://paimon.apache.org/docs/0.5/maintenance/manage-tags/) based on a
+snapshot. The tag will maintain the manifests and data files of the snapshot. 
A typical usage is creating tags daily,
+then you can maintain the historical data of each day for batch reading.
+
+Paimon supports automatic creation of tags in the writing job. You can use `'tag.automatic-creation'` to create tags automatically.
+
+You can also query the incremental data between Tags (or snapshots); both Flink and Spark support incremental queries.
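+
+A hedged sketch of turning on automatic tag creation; the value 'process-time' and the companion 'tag.creation-period' option are assumptions here, so please check the documentation for the exact keys and values:
+
+```sql
+ALTER TABLE t SET (
+    'tag.automatic-creation' = 'process-time',   -- assumed value: create tags based on processing time
+    'tag.creation-period' = 'daily'              -- assumed companion option: one tag per day
+);
+```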
+
+## Engines
+
+### Flink
+
+After Flink released 
[1.17](https://flink.apache.org/2023/03/23/announcing-the-release-of-apache-flink-1.17/),
 Paimon
+underwent very in-depth integration.
+
+- [ALTER TABLE](https://paimon.apache.org/docs/0.5/how-to/altering-tables/) 
syntax is enhanced by including the
+  ability to ADD/MODIFY/DROP columns, making it easier for users to maintain 
their table schema.
+- 
[FlinkGenericCatalog](https://paimon.apache.org/docs/0.5/engines/flink/#quick-start),
 you need to use Hive metastore. 
+  Then, you can use all the tables from Paimon, Hive, and Flink Generic Tables 
(Kafka and other tables)!
+- [Dynamic Partition Overwrite](https://paimon.apache.org/docs/0.5/how-to/writing-tables/#dynamic-overwrite): Flink’s
+  default overwrite mode is dynamic partition overwrite (that means Paimon only deletes the partitions that appear in the
+  overwritten data). You can configure dynamic-partition-overwrite to change it to static overwrite.
+- [Sync Partitions into Hive 
Metastore](https://paimon.apache.org/docs/0.5/how-to/creating-catalogs/#synchronizing-partitions-into-hive-metastore)
+  By default, Paimon does not synchronize newly created partitions into Hive 
metastore. If you want to see a partitioned
+  table in Hive and also synchronize newly created partitions into Hive 
metastore, please set the table property `metastore.partitioned-table` to true.
+- [Retry Lookup Join](https://paimon.apache.org/docs/0.5/how-to/lookup-joins/): supports Retry Lookup and Async Retry Lookup.
+
+### Spark
+
+Spark is another computing engine with which Paimon has in-depth integration; it has taken a big step forward in 0.5, including the following features:
+
+- [INSERT OVERWRITE](https://paimon.apache.org/docs/0.5/how-to/writing-tables/#overwriting-the-whole-table): insert overwrite
+  partition; Spark’s default overwrite mode is static partition overwrite, and you can enable dynamic overwrite too.
+- Partition Management: Support `DROP PARTITION`, `SHOW PARTITIONS`.
+- Supports saving a DataFrame to a paimon location.
+- Schema merging write: You can set `write.merge-schema` to true to write with 
schema merging.
+- Streaming sink: You can use the Spark streaming `foreachBatch` API as a streaming sink to Paimon.
+
+## Download
+
+Download the release 
[here](https://paimon.apache.org/docs/0.5/project/download/).
+
+## What's next?
+
+Paimon will be committed to solving the following scenarios for a long time:
+
+1. Acceleration of CDC data into the lake: real-time writing, real-time query, 
and offline immutable partition view by using Tags.
+2. Enrich Merge Engines to improve streaming computation: Partial-Update 
table, Aggregation table, First Row table.
+3. Changelog Streaming read, build incremental stream processing based on lake 
storage.
+4. Append mode accelerates Hive offline tables, writes in real time and brings 
query acceleration after sorting.
+5. Append mode replaces some message queue scenarios, stream reads in input 
order, and without data TTL.
diff --git a/community/docs/releases/release-0.6.md 
b/community/docs/releases/paimon/release-0.6.md
similarity index 97%
rename from community/docs/releases/release-0.6.md
rename to community/docs/releases/paimon/release-0.6.md
index 5a03d6c19b..6286a78778 100644
--- a/community/docs/releases/release-0.6.md
+++ b/community/docs/releases/paimon/release-0.6.md
@@ -1,204 +1,204 @@
----
-title: "Release 0.6"
-type: release
-version: 0.6.0
----
-
-# Apache Paimon 0.6 Available
-
-December 13, 2023 - Paimon Community ([email protected])
-
-Apache Paimon PPMC has officially released Apache Paimon 0.6.0-incubating 
version. A total of 58 people contributed to
-this version and completed over 400 Commits. Thank you to all contributors for 
their support!
-
-Some outstanding developments are:
-
-1. Flink Paimon CDC supports almost all mainstream data ingestion currently available.
-2. Flink 1.18 and Paimon support CALL procedures, which will make table management easier.
-3. Cross partition update is available for production!
-4. The read-optimized table is introduced to enhance query performance.
-5. Append scalable mode is available for production!
-6. The Paimon Presto module is available for production!
-7. The metrics system is integrated with Flink metrics.
-8. Spark Paimon has made tremendous progress.
-
-For details, please refer to the following text.
-
-## Flink
-
-### Paimon CDC
-
-Paimon CDC integrates Flink CDC, Kafka, Pulsar, etc., and provides 
comprehensive support in version 0.6:
-
-1. Kafka CDC supports formats: Canal Json, Debezium Json, Maxwell and OGG.
-2. Pulsar CDC is added, both Table Sync and Database Sync.
-3. Mongo CDC is available for production!
-
-### Flink Batch Source
-
-By default, the parallelism of batch reads is the same as the number of 
splits, while the parallelism of stream reads
-is the same as the number of buckets, but not greater than 
scan.infer-parallelism.max (Default is 1024).
-
-### Flink Streaming Source
-
-Consumer-id is available for production!
-
-You can specify a consumer-id when streaming reading a table; Paimon records the consumed snapshot id, so a newly started
-job can continue to consume from the previous progress without restoring from the state. You can also set consumer.mode
-to at-least-once to get a better checkpoint time.
-
-### Flink Time Travel
-
-Flink 1.18 SQL supports Time Travel Query (You can also use dynamic option):
-
-```sql
-SELECT * FROM t FOR SYSTEM_TIME AS OF TIMESTAMP '2023-01-01 00:00:00';
-```
-
-### Flink Call Procedures
-
-Flink 1.18 SQL supports Call Procedures:
-
-| Procedure Name |    Example    |
-|:------:|:-------------:|
-| compact  |  CALL sys.compact('default.T', 'p=0', 'zorder', 'a,b', 
'sink.parallelism=4')  |
-| compact_database  |  CALL sys.compact_database('db1|db2', 'combined', 
'table_.*', 'ignore', 'sink.parallelism=4')   |
-| create_tag  |   CALL sys.create_tag('default.T', 'my_tag', 10)   |
-| delete_tag  |   CALL sys.delete_tag('default.T', 'my_tag')   |
-| merge_into  |   CALL sys.merge_into('default.T', '', '', 'default.S', 
'T.id=S.order_id', '', 'price=T.price+20', '', '*')   |
-| remove_orphan_files  |   CALL remove_orphan_files('default.T', '2023-10-31 
12:00:00')   |
-| reset_consumer  |   CALL sys.reset_consumer('default.T', 'myid', 10)  |
-| rollback_to  |   CALL sys.rollback_to('default.T', 10)   |
-
-Flink 1.19 will support Named Arguments which will make it easier to use when 
there are multiple arguments.
-
-### Committer Improvement
-
-The Committer is responsible for committing metadata, and sometimes it may have bottlenecks that can lead to
-backpressure on the operators. In 0.6, we have the following optimizations:
-
-1. By default, Paimon will delete expired snapshots synchronously. Users can use the asynchronous expiration mode by
-   setting snapshot.expire.execution-mode to async to improve performance.
-2. You can use Flink's fine-grained resource management to increase only the committer's heap memory and CPU.
-
-## Primary Key Table
-
-### Cross Partition Update
-
-Cross partition update is available for production!
-
-Currently, Flink batch & streaming writes are supported and have been applied by enterprises in production environments!
-How to use cross partition update:
-
-1. Primary keys do not contain all partition fields.
-2. Use dynamic bucket mode, which means bucket is -1.
-
-This mode directly maintains the mapping of keys to partition and bucket, uses local disks, and initializes the index by
-reading all existing keys in the table when starting the write job. Although maintaining the index is necessary, this mode
-still maintains high write throughput. Please try it out.
-
-### Read Optimized
-
-The Primary Key Table uses a 'MergeOnRead' technology. When reading data, multiple layers of LSM data are merged, and
-the read parallelism will be limited by the number of buckets. If you want queries to be fast enough in certain scenarios
-and can accept reading only older data, you can query the read-optimized table: SELECT * FROM T$ro.
-
-Since the freshness of that data cannot be guaranteed, you can configure 'full-compaction.delta-commits' when writing data
-to ensure that the data read has a bounded latency.
-
-StarRocks and other OLAP systems will release a version to greatly enhance 
query performance for read-optimized tables
-based on Paimon 0.6.
-
-### Partial Update
-
-In 0.6, you can define aggregation functions for the partial-update merge 
engine with sequence group. This allows you
-to perform special aggregations on certain fields under certain conditions, 
such as count, sum, etc.
-
-### Compaction
-
-We have introduced some asynchronous techniques to further improve the performance of compaction by 20%+!
-
-And 0.6 introduces database compaction: you can submit a compaction job for multiple
-databases. If you submit a streaming job, the job will continuously monitor new changes to the tables and perform
-compactions as needed.
-
-## Append Table
-
-Append scalable mode is available for production!
-
-By defining 'bucket' = '-1' on a non-primary-key table, you enable append scalable mode for the table. This type of
-table is an upgrade to the Hive format. You can use it as follows:
-
-1. Spark, Flink Batch Read & Write, including INSERT OVERWRITE support.
-2. Flink, Spark Streaming Read & Write; Flink will do small-file compaction.
-3. You can sort (z-order) this table, which will greatly accelerate query 
performance, especially when there are filtering conditions related to sorting 
keys.
-
-You can set the write-buffer-for-append option for an append-only table to handle situations where a large number of partitions
-are written simultaneously in streaming mode.
-
-0.6 also introduces Hive Table Migration: Apache Hive tables in the ORC and Parquet file formats can be migrated to Paimon.
-When migrating data to a Paimon table, the original table will permanently disappear, so please back up your data if
-you still need the original table. The migrated table will be an append table. You can use the Flink or Spark CALL procedure to
-migrate a Hive table.
-
-StarRocks and other OLAP systems will release a version to greatly enhance 
query performance for append tables based on Paimon 0.6.
-
-## Tag Management
-
-### Upsert To Partitioned
-
-The Tag will maintain the manifests and data files of the snapshot. Offline 
data warehouses require an immutable view
-every day to ensure the idempotence of calculations. So we created a Tag 
mechanism to output these views.
-
-However, the traditional use of Hive data warehouses is more accustomed to 
using partitions to specify the query's Tag,
-and is more accustomed to using Hive computing engines.
-
-So, we introduce metastore.tag-to-partition and metastore.tag-to-partition.preview to map a non-partitioned primary
-key table to a partitioned table in the Hive metastore, mapping the partition field to the name of the Tag, to be fully
-compatible with Hive.
-
-### Tag with Flink Savepoint
-
-You cannot recover a write job from an old Flink savepoint, as this may cause issues with the Paimon table. In 0.6, we
-guard against this situation: an exception is thrown when such data anomalies occur, causing the job to fail to start.
-
-If you want to recover from the old savepoint, we recommend setting 
sink.savepoint.auto-tag to true to enable the
-feature of automatically creating tags for Flink savepoint.
-
-## Formats
-
-0.6 upgrades the ORC version to 1.8.3 and the Parquet version to 1.13.1. ORC natively supports ZSTD in this version, which
-is a compression algorithm with a higher compression rate. We recommend using 
it when high compression rates are needed.
-
-## Metrics System
-
-In 0.6, Paimon has built a metrics system to measure the behaviours of reading and writing. Paimon supports
-built-in metrics to measure commits, scans, writes and compactions, which can be bridged to a computing
-engine like Flink. The most important metric for streaming reads is currentFetchEventTimeLag.
-
-## Paimon Spark
-
-1. Support Spark 3.5
-2. Structured Streaming: Supports serving as a Streaming Source, supports 
source side traffic control through custom read triggers, and supports stream 
read changelog
-3. Row Level Operation: DELETE optimization, supporting UPDATE and MERGE INTO
-4. Call Procedure: Add compact and migrate_table, migrate_file, 
remove_orphan_files, create_tag, delete_tag, rollback
-5. Query optimization: Push down filter optimization, support for Push down 
limit, and runtime filter (DPP)
-6. Other: Truncate Table optimization, support for CTAS, support for Truncate 
Partition
-
-## Paimon Trino
-
-The Paimon Trino module mainly performs the following tasks to accelerate 
queries:
-
-1. Optimize the issue of converting pages to avoid memory overflow caused by 
large pages
-2. Implemented Limit Pushdown and can combine partition pruning
-
-## Paimon Presto
-
-The Paimon Presto module is available for production! The following 
capabilities have been added:
-
-1. Implement Filter Pushdown, which allows Paimon Presto to be available for 
production
-2. Use the Inject mode, which allows Paimon Catalog to reside in the process 
and improve query speed
-
-## What's next?
-
-Report your requirements!
+---
+title: "Release 0.6"
+type: release
+version: 0.6.0
+---
+
+# Apache Paimon 0.6 Available
+
+December 13, 2023 - Paimon Community ([email protected])
+
+Apache Paimon PPMC has officially released Apache Paimon 0.6.0-incubating 
version. A total of 58 people contributed to
+this version and completed over 400 Commits. Thank you to all contributors for 
their support!
+
+Some outstanding developments are:
+
+1. Flink Paimon CDC supports almost all mainstream data ingestion currently available.
+2. Flink 1.18 and Paimon support CALL procedures, which will make table management easier.
+3. Cross partition update is available for production!
+4. The read-optimized table is introduced to enhance query performance.
+5. Append scalable mode is available for production!
+6. The Paimon Presto module is available for production!
+7. The metrics system is integrated with Flink metrics.
+8. Spark Paimon has made tremendous progress.
+
+For details, please refer to the following text.
+
+## Flink
+
+### Paimon CDC
+
+Paimon CDC integrates Flink CDC, Kafka, Pulsar, etc., and provides 
comprehensive support in version 0.6:
+
+1. Kafka CDC supports formats: Canal Json, Debezium Json, Maxwell and OGG.
+2. Pulsar CDC is added, both Table Sync and Database Sync.
+3. Mongo CDC is available for production!
+
+### Flink Batch Source
+
+By default, the parallelism of batch reads is the same as the number of 
splits, while the parallelism of stream reads
+is the same as the number of buckets, but not greater than 
scan.infer-parallelism.max (Default is 1024).
+
+### Flink Streaming Source
+
+Consumer-id is available for production!
+
+You can specify a consumer-id when streaming reading a table; Paimon records the consumed snapshot id, so a newly started
+job can continue to consume from the previous progress without restoring from the state. You can also set consumer.mode
+to at-least-once to get a better checkpoint time.
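+
+A hedged sketch of a streaming read with these options, using Flink SQL dynamic table options (the consumer-id value and table name are hypothetical):
+
+```sql
+-- run under a streaming execution environment
+SELECT * FROM t /*+ OPTIONS('consumer-id' = 'myid', 'consumer.mode' = 'at-least-once') */;
+```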
+
+### Flink Time Travel
+
+Flink 1.18 SQL supports Time Travel Query (You can also use dynamic option):
+
+```sql
+SELECT * FROM t FOR SYSTEM_TIME AS OF TIMESTAMP '2023-01-01 00:00:00';
+```
+
+### Flink Call Procedures
+
+Flink 1.18 SQL supports Call Procedures:
+
+| Procedure Name |    Example    |
+|:------:|:-------------:|
+| compact  |  CALL sys.compact('default.T', 'p=0', 'zorder', 'a,b', 
'sink.parallelism=4')  |
+| compact_database  |  CALL sys.compact_database('db1|db2', 'combined', 
'table_.*', 'ignore', 'sink.parallelism=4')   |
+| create_tag  |   CALL sys.create_tag('default.T', 'my_tag', 10)   |
+| delete_tag  |   CALL sys.delete_tag('default.T', 'my_tag')   |
+| merge_into  |   CALL sys.merge_into('default.T', '', '', 'default.S', 
'T.id=S.order_id', '', 'price=T.price+20', '', '*')   |
+| remove_orphan_files  |   CALL remove_orphan_files('default.T', '2023-10-31 
12:00:00')   |
+| reset_consumer  |   CALL sys.reset_consumer('default.T', 'myid', 10)  |
+| rollback_to  |   CALL sys.rollback_to('default.T', 10)   |
+
+Flink 1.19 will support Named Arguments which will make it easier to use when 
there are multiple arguments.
+
+### Committer Improvement
+
+The Committer is responsible for committing metadata, and sometimes it may have bottlenecks that can lead to
+backpressure on the operators. In 0.6, we have the following optimizations:
+
+1. By default, Paimon will delete expired snapshots synchronously. Users can use the asynchronous expiration mode by
+   setting snapshot.expire.execution-mode to async to improve performance (a sketch follows below).
+2. You can use Flink's fine-grained resource management to increase only the committer's heap memory and CPU.
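+
+A minimal sketch of the asynchronous expiration setting mentioned in item 1 (hypothetical table name):
+
+```sql
+ALTER TABLE t SET ('snapshot.expire.execution-mode' = 'async');   -- expire snapshots asynchronously
+```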
+
+## Primary Key Table
+
+### Cross Partition Update
+
+Cross partition update is available for production!
+
+Currently, Flink batch & streaming writes are supported and have been applied by enterprises in production environments!
+How to use cross partition update:
+
+1. Primary keys do not contain all partition fields.
+2. Use dynamic bucket mode, which means bucket is -1.
+
+This mode directly maintains the mapping of keys to partition and bucket, uses local disks, and initializes the index by
+reading all existing keys in the table when starting the write job. Although maintaining the index is necessary, this mode
+still maintains high write throughput. Please try it out.
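+
+For illustration, a sketch of a table that meets both conditions (names are hypothetical): the primary key omits the partition field, and dynamic bucket mode is enabled:
+
+```sql
+CREATE TABLE users (
+    user_id BIGINT,
+    name    STRING,
+    dt      STRING,
+    PRIMARY KEY (user_id) NOT ENFORCED   -- does not contain the partition field dt
+) PARTITIONED BY (dt) WITH (
+    'bucket' = '-1'                      -- dynamic bucket mode, enables cross partition update
+);
+```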
+
+### Read Optimized
+
+The Primary Key Table uses a 'MergeOnRead' technology. When reading data, multiple layers of LSM data are merged, and
+the read parallelism will be limited by the number of buckets. If you want queries to be fast enough in certain scenarios
+and can accept reading only older data, you can query the read-optimized table: SELECT * FROM T$ro.
+
+Since the freshness of that data cannot be guaranteed, you can configure 'full-compaction.delta-commits' when writing data
+to ensure that the data read has a bounded latency.
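+
+A sketch of the two pieces together (the table name T follows the text above; the value 10 is only illustrative):
+
+```sql
+-- bound the staleness of the read-optimized view via periodic full compaction
+ALTER TABLE T SET ('full-compaction.delta-commits' = '10');
+
+-- query only the fully compacted data
+SELECT * FROM T$ro;
+```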
+
+StarRocks and other OLAP systems will release a version to greatly enhance 
query performance for read-optimized tables
+based on Paimon 0.6.
+
+### Partial Update
+
+In 0.6, you can define aggregation functions for the partial-update merge 
engine with sequence group. This allows you
+to perform special aggregations on certain fields under certain conditions, 
such as count, sum, etc.
+
+### Compaction
+
+We have introduced some asynchronous techniques to further improve the performance of compaction by 20%+!
+
+And 0.6 introduces database compaction: you can submit a compaction job for multiple
+databases. If you submit a streaming job, the job will continuously monitor new changes to the tables and perform
+compactions as needed.
+
+## Append Table
+
+Append scalable mode is available for production!
+
+By defining 'bucket' = '-1' on a non-primary-key table, you enable append scalable mode for the table. This type of
+table is an upgrade to the Hive format. You can use it as follows:
+
+1. Spark, Flink Batch Read & Write, including INSERT OVERWRITE support.
+2. Flink, Spark Streaming Read & Write; Flink will do small-file compaction.
+3. You can sort (z-order) this table, which will greatly accelerate query 
performance, especially when there are filtering conditions related to sorting 
keys.
+
+You can set the write-buffer-for-append option for an append-only table to handle situations where a large number of partitions
+are written simultaneously in streaming mode.
+
+0.6 also introduces Hive Table Migration: Apache Hive tables in the ORC and Parquet file formats can be migrated to Paimon.
+When migrating data to a Paimon table, the original table will permanently disappear, so please back up your data if
+you still need the original table. The migrated table will be an append table. You can use the Flink or Spark CALL procedure to
+migrate a Hive table.
+
+StarRocks and other OLAP systems will release a version to greatly enhance 
query performance for append tables based on Paimon 0.6.
+
+## Tag Management
+
+### Upsert To Partitioned
+
+The Tag will maintain the manifests and data files of the snapshot. Offline 
data warehouses require an immutable view
+every day to ensure the idempotence of calculations. So we created a Tag 
mechanism to output these views.
+
+However, the traditional use of Hive data warehouses is more accustomed to 
using partitions to specify the query's Tag,
+and is more accustomed to using Hive computing engines.
+
+So, we introduce metastore.tag-to-partition and metastore.tag-to-partition.preview to map a non-partitioned primary
+key table to a partitioned table in the Hive metastore, mapping the partition field to the name of the Tag, to be fully
+compatible with Hive.
+
+### Tag with Flink Savepoint
+
+You cannot recover a write job from an old Flink savepoint, as this may cause issues with the Paimon table. In 0.6, we
+guard against this situation: an exception is thrown when such data anomalies occur, causing the job to fail to start.
+
+If you want to recover from the old savepoint, we recommend setting 
sink.savepoint.auto-tag to true to enable the
+feature of automatically creating tags for Flink savepoint.
+
+## Formats
+
+0.6 upgrades the ORC version to 1.8.3 and the Parquet version to 1.13.1. ORC natively supports ZSTD in this version, which
+is a compression algorithm with a higher compression rate. We recommend using 
it when high compression rates are needed.
+
+## Metrics System
+
+In 0.6, Paimon has built a metrics system to measure the behaviours of reading and writing. Paimon supports
+built-in metrics to measure commits, scans, writes and compactions, which can be bridged to a computing
+engine like Flink. The most important metric for streaming reads is currentFetchEventTimeLag.
+
+## Paimon Spark
+
+1. Support Spark 3.5
+2. Structured Streaming: Supports serving as a Streaming Source, supports 
source side traffic control through custom read triggers, and supports stream 
read changelog
+3. Row Level Operation: DELETE optimization, supporting UPDATE and MERGE INTO
+4. Call Procedure: Add compact and migrate_table, migrate_file, 
remove_orphan_files, create_tag, delete_tag, rollback
+5. Query optimization: Push down filter optimization, support for Push down 
limit, and runtime filter (DPP)
+6. Other: Truncate Table optimization, support for CTAS, support for Truncate 
Partition
+
+## Paimon Trino
+
+The Paimon Trino module mainly performs the following tasks to accelerate 
queries:
+
+1. Optimize the issue of converting pages to avoid memory overflow caused by 
large pages
+2. Implemented Limit Pushdown and can combine partition pruning
+
+## Paimon Presto
+
+The Paimon Presto module is available for production! The following 
capabilities have been added:
+
+1. Implement Filter Pushdown, which allows Paimon Presto to be available for 
production
+2. Use the Inject mode, which allows Paimon Catalog to reside in the process 
and improve query speed
+
+## What's next?
+
+Report your requirements!
diff --git a/community/docs/releases/release-0.7.md 
b/community/docs/releases/paimon/release-0.7.md
similarity index 100%
rename from community/docs/releases/release-0.7.md
rename to community/docs/releases/paimon/release-0.7.md
diff --git a/community/docs/releases/release-0.8.1.md 
b/community/docs/releases/paimon/release-0.8.1.md
similarity index 100%
rename from community/docs/releases/release-0.8.1.md
rename to community/docs/releases/paimon/release-0.8.1.md
diff --git a/community/docs/releases/release-0.8.2.md 
b/community/docs/releases/paimon/release-0.8.2.md
similarity index 100%
rename from community/docs/releases/release-0.8.2.md
rename to community/docs/releases/paimon/release-0.8.2.md
diff --git a/community/docs/releases/release-0.8.md 
b/community/docs/releases/paimon/release-0.8.md
similarity index 100%
rename from community/docs/releases/release-0.8.md
rename to community/docs/releases/paimon/release-0.8.md
diff --git a/community/docs/releases/release-0.9.md 
b/community/docs/releases/paimon/release-0.9.md
similarity index 100%
rename from community/docs/releases/release-0.9.md
rename to community/docs/releases/paimon/release-0.9.md
diff --git a/community/docs/releases/pypaimon-release-0.2.0.md 
b/community/docs/releases/pypaimon/release-pypaimon-0.2.0.md
similarity index 100%
rename from community/docs/releases/pypaimon-release-0.2.0.md
rename to community/docs/releases/pypaimon/release-pypaimon-0.2.0.md

