This is an automated email from the ASF dual-hosted git repository.

yuzelin pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/paimon-website.git
commit a22411f01c98667e3961e7750a89dfcfcc4adf14
Author: yuzelin <[email protected]>
AuthorDate: Thu Jul 17 11:07:32 2025 +0800

    feat(optional): message
---
 community/docs/releases/release-1.2.md | 46 +++++++++++++++++-----------------
 1 file changed, 23 insertions(+), 23 deletions(-)

diff --git a/community/docs/releases/release-1.2.md b/community/docs/releases/release-1.2.md
index 70083735e..47a4a8bd8 100644
--- a/community/docs/releases/release-1.2.md
+++ b/community/docs/releases/release-1.2.md
@@ -19,7 +19,7 @@ Notable changes in this version are:
 1. Polished Iceberg compatibility, more silky integration with Iceberg.
 2. Introduce Function to enhance data processing and query capabilities.
 3. REST Catalog capability further enhanced.
-4. Postpone (adaptive bucket) Table capability enhancement and bug fix.
+4. Postpone bucket (adaptive bucket) Table capability enhancement and bug fix.
 5. Support for migrating Hudi tables to Paimon tables.
 6. Continue to enhance the integration with Flink/Spark/Hive, add new features and fix bugs.
 7. Make multiple optimizations for memory usage to avoid potential OOM issues.
@@ -29,13 +29,13 @@ Notable changes in this version are:
 In this version, Iceberg compatibility adds the following capabilities:
 
 1. Deletion vector compatibility: The file format of Iceberg's deletion vector is different from Paimon's, so we
-introduce a new deletion vector file format. You can now set the 'delete-vectors.bitmap64' = 'true' to produce
-the Iceberg-compatible delete vector files.
+introduce a new deletion vector file format. You can set `delete-vectors.bitmap64 = true` to produce the
+Iceberg-compatible delete vector files.
 
-2. Flexible storage location setting: When 'metadata.iceberg.storage' = 'table-location' is set, the Iceberg metadata
-is stored in the table directory, but won't be registered in Hive/AWS Glue. Therefore, a new parameter
-'metadata.iceberg.storage-location' is introduced. When the parameter is set to 'table-location', the Iceberg metadata
-is stored in the table directory and also registered in Hive/AWS Glue. In this way, you can deployment the data flexibly.
+2. Flexible storage location setting: When `metadata.iceberg.storage = table-location` is set, the Iceberg metadata
+is stored in the table directory, but won't be registered in Hive/AWS Glue. Therefore, a new option
+`metadata.iceberg.storage-location` is introduced. When it is set to `table-location`, the Iceberg metadata is stored
+in the table directory and also registered in Hive/AWS Glue. In this way, you can deploy the data flexibly.
 
 3. Tag support: Now, when a Paimon Tag is created or deleted, the corresponding Iceberg metadata is also changed, and
 you can access the Tag through Iceberg.
@@ -55,7 +55,7 @@ Create:
 CALL sys.create_function(
     `function` => 'my_db.area_func',
     `inputParams` => '[{"id": 0, "name":"length", "type":"INT"}, {"id": 1, "name":"width", "type":"INT"}]',
-    `returnParams` => '[{"id": 0, "name":"area", "type":"BIGINT"}]', 
+    `returnParams` => '[{"id": 0, "name":"area", "type":"BIGINT"}]',
     `deterministic` => true,
     `comment` => 'comment',
     `options` => 'k1=v1,k2=v2'
@@ -98,13 +98,13 @@ This release continues to enhance the REST Catalog, providing the following opti
 This version continues to improve the Postpone bucket table capabilities, such as:
 
 1. Support deletion vector.
-2. Support 'partition.sink-strategy' option which improves write performance.
+2. Support `partition.sink-strategy` option which improves write performance.
 3. Paimon CDC supports Postpone bucket table.
 4. Fix the problem that lookup join with a Postpone bucket table as dimension table produces wrong result.
 5. Fix possible data error problem of Postpone bucket table write job when the source and sink parallelisms are not the same.
-6. Fix the problem that the Postpone bucket table cannot be streaming read when 'changelog-producer' = 'none'.
+6. Fix the problem that the Postpone bucket table cannot be streaming read when `changelog-producer = none`.
 7. Fix possible data lost problem if the rescale and compaction jobs of one Postpone bucket table are submitted at the same time.
-The 'commit.strict-mode.last-safe-snapshot' option is provided to solve it. The job will check the security of commit from the
+The `commit.strict-mode.last-safe-snapshot` option is provided to solve it. The job will check the correctness of commit from the
 snapshot specified by the option. If the job is newly started, you can directly set it to -1.
 
 ## Hudi Migration
@@ -130,7 +130,7 @@ tables registered in HMS are supported. The usage is as follows (through the Fli
 
 This version provides new features and bug fixes of the Flink/Spark/Hive connector:
 
-1. Flink lookup join optimization: Previously, all data of the Paimon dimension table needs to be cached in taskmanager.
+1. Flink lookup join optimization: Previously, all data of the Paimon dimension table needs to be cached in task manager.
 [FLIP-462](https://cwiki.apache.org/confluence/display/FLINK/FLIP-462+Support+Custom+Data+Distribution+for+Input+Stream+of+Lookup+Join)
 allows to customize the data shuffle mode of the lookup join operator. Paimon implements this optimization, allowing each
 subtask to load part of the dimension table data (instead of full data). In this way, the loading of the dimension table will take less time
@@ -151,12 +151,12 @@ ON o.customer_id = c.id;
 
 2. Paimon dimension table supports to be loaded in-memory cache: Previously, Paimon dimension table uses RocksDB as the
 cache, but its performance is not very good. Therefore, this version introduces purely in-memory cache for dimension table data (note
-that it may lead to OOM). You can set 'lookup.cache' = 'memory' to enable it.
+that it may lead to OOM). You can set `lookup.cache = memory` to enable it.
 
 3. Support V2 write for Spark which reducing serialization overhead and improving write performance. Currently, only fixed bucket
-and append-only (bucket = -1) table are supported. You can set 'write.use-v2-write' = 'true' to enable it.
+and append-only (bucket = -1) table are supported. You can set `write.use-v2-write = true` to enable it.
 
-4. Fix the possible data error problem of Spark bucket join after rescale bucket.
+4. Fix the possible data error problem of Spark bucket join after rescaling bucket.
 
 5. Fix that Hive cannot read/write data of timestamp with local timezone type correctly.
 
@@ -167,13 +167,13 @@ Our users have fed back many OOM problems, and we have made some optimizations t
 1. Optimize the deserialization of the data file statistics to reduce memory usage.
 
 2. For Flink batch jobs, the splits scan are handled in initialization phase in job manager. If the amount of data is large, the job
-initialization will take a long time and even failed with OOM. To avoid this, you scan set 'scan.dedicated-split-generation' = 'true'
-to let the splits be scanned in task manager after the job is started.
+initialization will take a long time and even fail with OOM. To avoid this, you can set `scan.dedicated-split-generation = true` to
+let the splits be scanned in task manager after the job is started.
 
 3. If you write too many partitions at a time to a Postpone bucket table, it is easy to cause OOM. We have optimized the memery
 usage to solve it.
 
-4. When too many partitions data are expired in a single commit, it is possibly to produce OOM. You can set 'partition.expiration-batch-size'
+4. When too many partitions' data are expired in a single commit, it may produce OOM. You can set `partition.expiration-batch-size`
 to specify the limit of maximum partitions can be expired in a single commit to avoid this problem.
 
 ## Others
@@ -203,12 +203,12 @@ CREATE TABLE my_table (
 CALL sys.alter_column_default_value('default.my_table', 'b', '2');
 ```
 
-2. Introduce a new time travel option: 'scan.creation-time-millis' to specify a timestamp. If a snapshot is available near the time, starting
-from the snapshot. Otherwise, reading files created later than the specified time. This option combines the scan.snapshot-id/scan.timestamp-millis
-and scan.file-creation-time-millis.
+2. Introduce a new time travel option: `scan.creation-time-millis` to specify a timestamp. If a snapshot is available near the time, scanning
+starts from that snapshot; otherwise, files created later than the specified time are read. This option combines the `scan.snapshot-id/scan.timestamp-millis`
+and `scan.file-creation-time-millis`.
 
-3. Support custom partition expiration strategy: You can provide a custom `PartitionExpireStrategyFactory` and set the table option 'partition.expiration-strategy' = 'custom'
+3. Support custom partition expiration strategy: You can provide a custom `PartitionExpireStrategyFactory` and set the table option `partition.expiration-strategy = custom`
 to activate your partition expiration method.
 
-4. Support custom Flink commit listeners: You can provide multiple custom `CommitListenerFactory` and set the table option 'commit.custom-listeners' = 'listener1,listener2,...'
+4. Support custom Flink commit listeners: You can provide multiple custom `CommitListenerFactory` and set the table option `commit.custom-listeners = listener1,listener2,...`
 to activate your commit actions at commit phase in a Flink write job.
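For reference, a minimal Flink SQL sketch of how table options like the ones discussed in the notes above are typically applied. The table names here are hypothetical; the option keys and values are the ones quoted in the release notes, so treat this as illustrative rather than authoritative:

```sql
-- Illustrative sketch only: table names are hypothetical, option keys/values
-- are the ones quoted in the release notes above.

-- Produce Iceberg-compatible deletion vector files and store the Iceberg
-- metadata in the table directory while also registering it in Hive/AWS Glue.
ALTER TABLE my_iceberg_table SET (
    'delete-vectors.bitmap64' = 'true',
    'metadata.iceberg.storage-location' = 'table-location'
);

-- Enable the new purely in-memory cache for a dimension table used in lookup
-- joins (the notes warn that this may lead to OOM).
ALTER TABLE customers SET ('lookup.cache' = 'memory');
```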

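In the same spirit, a sketch of how the scan-related options mentioned in the notes might be applied per query through Flink dynamic table options. The table name and the epoch-millis value are placeholders, and it is assumed here that these keys can be overridden with the `/*+ OPTIONS(...) */` hint like other Paimon scan options:

```sql
-- Placeholder table name and timestamp; option keys come from the notes above.

-- Time travel by creation time: start from a nearby snapshot if one exists,
-- otherwise read the files created later than the given time.
SELECT * FROM my_table /*+ OPTIONS('scan.creation-time-millis' = '1752721652000') */;

-- Let splits be generated in the task manager after the batch job starts,
-- instead of during job initialization in the job manager.
SELECT * FROM my_table /*+ OPTIONS('scan.dedicated-split-generation' = 'true') */;
```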