This is an automated email from the ASF dual-hosted git repository.

yuzelin pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/paimon-website.git
commit 815f6cd2ad6d428629416e69a441e778d85ee801
Author: yuzelin <[email protected]>
AuthorDate: Wed Jul 16 22:42:07 2025 +0800

    [temp]
---
 community/docs/releases/release-1.2.md | 214 +++++++++++++++++++++++++++++++++
 1 file changed, 214 insertions(+)

diff --git a/community/docs/releases/release-1.2.md b/community/docs/releases/release-1.2.md
new file mode 100644
index 000000000..70083735e
--- /dev/null
+++ b/community/docs/releases/release-1.2.md
@@ -0,0 +1,214 @@
---
title: "Release 1.2"
type: release
version: 1.2.0
weight: 94
---

# Apache Paimon 1.2 Available

JUL 16, 2025 - Zelin Yu ([email protected])

The Apache Paimon PMC officially announces the release of Apache Paimon 1.2.0. This version has been developed for
nearly 3 months, bringing together the wisdom of more than 50 developers from the global open source community, and
includes more than 260 commits. We sincerely thank all the developers who contributed!

## Version Overview

Notable changes in this version are:
1. Polished Iceberg compatibility for a smoother integration with Iceberg.
2. Introduced Functions to enhance data processing and query capabilities.
3. Further enhanced the REST Catalog capabilities.
4. Enhancements and bug fixes for Postpone (adaptive bucket) tables.
5. Support for migrating Hudi tables to Paimon tables.
6. Continued to enhance the integration with Flink/Spark/Hive, adding new features and fixing bugs.
7. Multiple memory usage optimizations to avoid potential OOM issues.

## Iceberg Compatibility

In this version, Iceberg compatibility adds the following capabilities:

1. Deletion vector compatibility: The file format of Iceberg's deletion vectors is different from Paimon's, so we
introduce a new deletion vector file format. You can now set 'deletion-vectors.bitmap64' = 'true' to produce
Iceberg-compatible deletion vector files.

2. Flexible storage location setting: When 'metadata.iceberg.storage' = 'table-location' is set, the Iceberg metadata
is stored in the table directory but is not registered in Hive/AWS Glue. Therefore, a new parameter
'metadata.iceberg.storage-location' is introduced. When this parameter is set to 'table-location', the Iceberg metadata
is stored in the table directory and also registered in Hive/AWS Glue. In this way, you can lay out the Iceberg
metadata flexibly, as shown in the sketch after this list.

3. Tag support: Now, when a Paimon Tag is created or deleted, the corresponding Iceberg metadata is updated as well, and you
can access the Tag through Iceberg.
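To make the option combination concrete, here is a minimal Spark SQL sketch. The table name and schema are made up,
and the 'metadata.iceberg.storage' = 'hive-catalog' value and the 'deletion-vectors.enabled' option are pre-existing
Paimon settings assumed here for illustration, not something introduced by this release:

```
-- Hedged sketch: illustrative table name and schema, assuming the session uses a Paimon catalog.
CREATE TABLE orders_iceberg_compat (
    order_id BIGINT,
    customer_id INT,
    total DECIMAL(10, 2)
) TBLPROPERTIES (
    'primary-key' = 'order_id',
    -- register the Iceberg metadata in Hive (assumed storage value)
    'metadata.iceberg.storage' = 'hive-catalog',
    -- ...but keep the Iceberg metadata files under the table directory
    'metadata.iceberg.storage-location' = 'table-location',
    -- enable deletion vectors and the new Iceberg-compatible 64-bit bitmap format
    'deletion-vectors.enabled' = 'true',
    'deletion-vectors.bitmap64' = 'true'
);
```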
## Function

We introduce the Function interface for better data processing and querying. Currently, three types are supported:

1. File Function: Provides the Function definition through a file.
2. Lambda Function: Defines the Function with a Java lambda expression.
3. SQL Function: Defines the Function with SQL.

Examples of Function management with Spark are as follows:

Create:
```
CALL sys.create_function(
    `function` => 'my_db.area_func',
    `inputParams` => '[{"id": 0, "name":"length", "type":"INT"}, {"id": 1, "name":"width", "type":"INT"}]',
    `returnParams` => '[{"id": 0, "name":"area", "type":"BIGINT"}]',
    `deterministic` => true,
    `comment` => 'comment',
    `options` => 'k1=v1,k2=v2'
);
```

Modify the definition:
```
CALL sys.alter_function(
    `function` => 'my_db.area_func',
    `change` => '{"action" : "addDefinition", "name" : "spark", "definition" : {"type" : "lambda", "definition" : "(Integer length, Integer width) -> { return (long) length * width; }", "language": "JAVA" } }'
);
```

Use in a query:
```
SELECT paimon.my_db.area_func(1, 2);
```

Delete:
```
CALL sys.drop_function(`function` => 'my_db.area_func');
```

Currently, widely-used computing engines do not support custom functions very well. As they provide better Function
interfaces, we will be able to offer a more convenient user experience together with these engines.

## REST Catalog

This release continues to enhance the REST Catalog, providing the following optimizations and bug fixes:

1. Provide row-level and column-level data authorization interfaces.
2. Add the following data access interfaces: list tables, list views, list functions.
3. Support listing objects with a pattern.
4. Provide the snapshot access interface.
5. Fix the problem that tables created under the REST Catalog cannot read the fallback branch.

## Postpone Bucket

This version continues to improve the Postpone bucket table capabilities, for example:

1. Support deletion vectors.
2. Support the 'partition.sink-strategy' option, which improves write performance.
3. Paimon CDC supports Postpone bucket tables.
4. Fix the problem that a lookup join using a Postpone bucket table as the dimension table produces wrong results.
5. Fix a possible data error in Postpone bucket table write jobs when the source and sink parallelisms are not the same.
6. Fix the problem that Postpone bucket tables cannot be read in streaming mode when 'changelog-producer' = 'none'.
7. Fix a possible data loss problem when the rescale and compaction jobs of one Postpone bucket table are submitted at the same time.
The 'commit.strict-mode.last-safe-snapshot' option is provided to solve it: the job checks the safety of commits starting from the
snapshot specified by this option. If the job is newly started, you can simply set it to -1 (see the sketch after this list).
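To illustrate item 7, here is a minimal sketch of a Flink streaming write that passes the option as a dynamic table
option; the table names are placeholders, and supplying the option through an OPTIONS hint (rather than as a persistent
table option) is an assumption made for brevity:

```
-- Hedged sketch: placeholder tables. -1 means the job is newly started; otherwise point the
-- option at the last snapshot known to be safe, and the job checks commits starting from there.
INSERT INTO my_postpone_table /*+ OPTIONS('commit.strict-mode.last-safe-snapshot' = '-1') */
SELECT * FROM my_source;
```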
## Hudi Migration

We provide a Hudi table migration tool so that Hudi tables can be easily integrated into the Paimon ecosystem. Currently, only Hudi
tables registered in HMS are supported. The usage is as follows (through a Flink Jar job):

```
<FLINK_HOME>/bin/flink run ./paimon-flink-action-1.2.0.jar \
    clone \
    --database default \
    --table hudi_table \
    --catalog_conf metastore=hive \
    --catalog_conf uri=thrift://localhost:9088 \
    --target_database test \
    --target_table test_table \
    --target_catalog_conf warehouse=my_warehouse \
    --parallelism 10 \
    --where <partition_filter_spec>
```

## Compute Engine Integration Enhancements

This version provides new features and bug fixes for the Flink/Spark/Hive connectors:

1. Flink lookup join optimization: Previously, all data of the Paimon dimension table needed to be cached in the TaskManagers.
[FLIP-462](https://cwiki.apache.org/confluence/display/FLINK/FLIP-462+Support+Custom+Data+Distribution+for+Input+Stream+of+Lookup+Join)
allows customizing the data shuffle mode of the lookup join operator. Paimon implements this optimization, allowing each subtask
to load only part of the dimension table data (instead of the full data). In this way, loading the dimension table takes less time
when starting the job and the cache uses less memory.

This optimization requires: i) using Flink 2.0; ii) the Paimon table is a fixed bucket table and the join keys contain all bucket keys
(if not configured, the bucket keys are the same as the primary keys). You should use lookup join hints to enable this optimization:

```
-- customers is the Paimon dimension table
SELECT /*+ LOOKUP('table'='c', 'shuffle'='true') */
o.order_id, o.total, c.country, c.zip
FROM orders AS o
JOIN customers
FOR SYSTEM_TIME AS OF o.proc_time AS c
ON o.customer_id = c.id;
```

2. The Paimon dimension table can now be cached purely in memory: Previously, the Paimon dimension table used RocksDB as the cache,
but its performance is not very good. Therefore, this version introduces a purely in-memory cache for dimension table data (note
that it may lead to OOM). You can set 'lookup.cache' = 'memory' to enable it.
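For illustration, the in-memory cache could be enabled directly on the dimension table of the lookup join above
through a dynamic table option; this is only a sketch reusing the illustrative orders/customers tables:

```
-- Hedged sketch: same illustrative tables as above; the in-memory lookup cache is enabled on the
-- dimension table via a dynamic table option (keep an eye on TaskManager heap usage).
SELECT o.order_id, o.total, c.country, c.zip
FROM orders AS o
JOIN customers /*+ OPTIONS('lookup.cache' = 'memory') */
FOR SYSTEM_TIME AS OF o.proc_time AS c
ON o.customer_id = c.id;
```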
3. Support V2 write for Spark, which reduces serialization overhead and improves write performance. Currently, only fixed bucket
and append-only (bucket = -1) tables are supported. You can set 'write.use-v2-write' = 'true' to enable it.

4. Fix a possible data error in Spark bucket joins after rescaling buckets.

5. Fix the problem that Hive cannot correctly read/write data of the timestamp with local time zone type.

## Memory Usage Optimization

Our users have reported many OOM problems, and we have made some optimizations to solve them:

1. Optimize the deserialization of data file statistics to reduce memory usage.

2. For Flink batch jobs, split scanning is handled in the initialization phase in the JobManager. If the amount of data is large, job
initialization will take a long time and may even fail with OOM. To avoid this, you can set 'scan.dedicated-split-generation' = 'true'
to let the splits be scanned in the TaskManagers after the job is started.

3. If you write too many partitions at a time to a Postpone bucket table, it is easy to cause OOM. We have optimized the memory usage
to solve this.

4. When the data of too many partitions is expired in a single commit, OOM may also occur. You can set 'partition.expiration-batch-size'
to limit the maximum number of partitions that can be expired in a single commit to avoid this problem.

## Others

1. Support default values: The old default value implementation had some defects, so we reimplemented it.
The Spark and Flink usages are as follows:

Spark:
```
-- Define
CREATE TABLE my_table (
    a INT,
    b INT DEFAULT 2
);

-- Modify
ALTER TABLE my_table ALTER COLUMN b SET DEFAULT 3;
```

Flink: Flink SQL does not support default values yet, so you should create the table first, then set the default values through a procedure.
```
CREATE TABLE my_table (
    a INT,
    b INT
);

CALL sys.alter_column_default_value('default.my_table', 'b', '2');
```

2. Introduce a new time travel option 'scan.creation-time-millis' to specify a timestamp. If a snapshot is available near that time,
reading starts from that snapshot; otherwise, files created later than the specified time are read. This option combines the behaviors
of 'scan.snapshot-id'/'scan.timestamp-millis' and 'scan.file-creation-time-millis' (see the sketch at the end of this post).

3. Support custom partition expiration strategies: You can provide a custom `PartitionExpireStrategyFactory` and set the table option 'partition.expiration-strategy' = 'custom'
to activate your own partition expiration method.

4. Support custom Flink commit listeners: You can provide multiple custom `CommitListenerFactory` implementations and set the table option 'commit.custom-listeners' = 'listener1,listener2,...'
to trigger your own commit actions at the commit phase of a Flink write job.
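As a rough illustration of the time travel option from item 2 above, here is a minimal Flink SQL sketch; the table
name and the millisecond timestamp are placeholders, and passing the option as a dynamic table option is an assumption:

```
-- Hedged sketch: placeholder table name and timestamp (milliseconds since epoch). Reading starts
-- from a snapshot near this time if one exists, otherwise from files created after this time.
SELECT * FROM my_table /*+ OPTIONS('scan.creation-time-millis' = '1752624000000') */;
```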
