JingsongLi commented on code in PR #1158:
URL: https://github.com/apache/incubator-paimon/pull/1158#discussion_r1194535059

########## docs/content/maintenance/small-files.md: ##########
+```sql
+CREATE TABLE T (
+    id BIGINT,
+    a INT,
+    b STRING,
+    dt STRING COMMENT 'timestamp string in format yyyyMMdd',
+    PRIMARY KEY(id, dt) NOT ENFORCED
+) PARTITIONED BY (dt);
+```
+
+This will create Paimon table `T` under the path `/tmp/paimon/default.db/T`,
+with its schema stored in `/tmp/paimon/default.db/T/schema/schema-0`
+
+
+## Insert Records Into Paimon Table

Review Comment: Insert Records Into Table

########## docs/content/maintenance/small-files.md: ##########
+## Create Paimon Catalog
+Start Flink SQL client via `./sql-client.sh` and execute the following
+statements one by one to create a Paimon catalog.
+```sql
+CREATE CATALOG paimon WITH (
+'type' = 'paimon',
+'warehouse' = 'file:///tmp/paimon'
+);
+
+USE CATALOG paimon;
+```
+
+This will only create a directory at given path `file:///tmp/paimon`.
+
+## Create Paimon Table

Review Comment: Create Table

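For readers following along with the quoted doc: a quick way to verify the claim above that the catalog and table creation only produce a directory plus one schema file. This is a sketch based solely on the paths mentioned in the quoted text (`file:///tmp/paimon`, `default.db/T/schema/schema-0`); adjust if your warehouse path or Paimon version differs.

```bash
# After CREATE CATALOG and CREATE TABLE, the only file on disk should be the schema.
# Paths are taken from the quoted doc; actual layout may differ by Paimon version.
find /tmp/paimon -type f
# expected (per the quoted doc): /tmp/paimon/default.db/T/schema/schema-0
```
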
########## docs/content/maintenance/small-files.md: ##########
+./T/manifest:
+manifest-96739ac2-5e79-4978-a3bc-86c25f1a303f-1 # delta manifest list for snapshot-2
+manifest-96739ac2-5e79-4978-a3bc-86c25f1a303f-0 # base manifest list for snapshot-2
+manifest-f1267033-e246-4470-a54c-5c27fdbdd074-0 # manifest file for snapshot-2
+
+manifest-09184ccc-c07f-4090-958c-cfe3ce3889e5-1 # delta manifest list for snapshot-1
+manifest-09184ccc-c07f-4090-958c-cfe3ce3889e5-0 # base manifest list for snapshot-1
+manifest-2b833ea4-d7dc-4de0-ae0d-ad76eced75cc-0 # manifest file for snapshot-1
+
+./T/dt=20230501/bucket-0:
+data-b75b7381-7c8b-430f-b7e5-a204cb65843c-0.orc
+
+...
+# each partition has the data written to bucket-0
+...
+
+./T/schema:
+schema-0
+```
+The new file layout as of snapshot-2 looks like
+{{< img src="/img/small-file-1.png">}}
+
+## Delete Records From Paimon Table

Review Comment: Delete Records From Table

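Instead of walking the warehouse directory by hand as in the listing above, the per-file view can usually be pulled from SQL as well. A minimal sketch, assuming your Paimon version exposes the `$files` system table (availability, identifier quoting, and columns can vary between versions and clients):

```sql
-- Hypothetical inspection query; assumes the `$files` system table is available.
-- Each row corresponds to one data file, so the row count per partition/bucket is a
-- quick proxy for how many small files the current snapshot has accumulated.
SELECT * FROM T$files;
```
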
########## docs/content/maintenance/small-files.md: ##########
+## Prerequisite
+
+Before go deeper in to this page, make sure you have read through the
+[Basic Concepts]({{< ref "concepts/basic-concepts" >}}),
+[File Layouts]({{< ref "concepts/file-layouts" >}}) and
+how to use Paimon in [Flink]({{< ref "engines/flink" >}}).
+
+{{< img src="/img/file-layout.png">}}
+
+
+## Create Paimon Catalog

Review Comment: Create Catalog

########## docs/content/maintenance/small-files.md: ##########
+After the Flink job finishes, the records are written into Paimon table, which
+is done by a successful `commit`. The records are visible to user
+as can be verified by `SELECT * FROM T` which return a single row.
+The commit creates a snapshot under path `/tmp/paimon/default.db/T/snapshot/snapshot-1`.
+The resulting file layout as of snapshot-1 is as follows:
+
+{{< img src="/img/small-file-0.png">}}
+
+The content of snapshot-1 contains metadata of the snapshot, such as manifest list and schema id:
+```json
+{
+  "version" : 3,
+  "id" : 1,
+  "schemaId" : 0,
+  "baseManifestList" : "manifest-09184ccc-c07f-4090-958c-cfe3ce3889e5-0",

Review Comment: I remember manifest list file is `manifest-list-****`?

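To make the reviewer's question concrete: if the recollection is right and manifest list files carry a `manifest-list-` prefix, a listing of the manifest directory would look roughly like the sketch below. The file names here are invented purely for illustration, and the naming scheme itself is exactly what this comment asks the author to double-check against a real table.

```bash
# Hypothetical layout if manifest lists are prefixed with `manifest-list-`;
# the UUIDs are made up and the naming is the open question in this review comment.
ls /tmp/paimon/default.db/T/manifest
# manifest-list-9f2a...-0   <- base manifest list referenced by snapshot-1
# manifest-list-9f2a...-1   <- delta manifest list referenced by snapshot-1
# manifest-2b83...-0        <- manifest file holding the data file entries
```
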
########## docs/content/maintenance/small-files.md: ##########
+---
+title: "A Deep-Dive into Paimon's Small Files"

Review Comment: Just `Small Files`?

########## docs/content/maintenance/small-files.md: ##########
+## Delete Records From Paimon Table
+
+Now let's delete records that meet the condition `dt>=20230503`.
+In Flink SQL, execute the following statement:
+
+```sql
+DELETE FROM T WHERE dt >= '20230503';
+```
+The third `commit` takes place and it gives use `snapshot-3`. Now, listing the files
+under the table and your will find out no partition is dropped. Instead, a new data
+file is created for partition `20230503` to `20230510`:
+
+```bash
+./T/dt=20230510/bucket-0:
+data-b93f468c-b56f-4a93-adc4-b250b3aa3462-0.orc # newer data file created by the delete statement
+data-0fcacc70-a0cb-4976-8c88-73e92769a762-0.orc # older data file created by the insert statement
+```
+
+This make sense since we insert a record in the second commit (represented by
+`+I[10, 10010, 'varchar00010', '20230510']`) and then delete
+the record in the third commit. Executing `SELECT * FROM T` will return 2 rows, namely:
+```
++I[1, 10001, 'varchar00001', '20230501']
++I[2, 10002, 'varchar00002', '20230502']
+```
+
+The new file layout as of snapshot-3 looks like
+{{< img src="/img/small-file-2.png">}}
+
+Note that `manifest-3-0` contains 8 manifest entries of `ADD` operation type,
+corresponding to 8 newly written data files.
+
+## Alter Paimon Table

Review Comment: Alter Table

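The insert/insert/delete sequence quoted above produces three snapshots; those can usually be inspected without touching the filesystem. A minimal sketch, assuming the `$snapshots` system table is available in your Paimon/Flink setup (column set and identifier quoting may vary):

```sql
-- Hypothetical inspection query; assumes the `$snapshots` system table exists.
-- Each commit described in the quoted doc should appear as one row, including
-- the APPEND snapshots created by the insert and delete statements.
SELECT * FROM T$snapshots;
```
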
########## docs/content/maintenance/small-files.md: ##########
+Now let's insert a batch of records across different partitions and
+see what happens. In Flink SQL, execute the following statement:
+
+```sql
+INSERT INTO T VALUES
+(2, 10002, 'varchar00002', '20230502'),
+(3, 10003, 'varchar00003', '20230503'),
+(4, 10004, 'varchar00004', '20230504'),
+(5, 10005, 'varchar00005', '20230505'),
+(6, 10006, 'varchar00006', '20230506'),
+(7, 10007, 'varchar00007', '20230507'),
+(8, 10008, 'varchar00008', '20230508'),
+(9, 10009, 'varchar00009', '20230509'),
+(10, 10010, 'varchar00010', '20230510');
+```
+
+The second `commit` takes place and executing `SELECT * FROM T` will return
+10 rows. A new snapshot, namely `snapshot-2`, is created and gives us the
+following physical file layout:
+```bash
+ % ls -atR .
+T . ..

Review Comment: What is this? `T . ..`, remove this?

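For context on the question above: the stray `T . ..` line is simply what `ls -a` prints, because `-a` includes the current (`.`) and parent (`..`) directory entries. The comparison below shows one way the doc could avoid it; whether to change the flags or just trim the output is up to the author.

```bash
# `-a` includes the `.` and `..` entries, which is where the confusing
# "T   .   .." line in the quoted listing comes from:
ls -atR /tmp/paimon/default.db

# Dropping `-a` keeps the recursive, time-sorted listing but hides `.` and `..`:
ls -tR /tmp/paimon/default.db
```
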
########## docs/content/maintenance/small-files.md: ##########
+## Alter Paimon Table
+As you may have noticed, the number of small files will augment over successive
+snapshots, which may lead to decreased read performance. Therefore, a full-compaction
+is needed in order to reduce the number of small files.
+
+Execute the following statement to configure full-compaction:
+```sql
+ALTER TABLE T SET ('full-compaction.delta-commits' = '1');
+```
+
+It will create a new schema for Paimon table, namely `schema-1`, but no snapshot
+has actually used this schema yet until the next commit.
+
+This configuration will ensure that partitions are full compacted before writing
+ends, and since we haven't done any compaction yet, the next commit will produce
+two snapshots, one for data written and one for full-compaction. However, we will
+not use this configuration since Flink does not support running compaction in Flink SQL.
+
+Let's trigger the full-compaction now. Make sure you have set execution mode to `batch`

Review Comment: A new title? `## Compact Table` We can move this `Compact Table` before `Alter Table`.

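A minimal sketch of what the suggested `## Compact Table` section could demonstrate, based only on the quoted text's own claim that, with `'full-compaction.delta-commits' = '1'`, the next successful commit also produces a full-compaction snapshot. The exact recommended procedure is for the author to decide; the extra row inserted here is purely illustrative.

```sql
-- Illustrative only: relies on the quoted claim that the next commit after the
-- ALTER TABLE ... SET statement also triggers a full compaction (an extra snapshot).
SET 'execution.runtime-mode' = 'batch';

INSERT INTO T VALUES (11, 10011, 'varchar00011', '20230501');
```
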
########## docs/content/maintenance/small-files.md: ##########
+Execute the following statement to configure full-compaction:
+```sql
+ALTER TABLE T SET ('full-compaction.delta-commits' = '1');
+```
+
+It will create a new schema for Paimon table, namely `schema-1`, but no snapshot

Review Comment: Draw a picture?

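While a picture is pending, a quick filesystem check conveys the same idea the quoted sentence makes: the ALTER only adds a schema file, and no new snapshot references it until the next commit. Paths follow the quoted doc; adjust for your warehouse location.

```bash
# After ALTER TABLE T SET (...), a second schema file should appear,
# while the snapshot directory stays unchanged until the next commit.
ls /tmp/paimon/default.db/T/schema
# expected (per the quoted doc): schema-0  schema-1
ls /tmp/paimon/default.db/T/snapshot
```
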
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@paimon.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org