LadyForest commented on a change in pull request #72:
URL: https://github.com/apache/flink-table-store/pull/72#discussion_r841157612



##########
File path: docs/content/docs/development/write-table.md
##########
@@ -0,0 +1,190 @@
+---
+title: "Write Table"
+weight: 3
+type: docs
+aliases:
+- /development/write-table.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Write Table
+
+```sql
+INSERT { INTO | OVERWRITE } [catalog_name.][db_name.]table_name
+  [PARTITION part_spec] [column_list] select_statement
+
+part_spec:
+  (part_col_name1=val1 [, part_col_name2=val2, ...])
+
+column_list:
+  (col_name1 [, col_name2, ...])
+```
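+
+For example, a minimal sketch combining a partition spec with a column
+list (the table names and columns here are hypothetical):
+
+```sql
+-- Write a subset of columns into a single partition;
+-- the partition column dt is supplied by the PARTITION clause
+INSERT INTO MyPartitionedTable PARTITION (dt='20220402') (user_id, item_id)
+  SELECT user_id, item_id FROM SourceTable;
+```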
+
+## Unify Streaming and Batch
+
+Table Store supports writing in both Flink's batch and streaming execution
+modes. Moreover, streaming and batch jobs can write to the same table at
+the same time:
+
+```sql
+-- a managed table ddl
+CREATE TABLE MyTable (
+  user_id BIGINT,
+  item_id BIGINT,
+  dt STRING
+) PARTITIONED BY (dt);
+
+-- Run a stream job that continuously writes to the table
+SET 'execution.runtime-mode' = 'streaming';
+INSERT INTO MyTable SELECT ... FROM MyCdcTable;
+```
+
+You now have a partitioned table with a streaming job continuously
+writing data into it.
+
+Suppose that a few days later you discover that yesterday's partition
+contains wrong data. You need to recalculate the data and revise the
+partition.
+
+```sql
+-- Run a batch job to revise yesterday's partition
+SET 'execution.runtime-mode' = 'batch';
+INSERT OVERWRITE MyTable PARTITION (dt='20220402')
+  SELECT ... FROM SourceTable WHERE dt = '20220402';
+```
+
+This way you revise yesterday's partition without suspending the streaming job.
+
+{{< hint info >}}
+__Note:__ Writing to a single partition from multiple jobs at the same
+time is not supported. Doing so does not corrupt the data, but it can
+lead to job failover.
+{{< /hint >}}
+
+## Parallelism
+
+It is recommended that the parallelism of the sink be less than or equal
+to the number of buckets, and preferably equal to it (see the example
+after the options table below). You can control the parallelism of the
+sink with the `sink.parallelism` option.
+
+<table class="table table-bordered">
+    <thead>
+    <tr>
+      <th class="text-left" style="width: 20%">Option</th>
+      <th class="text-center" style="width: 5%">Required</th>
+      <th class="text-center" style="width: 5%">Default</th>
+      <th class="text-center" style="width: 10%">Type</th>
+      <th class="text-center" style="width: 60%">Description</th>
+    </tr>
+    </thead>
+    <tbody>
+    <tr>
+      <td><h5>sink.parallelism</h5></td>
+      <td>No</td>
+      <td style="word-wrap: break-word;">(none)</td>
+      <td>Integer</td>
+      <td>Defines the parallelism of the sink operator. By default, the parallelism is determined by the framework, using the parallelism of the upstream chained operator.</td>
+    </tr>
+    </tbody>
+</table>
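+
+As a minimal sketch, the option can be set in the table DDL so that the
+sink parallelism matches the number of buckets. The table name and the
+values are illustrative, and the `bucket` option is assumed here to set
+the bucket count:
+
+```sql
+CREATE TABLE MyBucketedTable (
+  user_id BIGINT,
+  item_id BIGINT
+) WITH (
+  'bucket' = '4',           -- assumed option for the number of buckets
+  'sink.parallelism' = '4'  -- sink parallelism equal to the bucket count
+);
+```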
+
+## Expire Snapshot
+
+Table Store generates one or two snapshots per commit. To avoid too many
+snapshots creating a lot of small files and redundant storage, the Table
+Store writer eliminates expired snapshots by default, controlled by the
+following options:

Review comment:
       Table Store generates one or two snapshots per commit. To avoid too many snapshots that create a lot of small files and redundant storage, the Table Store writer defaults to eliminating expired snapshots




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

