[GitHub] [flink-table-store] LadyForest commented on a change in pull request #72: [FLINK-26899] Introduce write/query table document for table store

GitBox Sun, 03 Apr 2022 00:55:55 -0700


LadyForest commented on a change in pull request #72:
URL: https://github.com/apache/flink-table-store/pull/72#discussion_r841181591




##########
File path: docs/content/docs/development/write-table.md
##########
@@ -0,0 +1,190 @@
+---
+title: "Write Table"
+weight: 3
+type: docs
+aliases:
+- /development/write-table.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Write Table
+
+```sql
+INSERT { INTO | OVERWRITE } [catalog_name.][db_name.]table_name
+  [PARTITION part_spec] [column_list] select_statement
+
+part_spec:
+  (part_col_name1=val1 [, part_col_name2=val2, ...])
+
+column_list:
+  (col_name1 [, column_name2, ...])
+```
+
+## Unify Streaming and Batch

Review comment:
       How about =>
   
   
   Flink Table Store supports reading and writing in both batch and streaming mode. Beyond that, different streaming and batch jobs can write to the same managed table simultaneously.
   
   Suppose you have a partitioned table defined as
   ```sql
   -- A managed table DDL
   CREATE TABLE MyDwdTable (
     user_id BIGINT,
     item_id BIGINT,
     dt STRING
   ) PARTITIONED BY (dt);
   ```
   A real-time pipeline performs the data sync task, and downstream jobs perform the remaining ETL steps.
   ```sql
   -- Run a streaming job that continuously writes to the table
   SET 'execution.runtime-mode' = 'streaming';
   INSERT INTO MyDwdTable SELECT user_id, item_id, dt FROM MyCdcTable WHERE some_filter;
   
   -- The downstream aggregation task
   INSERT INTO MyDwsTable 
   SELECT dt, item_id, COUNT(user_id) FROM MyDwdTable GROUP BY dt, item_id;
   ```
   Backfill tasks are often required to correct historical data. You can start a new batch job that overwrites one of the table's historical partitions without affecting the running streaming pipeline or the downstream tasks.
   ```sql
   -- Run a batch job to revise yesterday's partition
   SET 'execution.runtime-mode' = 'batch';
   INSERT OVERWRITE MyDwdTable PARTITION (dt = '20220402')
   SELECT user_id, item_id FROM MyCdcTable WHERE dt = '20220402' AND new_filter;
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
