Gerrrr commented on code in PR #443:
URL: https://github.com/apache/flink-table-store/pull/443#discussion_r1053789276


##########
docs/content/docs/features/table-types.md:
##########
@@ -0,0 +1,142 @@
+---
+title: "Table Types"
+weight: 1
+type: docs
+aliases:
+- /features/table-types.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Table Types
+
+Table Store supports various types of tables. Users can set the `write-mode` 
table property to choose the table type when creating a table.
+
+## Changelog Tables with Primary Keys
+
+Changelog table is the default table type when creating a table. Users can 
also specify `'write-mode' = 'change-log'` explicitly in table properties when 
creating the table.
+
+Primary keys are a set of columns that are unique for each record. Table Store 
imposes an ordering of data, which means the system will sort the primary key 
within each bucket. Using this feature, users can achieve high performance by 
adding filter conditions on the primary key.
+
+By [defining primary keys]({{< ref 
"docs/sql-api/creating-tables#tables-with-primary-keys" >}}) on a changelog 
table, users can access the following features.
+
+### Merge Engines
+
+When Table Store sink receives two or more records with the same primary keys, 
it will merge them into one record to keep primary keys unique. By specifying 
the `merge-engine` table property, users can choose how records are merged 
together.
+
+#### Deduplicate
+
+`deduplicate` merge engine is the default merge engine. Table Store will only 
keep the latest record and throw away other records with the same primary keys.
+
+Specifically, if the latest record is a `DELETE` record, all records with the 
same primary keys will be deleted.
+
+#### Partial Update
+
+By specifying `'merge-engine' = 'partial-update'`, users can set columns of a 
record across multiple updates and finally get a complete record. Specifically, 
value fields are updated to the latest data one by one under the same primary 
key, but null values in later records do not overwrite existing values.
+
+For example, let's say Table Store receives three records `<1, 23.0, 10, 
NULL>`, `<1, NULL, NULL, 'This is a book'>` and `<1, 25.2, NULL, NULL>`, where 
the first column is the primary key. The final result will be `<1, 25.2, 10, 
'This is a book'>`.
+
+NOTE: For streaming queries, `partial-update` merge engine must be used 
together with `full-compaction` [changelog producer]({{< ref 
"docs/features/table-types#changelog-producers" >}}).
+
+#### Aggregation
+
+Sometimes users only care about aggregated results. The `aggregation` merge 
engine aggregates each value field with the latest data one by one under the 
same primary key according to the aggregate function.
+
+Each field not part of the primary keys must be given an aggregate function, 
specified by the `fields.<field-name>.aggregate-function` table property. For 
example, consider the following table definition.
+
+{{< tabs "aggregation-merge-engine-example" >}}
+
+{{< tab "Flink" >}}
+
+```sql
+CREATE TABLE MyTable (
+    product_id BIGINT,
+    price DOUBLE,
+    sales BIGINT,
+    PRIMARY KEY (product_id) NOT ENFORCED
+) WITH (
+    'merge-engine' = 'aggregation',
+    'fields.price.aggregate-function' = 'max',
+    'fields.sales.aggregate-function' = 'sum'
+);
+```
+
+{{< /tab >}}
+
+{{< /tabs >}}
+
+Field `price` will be aggregated by the `max` function, and field `sales` will 
be aggregated by the `sum` function. Given two input records `<1, 23.0, 15>` 
and `<1, 30.2, 20>`, the final result will be `<1, 30.2, 35>`.
+
+Currently supported aggregate functions and data types are:
+
+* `sum`: supports DECIMAL, TINYINT, SMALLINT, INTEGER, BIGINT, FLOAT and 
DOUBLE.
+* `min`/`max`: support DECIMAL, TINYINT, SMALLINT, INTEGER, BIGINT, FLOAT, 
DOUBLE, DATE, TIME, TIMESTAMP and TIMESTAMP_LTZ.
+* `last_value` / `last_non_null_value`: support all data types.
+* `listagg`: supports STRING data type.
+* `bool_and` / `bool_or`: support BOOLEAN data type.
+
+### Changelog Producers
+
+Streaming queries will continuously produce latest changes. These changes can 
come from the underlying table files or from an [external log system]({{< ref 
"docs/features/external-log-systems" >}}) like Kafka. Compared to the external 
log system, changes from table files have lower cost but higher latency 
(depending on how often snapshots are created).
+
+By specifying the `changelog-producer` table property when creating the table, 
users can choose the pattern of changes produced from files.
+
+#### None
+
+By default, no extra changelog producer will be applied to the table 
writer. Table Store source can only see the merged changes across snapshots, 
such as which keys are removed and what the new values of some keys are.
+
+However, these merged changes cannot form a complete changelog, because we 
can't read the old values of the keys directly from them. Merged changes 
require the consumers to "remember" the values of each key and to rewrite the 
values without seeing the old ones.

Review Comment:
   Why do we need `before` values in the complete changelog in the general 
case? AFAIU we'd only need them for append-only tables.
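
   To make the question concrete, here is a minimal sketch of how I read the 
two shapes of output. It is a toy model, not Table Store code: a snapshot is 
assumed to be a plain `{key: value}` dict, and `merged_changes` and 
`complete_changelog` are hypothetical names. The `+I`/`-U`/`+U`/`-D` tags follow 
Flink's row kinds.

```python
# Toy model (assumption): a snapshot is a dict mapping primary key -> value.

def merged_changes(old, new):
    """What a reader sees from snapshots alone: removed keys and new values."""
    removed = [("remove", k) for k in sorted(old.keys() - new.keys())]
    upserts = [
        ("upsert", k, new[k])
        for k in sorted(new)
        if k not in old or old[k] != new[k]
    ]
    return removed + upserts


def complete_changelog(old, new):
    """The same transition, but every update also carries its old value."""
    log = []
    for k in sorted(old.keys() | new.keys()):
        if k not in old:
            log.append(("+I", k, new[k]))
        elif k not in new:
            log.append(("-D", k, old[k]))
        elif old[k] != new[k]:
            log.append(("-U", k, old[k]))  # the "before" value
            log.append(("+U", k, new[k]))  # the "after" value
    return log


old_snapshot = {1: 10, 2: 20}
new_snapshot = {1: 15, 3: 30}
print(merged_changes(old_snapshot, new_snapshot))
print(complete_changelog(old_snapshot, new_snapshot))
```

   In the first form, key `1` only shows up with its new value `15`; the `-U` 
row carrying `10` exists only in the second form.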



##########
docs/content/docs/features/table-types.md:
##########
@@ -0,0 +1,142 @@
+---
+title: "Table Types"
+weight: 1
+type: docs
+aliases:
+- /features/table-types.html
+---
+
+# Table Types
+
+Table Store supports various types of tables. Users can set the `write-mode` 
table property to choose the table type when creating a table.
+
+## Changelog Tables with Primary Keys
+
+Changelog table is the default table type when creating a table. Users can 
also specify `'write-mode' = 'change-log'` explicitly in table properties when 
creating the table.

Review Comment:
   It might be worth elaborating that `'write-mode' = 'change-log'` generally 
means that the table supports inserts/updates/deletions. 
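
   As a rough illustration of what such an elaboration could convey, here is a 
small sketch of inserts, updates, and deletions applied by primary key. It 
assumes the default `deduplicate` behavior described in the doc (keep the latest 
record, and drop the key if the latest record is a `DELETE`). The function name 
`apply_changes` and the tuple layout are hypothetical, not Table Store APIs.

```python
# Toy model (assumption): each change is (row_kind, row), and the primary key
# is the first column of the row. Row kinds follow Flink: +I, -U, +U, -D.

def apply_changes(changes, primary_key_index=0):
    """Apply a stream of changes and return the resulting table contents."""
    table = {}
    for kind, row in changes:
        key = row[primary_key_index]
        if kind in ("+I", "+U"):
            table[key] = row        # the latest record for a key wins
        elif kind == "-D":
            table.pop(key, None)    # a delete removes the key entirely
        # "-U" rows are skipped here: the matching "+U" overwrites the key
    return table


changes = [
    ("+I", (1, "a")),
    ("+U", (1, "b")),
    ("+I", (2, "c")),
    ("-D", (2, "c")),
]
print(apply_changes(changes))
```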



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
