This is an automated email from the ASF dual-hosted git repository. jark pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/flink.git
commit 821e786ce1e94e0074affcd50ecd4b87a6bd744b Author: Jark Wu <j...@apache.org> AuthorDate: Mon Jun 8 17:38:14 2020 +0800 [FLINK-18133][docs][avro] Add documentation for the new Avro format This closes #12523 --- docs/dev/table/connectors/formats/avro.md | 203 +++++++++++++++++++++++++ docs/dev/table/connectors/formats/avro.zh.md | 206 ++++++++++++++++++++++++++ docs/dev/table/connectors/formats/index.md | 72 +++++++++ docs/dev/table/connectors/formats/index.zh.md | 72 +++++++++ 4 files changed, 553 insertions(+) diff --git a/docs/dev/table/connectors/formats/avro.md b/docs/dev/table/connectors/formats/avro.md new file mode 100644 index 0000000..8870235 --- /dev/null +++ b/docs/dev/table/connectors/formats/avro.md @@ -0,0 +1,203 @@ +--- +title: "Avro Format" +nav-title: Avro +nav-parent_id: sql-formats +nav-pos: 3 +--- +<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

<span class="label label-info">Format: Serialization Schema</span>
<span class="label label-info">Format: Deserialization Schema</span>

* This will be replaced by the TOC
{:toc}

The [Apache Avro](https://avro.apache.org/) format allows reading and writing Avro data based on an Avro schema. Currently, the Avro schema is derived from the table schema.
+
Dependencies
------------

To set up the Avro format, the following table provides dependency information both for projects using a build automation tool (such as Maven or SBT) and for the SQL Client with SQL JAR bundles.

| Maven dependency | SQL Client JAR |
| :----------------- | :----------------------|
| `flink-avro` | [Pre-bundled Hadoop](https://flink.apache.org/downloads.html#additional-components) |

How to create a table with Avro format
----------------

Here is an example of creating a table using the Kafka connector and the Avro format.

<div class="codetabs" markdown="1">
<div data-lang="SQL" markdown="1">
{% highlight sql %}
CREATE TABLE user_behavior (
  user_id BIGINT,
  item_id BIGINT,
  category_id BIGINT,
  behavior STRING,
  ts TIMESTAMP(3)
) WITH (
 'connector' = 'kafka',
 'topic' = 'user_behavior',
 'properties.bootstrap.servers' = 'localhost:9092',
 'properties.group.id' = 'testGroup',
 'format' = 'avro'
)
{% endhighlight %}
</div>
</div>

Format Options
----------------

<table class="table table-bordered">
    <thead>
      <tr>
        <th class="text-left" style="width: 25%">Option</th>
        <th class="text-center" style="width: 8%">Required</th>
        <th class="text-center" style="width: 7%">Default</th>
        <th class="text-center" style="width: 10%">Type</th>
        <th class="text-center" style="width: 50%">Description</th>
      </tr>
    </thead>
    <tbody>
    <tr>
      <td><h5>format</h5></td>
      <td>required</td>
      <td style="word-wrap: break-word;">(none)</td>
      <td>String</td>
      <td>Specify what format to use; here it should be 'avro'.</td>
    </tr>
    </tbody>
</table>

Data Type Mapping
----------------

Currently, the Avro schema is always derived from the table schema. Explicitly defining an Avro schema is not supported yet.
So the following table lists the type mapping from Flink types to Avro types.
+
<table class="table table-bordered">
    <thead>
      <tr>
        <th class="text-left">Flink Data Type</th>
        <th class="text-center">Avro type</th>
        <th class="text-center">Avro logical type</th>
      </tr>
    </thead>
    <tbody>
    <tr>
      <td>CHAR / VARCHAR / STRING</td>
      <td>string</td>
      <td></td>
    </tr>
    <tr>
      <td>BOOLEAN</td>
      <td>boolean</td>
      <td></td>
    </tr>
    <tr>
      <td>BINARY / VARBINARY</td>
      <td>bytes</td>
      <td></td>
    </tr>
    <tr>
      <td>DECIMAL</td>
      <td>fixed</td>
      <td>decimal</td>
    </tr>
    <tr>
      <td>TINYINT</td>
      <td>int</td>
      <td></td>
    </tr>
    <tr>
      <td>SMALLINT</td>
      <td>int</td>
      <td></td>
    </tr>
    <tr>
      <td>INT</td>
      <td>int</td>
      <td></td>
    </tr>
    <tr>
      <td>BIGINT</td>
      <td>long</td>
      <td></td>
    </tr>
    <tr>
      <td>FLOAT</td>
      <td>float</td>
      <td></td>
    </tr>
    <tr>
      <td>DOUBLE</td>
      <td>double</td>
      <td></td>
    </tr>
    <tr>
      <td>DATE</td>
      <td>int</td>
      <td>date</td>
    </tr>
    <tr>
      <td>TIME</td>
      <td>int</td>
      <td>time-millis</td>
    </tr>
    <tr>
      <td>TIMESTAMP</td>
      <td>long</td>
      <td>timestamp-millis</td>
    </tr>
    <tr>
      <td>ARRAY</td>
      <td>array</td>
      <td></td>
    </tr>
    <tr>
      <td>MAP<br>
      (key must be string/char/varchar type)</td>
      <td>map</td>
      <td></td>
    </tr>
    <tr>
      <td>MULTISET<br>
      (element must be string/char/varchar type)</td>
      <td>map</td>
      <td></td>
    </tr>
    <tr>
      <td>ROW</td>
      <td>record</td>
      <td></td>
    </tr>
    </tbody>
</table>

In addition to the types listed above, Flink supports reading/writing nullable types. Flink maps a nullable type to Avro `union(something, null)`, where `something` is the Avro type converted from the Flink type.

You can refer to the [Avro Specification](https://avro.apache.org/docs/current/spec.html) for more information about Avro types.
+



diff --git a/docs/dev/table/connectors/formats/avro.zh.md b/docs/dev/table/connectors/formats/avro.zh.md new file mode 100644 index 0000000..ed74042 --- /dev/null +++ b/docs/dev/table/connectors/formats/avro.zh.md @@ -0,0 +1,206 @@ +--- +title: "Avro Format" +nav-title: Avro +nav-parent_id: sql-formats +nav-pos: 3 +--- +<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

<span class="label label-info">Format: Serialization Schema</span>
<span class="label label-info">Format: Deserialization Schema</span>

* This will be replaced by the TOC
{:toc}

The [Apache Avro](https://avro.apache.org/) format allows reading and writing Avro data based on an Avro schema. Currently, the Avro schema is derived from the table schema.

Dependencies
------------

To set up the Avro format, the following table provides dependency information both for projects using a build automation tool (such as Maven or SBT) and for the SQL Client with SQL JAR bundles.
+
| Maven dependency | SQL Client JAR |
| :----------------- | :----------------------|
| `flink-avro` | [Pre-bundled Hadoop](https://flink.apache.org/downloads.html#additional-components) |

How to create a table with Avro format
----------------

Here is an example of creating a table using the Kafka connector and the Avro format.

<div class="codetabs" markdown="1">
<div data-lang="SQL" markdown="1">
{% highlight sql %}
CREATE TABLE user_behavior (
  user_id BIGINT,
  item_id BIGINT,
  category_id BIGINT,
  behavior STRING,
  ts TIMESTAMP(3)
) WITH (
 'connector' = 'kafka',
 'topic' = 'user_behavior',
 'properties.bootstrap.servers' = 'localhost:9092',
 'properties.group.id' = 'testGroup',
 'format' = 'avro'
)
{% endhighlight %}
</div>
</div>

Format Options
----------------

<table class="table table-bordered">
    <thead>
      <tr>
        <th class="text-left" style="width: 25%">Option</th>
        <th class="text-center" style="width: 8%">Required</th>
        <th class="text-center" style="width: 7%">Default</th>
        <th class="text-center" style="width: 10%">Type</th>
        <th class="text-center" style="width: 50%">Description</th>
      </tr>
    </thead>
    <tbody>
    <tr>
      <td><h5>format</h5></td>
      <td>required</td>
      <td style="word-wrap: break-word;">(none)</td>
      <td>String</td>
      <td>Specify what format to use; here it should be 'avro'.</td>
    </tr>
    </tbody>
</table>

Data Type Mapping
----------------

Currently, the Avro schema is always derived from the table schema; explicitly defining an Avro schema is not supported yet. So only the conversion from Flink types to Avro types is listed here.

### Conversion from Flink types to Avro types

The following table lists the conversion from the supported Flink types to Avro types.
+
<table class="table table-bordered">
    <thead>
      <tr>
        <th class="text-left">Flink Data Type</th>
        <th class="text-center">Avro type</th>
        <th class="text-center">Avro logical type</th>
      </tr>
    </thead>
    <tbody>
    <tr>
      <td>CHAR / VARCHAR / STRING</td>
      <td>string</td>
      <td></td>
    </tr>
    <tr>
      <td>BOOLEAN</td>
      <td>boolean</td>
      <td></td>
    </tr>
    <tr>
      <td>BINARY / VARBINARY</td>
      <td>bytes</td>
      <td></td>
    </tr>
    <tr>
      <td>DECIMAL</td>
      <td>fixed</td>
      <td>decimal</td>
    </tr>
    <tr>
      <td>TINYINT</td>
      <td>int</td>
      <td></td>
    </tr>
    <tr>
      <td>SMALLINT</td>
      <td>int</td>
      <td></td>
    </tr>
    <tr>
      <td>INT</td>
      <td>int</td>
      <td></td>
    </tr>
    <tr>
      <td>BIGINT</td>
      <td>long</td>
      <td></td>
    </tr>
    <tr>
      <td>FLOAT</td>
      <td>float</td>
      <td></td>
    </tr>
    <tr>
      <td>DOUBLE</td>
      <td>double</td>
      <td></td>
    </tr>
    <tr>
      <td>DATE</td>
      <td>int</td>
      <td>date</td>
    </tr>
    <tr>
      <td>TIME</td>
      <td>int</td>
      <td>time-millis</td>
    </tr>
    <tr>
      <td>TIMESTAMP</td>
      <td>long</td>
      <td>timestamp-millis</td>
    </tr>
    <tr>
      <td>ARRAY</td>
      <td>array</td>
      <td></td>
    </tr>
    <tr>
      <td>MAP<br>
      (key must be string/char/varchar type)</td>
      <td>map</td>
      <td></td>
    </tr>
    <tr>
      <td>MULTISET<br>
      (element must be string/char/varchar type)</td>
      <td>map</td>
      <td></td>
    </tr>
    <tr>
      <td>ROW</td>
      <td>record</td>
      <td></td>
    </tr>
    </tbody>
</table>

In addition to the types listed above, Flink supports reading/writing nullable types. Flink maps a nullable type to Avro `union(something, null)`, where `something` is the Avro type converted from the Flink type.

You can refer to the [Avro Specification](https://avro.apache.org/docs/current/spec.html) for more information about Avro types.
+



diff --git a/docs/dev/table/connectors/formats/index.md b/docs/dev/table/connectors/formats/index.md new file mode 100644 index 0000000..6f45d74 --- /dev/null +++ b/docs/dev/table/connectors/formats/index.md @@ -0,0 +1,72 @@ +--- +title: "Formats" +nav-id: sql-formats +nav-parent_id: sql-connectors +nav-pos: 1 +nav-show_overview: true +--- +<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

Flink provides a set of table formats that can be used with table connectors. A table format is a storage format that defines how to map binary data onto table columns.
+ +Flink supports the following formats: + +<table class="table table-bordered"> + <thead> + <tr> + <th class="text-left">Formats</th> + <th class="text-left">Supported Connectors</th> + </tr> + </thead> + <tbody> + <tr> + <td>CSV</td> + <td>Apache Kafka, + <a href="{{ site.baseurl }}/dev/table/connectors/filesystem.html">Filesystem</a></td> + </tr> + <tr> + <td>JSON</td> + <td>Apache Kafka, + <a href="{{ site.baseurl }}/dev/table/connectors/filesystem.html">Filesystem</a>, + Elasticsearch</td> + </tr> + <tr> + <td><a href="{{ site.baseurl }}/dev/table/connectors/formats/avro.html">Apache Avro</a></td> + <td>Apache Kafka, + <a href="{{ site.baseurl }}/dev/table/connectors/filesystem.html">Filesystem</a></td> + </tr> + <tr> + <td>Debezium JSON</td> + <td>Apache Kafka</td> + </tr> + <tr> + <td>Canal JSON</td> + <td>Apache Kafka</td> + </tr> + <tr> + <td>Apache Parquet</td> + <td><a href="{{ site.baseurl }}/dev/table/connectors/filesystem.html">Filesystem</a></td> + </tr> + <tr> + <td>Apache ORC</td> + <td><a href="{{ site.baseurl }}/dev/table/connectors/filesystem.html">Filesystem</a></td> + </tr> + </tbody> +</table> \ No newline at end of file diff --git a/docs/dev/table/connectors/formats/index.zh.md b/docs/dev/table/connectors/formats/index.zh.md new file mode 100644 index 0000000..9272f21 --- /dev/null +++ b/docs/dev/table/connectors/formats/index.zh.md @@ -0,0 +1,72 @@ +--- +title: "Formats" +nav-id: sql-formats +nav-parent_id: sql-connectors +nav-pos: 1 +nav-show_overview: true +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. 
You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

Flink provides a set of table formats that can be used with table connectors. A table format is a storage format that defines how to map binary data onto table columns.

Flink supports the following formats:

<table class="table table-bordered">
    <thead>
      <tr>
        <th class="text-left">Formats</th>
        <th class="text-left">Supported Connectors</th>
      </tr>
    </thead>
    <tbody>
    <tr>
      <td>CSV</td>
      <td><a href="{{ site.baseurl }}/dev/table/connectors/kafka.html">Apache Kafka</a>,
      <a href="{{ site.baseurl }}/dev/table/connectors/filesystem.html">Filesystem</a></td>
    </tr>
    <tr>
      <td>JSON</td>
      <td><a href="{{ site.baseurl }}/dev/table/connectors/kafka.html">Apache Kafka</a>,
      <a href="{{ site.baseurl }}/dev/table/connectors/filesystem.html">Filesystem</a>,
      <a href="{{ site.baseurl }}/dev/table/connectors/elasticsearch.html">Elasticsearch</a></td>
    </tr>
    <tr>
      <td><a href="{{ site.baseurl }}/dev/table/connectors/formats/avro.html">Apache Avro</a></td>
      <td><a href="{{ site.baseurl }}/dev/table/connectors/kafka.html">Apache Kafka</a>,
      <a href="{{ site.baseurl }}/dev/table/connectors/filesystem.html">Filesystem</a></td>
    </tr>
    <tr>
      <td>Debezium JSON</td>
      <td><a href="{{ site.baseurl }}/dev/table/connectors/kafka.html">Apache Kafka</a></td>
    </tr>
    <tr>
      <td>Canal JSON</td>
      <td><a href="{{ site.baseurl }}/dev/table/connectors/kafka.html">Apache Kafka</a></td>
    </tr>
    <tr>
      <td>Apache Parquet</td>
      <td><a href="{{ site.baseurl }}/dev/table/connectors/filesystem.html">Filesystem</a></td>
    </tr>
    <tr>
      <td>Apache ORC</td>
      <td><a
href="{{ site.baseurl }}/dev/table/connectors/filesystem.html">Filesystem</a></td> + </tr> + </tbody> +</table> \ No newline at end of file