dawidwys commented on a change in pull request #13534:
URL: https://github.com/apache/flink/pull/13534#discussion_r500050642
##########
File path: docs/dev/table/connectors/formats/avro-confluent.md
##########

+When reading (deserializing) a record with this format the Avro writer schema is fetched from the configured Confluent Schema Registry based on the schema version id encoded in the record while the reader schema is inferred from the table schema.
+
+When writing (serializing) a record with this format the Avro schema is inferred from the table schema and registered in the configured Confluent Schema Registry under the [subject](https://docs.confluent.io/current/schema-registry/index.html#schemas-subjects-and-topics) given in `avro-confluent.schema-registry.subject`.

Review comment:
nit: To be precise we could write something like:
```suggestion
When writing (serializing) a record with this format the Avro schema is inferred from the table schema and used to retrieve a schema id to be encoded with the data. The lookup is performed in the configured Confluent Schema Registry under the [subject](https://docs.confluent.io/current/schema-registry/index.html#schemas-subjects-and-topics) given in `avro-confluent.schema-registry.subject`.
```
But I am also fine with the current version. It's just that the operation is not CREATE, but GET_OR_CREATE.
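For context, the GET_OR_CREATE semantics mentioned above come from the Confluent client itself: `register()` returns the existing id when the schema is already known under the subject, so the call is idempotent. Below is a minimal sketch of the serialization-side lookup and the wire-format prefixing, assuming the Confluent `SchemaRegistryClient` API of the 5.x client — this is illustrative only, not Flink's actual implementation:

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

import io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient;
import io.confluent.kafka.schemaregistry.client.SchemaRegistryClient;
import org.apache.avro.Schema;

public class SchemaIdLookupSketch {

    // GET_OR_CREATE: register() returns the id of an already-registered
    // schema instead of creating a duplicate, so repeated calls are idempotent.
    static int lookupSchemaId(SchemaRegistryClient client, String subject, Schema schema)
            throws Exception {
        return client.register(subject, schema);
    }

    // The Confluent wire format prefixes the Avro payload with a magic byte
    // and the 4-byte big-endian schema id retrieved above.
    static byte[] prefixPayload(int schemaId, byte[] avroPayload) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(0x0);                                             // magic byte
        out.write(ByteBuffer.allocate(4).putInt(schemaId).array()); // 4-byte schema id
        out.write(avroPayload);                                     // Avro binary data
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        SchemaRegistryClient client =
                new CachedSchemaRegistryClient("http://localhost:8081", 100);
        Schema schema = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"user_behavior\",\"fields\":"
                        + "[{\"name\":\"user_id\",\"type\":\"long\"}]}");
        int id = lookupSchemaId(client, "user_behavior", schema);
        System.out.println("schema id: " + id);
    }
}
```

This is also why the reader side only needs the 4-byte id from the record to fetch the writer schema back from the registry.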
##########
File path: docs/dev/table/connectors/formats/avro-confluent.md
##########

+Dependencies
+------------
+
+In order to use the Avro Schema Registry format the following dependencies are required for both projects using a build automation tool (such as Maven or SBT) and SQL Client with SQL JAR bundles.
+
+<div class="codetabs" markdown="1">
+<div data-lang="SQL Client JAR" markdown="1">
+You can download flink-sql-avro-confluent-registry from [Download](https://repo.maven.apache.org/maven2/org/apache/flink/flink-sql-avro-confluent-registry/{{site.version}}/flink-sql-avro-confluent-registry-{{site.version}}.jar)
+</div>
+<div data-lang="Maven dependency" markdown="1">
+{% highlight xml %}
+<dependency>
+  <groupId>org.apache.flink</groupId>
+  <artifactId>flink-avro-confluent-registry</artifactId>
+  <version>{{ site.version }}</version>
+</dependency>
+{% endhighlight %}
+</div>
+</div>

Review comment:
Let's do it in the same way as we do it for other connectors:
```suggestion
| Maven dependency                      | SQL Client JAR          |
| :----------------------------------- | :-----------------------|
| `flink-avro-confluent-registry`       | {% if site.is_stable %} [Download](https://repo.maven.apache.org/maven2/org/apache/flink/flink-sql-avro-confluent-registry/{{site.version}}/flink-sql-avro-confluent-registry-{{site.version}}.jar) {% else %} Only available for stable releases. {% endif %} |
```
##########
File path: docs/dev/table/connectors/formats/avro-confluent.md
##########

+The Avro Schema Registry format can only be used in conjunction with <a href="{% link dev/table/connectors/kafka.md %}"> Apache Kafka SQL connector</a> .

Review comment:
```suggestion
The Avro Schema Registry format can only be used in conjunction with the [Apache Kafka SQL connector]({% link dev/table/connectors/kafka.md %}).
```
##########
File path: docs/dev/table/connectors/formats/avro-confluent.md
##########

+Format Options
+----------------
+
+<table class="table table-bordered">
+  <thead>
+    <tr>
+      <th class="text-left" style="width: 25%">Option</th>
+      <th class="text-center" style="width: 8%">Required</th>
+      <th class="text-center" style="width: 7%">Default</th>
+      <th class="text-center" style="width: 10%">Type</th>
+      <th class="text-center" style="width: 50%">Description</th>
+    </tr>
+  </thead>

Review comment:
I am good with this for now, as other formats/connectors do that as well. At some point it would be nice though to auto-generate that from the `ConfigOptions`. We have the `ConfigOptions` already in place; we would probably need to add a dependency in the `flink-docs` module and replace this table with the auto-generated one.
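For reference, the `ConfigOptions` the comment refers to are defined programmatically with Flink's builder API, which is what would make auto-generation possible. A rough sketch of what the format's option definitions might look like — the class and field names here are hypothetical; the real definitions live in the format's factory code:

```java
import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.ConfigOptions;

// Hypothetical holder class, for illustration only.
public class AvroConfluentOptionsSketch {

    // Corresponds to 'avro-confluent.schema-registry.url' in the table definition;
    // the 'avro-confluent.' prefix comes from the format identifier.
    public static final ConfigOption<String> SCHEMA_REGISTRY_URL =
            ConfigOptions.key("schema-registry.url")
                    .stringType()
                    .noDefaultValue()
                    .withDescription(
                            "The URL of the Confluent Schema Registry to fetch/register schemas.");

    // Corresponds to 'avro-confluent.schema-registry.subject'; required by sinks only.
    public static final ConfigOption<String> SCHEMA_REGISTRY_SUBJECT =
            ConfigOptions.key("schema-registry.subject")
                    .stringType()
                    .noDefaultValue()
                    .withDescription(
                            "The Confluent Schema Registry subject under which to register the "
                                    + "schema used by this format during serialization.");
}
```

Generating the docs table from definitions like these would keep option names, defaults, and descriptions in a single place.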
##########
File path: docs/dev/table/connectors/formats/avro-confluent.md
##########

+Data Type Mapping
+----------------
+
+Currently, the Avro schema is always derived from the table schema. Explicitly defining an Avro schema is not supported yet.
+See the <a href="{% link dev/table/connectors/formats/avro.md %}#data-type-mapping"> Apache Avro Format</a> for the mapping between Avro and Flink DataTypes.

Review comment:
Let's stick to a single formatting:
```suggestion
See the [Apache Avro Format]({% link dev/table/connectors/formats/avro.md %}#data-type-mapping) for the mapping between Avro and Flink DataTypes.
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
