MartijnVisser commented on a change in pull request #17640: URL: https://github.com/apache/flink/pull/17640#discussion_r742836114
########## File path: docs/content/docs/connectors/datastream/formats/avro.md ##########
@@ -0,0 +1,61 @@

+---
+title: "Avro"
+weight: 4
+type: docs
+aliases:
+- /dev/connectors/formats/avro.html
+- /apis/streaming/connectors/formats/avro.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Avro formats

Review comment: The SQL page called it `AVRO Format`, which I think is a little bit better.

########## File path: docs/content/docs/connectors/datastream/formats/azure_table_storage.md ##########
@@ -0,0 +1,130 @@

+---
+title: "Microsoft Azure table"
+weight: 4
+type: docs
+aliases:
+- /dev/connectors/formats/azure_table_storage.html
+- /apis/streaming/connectors/formats/azure_table_storage.html
+---
+
+# Microsoft Azure Table Storage format
+
+_Note: This example works starting from Flink 0.6-incubating_
+
+This example is using the `HadoopInputFormat` wrapper to use an existing Hadoop input format implementation for accessing [Azure's Table Storage](https://azure.microsoft.com/en-us/documentation/articles/storage-introduction/).
+
+1. Download and compile the `azure-tables-hadoop` project. The input format developed by the project is not yet available in Maven Central, therefore, we have to build the project ourselves.
+   Execute the following commands:
+
+```bash
+git clone https://github.com/mooso/azure-tables-hadoop.git
+cd azure-tables-hadoop
+mvn clean install
+```
+
+2. Setup a new Flink project using the quickstarts:
+
+```bash
+curl https://flink.apache.org/q/quickstart.sh | bash
+```
+
+3. Add the following dependencies (in the `<dependencies>` section) to your `pom.xml` file:
+
+```xml
+<dependency>
+    <groupId>org.apache.flink</groupId>
+    <artifactId>flink-hadoop-compatibility{{< scala_version >}}</artifactId>
+    <version>{{< version >}}</version>
+</dependency>
+<dependency>
+    <groupId>com.microsoft.hadoop</groupId>
+    <artifactId>microsoft-hadoop-azure</artifactId>
+    <version>0.0.4</version>
+</dependency>
+```
+
+`flink-hadoop-compatibility` is a Flink package that provides the Hadoop input format wrappers.
+`microsoft-hadoop-azure` is adding the project we've build before to our project.
+
+The project is now prepared for starting to code. We recommend to import the project into an IDE, such as Eclipse or IntelliJ. (Import as a Maven project!).

Review comment:
```suggestion
The project is now ready for starting to code. We recommend to import the project into an IDE, such as IntelliJ. You should import it as a Maven project.
```

########## File path: docs/content/docs/connectors/datastream/formats/azure_table_storage.md ##########

+Browse to the code of the `Job.java` file. Its an empty skeleton for a Flink job.
+
+Paste the following code into it:

Review comment:
```suggestion
Paste the following code:
```

########## File path: docs/content/docs/connectors/datastream/formats/parquet.md ##########
@@ -0,0 +1,67 @@

+---
+title: "Parquet"
+weight: 4
+type: docs
+aliases:
+- /dev/connectors/formats/parquet.html
+- /apis/streaming/connectors/formats/parquet.html
+---
+
+# Parquet formats
+
+Flink has extensive built-in support for [Apache Parquet](http://parquet.apache.org/). This allows to easily read from Parquet files with Flink.
+Be sure to include the Flink Parquet dependency to the pom.xml of your project.

Review comment:
```suggestion
In order to use the Parquet format the following dependencies are required for projects using a build automation tool (such as Maven or SBT).
```

########## File path: docs/content/docs/connectors/datastream/formats/avro.md ##########

+Flink has extensive built-in support for [Apache Avro](http://avro.apache.org/). This allows to easily read from Avro files with Flink.
+Also, the serialization framework of Flink is able to handle classes generated from Avro schemas. Be sure to include the Flink Avro dependency to the pom.xml of your project.

Review comment:
```suggestion
The serialization framework of Flink is able to handle classes generated from Avro schemas. In order to use the Avro format the following dependencies are required for projects using a build automation tool (such as Maven or SBT).
```
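For illustration, the dependency blocks that these two suggestions refer to might look as follows in a Maven `pom.xml`. This is only a sketch: the artifact ids follow Flink's `flink-avro` and `flink-parquet` module names, but the exact artifact ids and any Scala-version suffix should be checked against the documentation for your Flink release.

```xml
<!-- Sketch only: verify artifact ids and any Scala-version suffix
     against the Flink documentation for your release. -->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-avro</artifactId>
    <version>{{< version >}}</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-parquet{{< scala_version >}}</artifactId>
    <version>{{< version >}}</version>
</dependency>
```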
########## File path: docs/content/docs/connectors/datastream/formats/azure_table_storage.md ##########

+# Microsoft Azure Table Storage format
+
+_Note: This example works starting from Flink 0.6-incubating_

Review comment: I don't think we need to include this note, since we don't support Flink 0.6 anymore (and the documentation is specifically targeted towards Flink 1.15).

########## File path: docs/content/docs/connectors/datastream/formats/azure_table_storage.md ##########

+Browse to the code of the `Job.java` file. Its an empty skeleton for a Flink job.

Review comment:
```suggestion
Browse to the file `Job.java`. This is an empty skeleton for a Flink job.
```

########## File path: docs/content/docs/connectors/datastream/formats/avro.md ##########

+Flink has extensive built-in support for [Apache Avro](http://avro.apache.org/). This allows to easily read from Avro files with Flink.

Review comment:
```suggestion
Flink has built-in support for [Apache Avro](http://avro.apache.org/). This allows to easily read and write Avro data based on an Avro schema with Flink.
```

########## File path: docs/content/docs/connectors/datastream/formats/parquet.md ##########

+# Parquet formats

Review comment:
```suggestion
# Parquet format
```

########## File path: docs/content/docs/connectors/datastream/formats/azure_table_storage.md ##########

+ <version>0.0.4</version>

Review comment: Nit: the indent is slightly different from the one above.

########## File path: docs/content/docs/connectors/datastream/formats/hadoop.md ##########
@@ -0,0 +1,38 @@

+---
+title: "Hadoop"
+weight: 4
+type: docs
+aliases:
+  - /dev/connectors/formats/hadoop.html
+  - /apis/streaming/connectors/formats/hadoop.html
+---
+
+# Hadoop formats
+
+Apache Flink allows users to access many different systems as data sources.
+The system is designed for very easy extensibility. Similar to Apache Hadoop, Flink has the concept
+of so called `InputFormat`s.
+
+One implementation of these `InputFormat`s is the `HadoopInputFormat`. This is a wrapper that allows
+users to use all existing Hadoop input formats with Flink.
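As an aside for readers of this hunk: the wrapper idea behind `HadoopInputFormat` can be sketched in miniature with plain Java. All names below are invented for illustration and are not Flink or Hadoop APIs; the real compatibility layer additionally handles input splits, configuration, and serialization.

```java
import java.util.Iterator;
import java.util.List;

public class WrapperDemo {

    // The "target" contract, in the style the host framework expects.
    interface FlinkStyleSource<T> {
        boolean reachedEnd();
        T nextRecord();
    }

    // A pre-existing "foreign" reader with a different contract,
    // standing in for a Hadoop input format's record reader.
    static class HadoopStyleReader {
        private final Iterator<String> it;
        HadoopStyleReader(List<String> lines) { this.it = lines.iterator(); }
        boolean hasNext() { return it.hasNext(); }
        String next() { return it.next(); }
    }

    // The wrapper: exposes the foreign reader through the target contract.
    static class HadoopWrapperSource implements FlinkStyleSource<String> {
        private final HadoopStyleReader reader;
        HadoopWrapperSource(HadoopStyleReader reader) { this.reader = reader; }
        public boolean reachedEnd() { return !reader.hasNext(); }
        public String nextRecord() { return reader.next(); }
    }

    public static void main(String[] args) {
        FlinkStyleSource<String> source =
            new HadoopWrapperSource(new HadoopStyleReader(List.of("a", "b", "c")));
        StringBuilder out = new StringBuilder();
        while (!source.reachedEnd()) {
            out.append(source.nextRecord());
        }
        System.out.println(out); // prints: abc
    }
}
```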
+
+{{< top >}}

Review comment: Would it make sense to move the documentation from https://ci.apache.org/projects/flink/flink-docs-master/docs/dev/dataset/hadoop_compatibility/#complete-hadoop-wordcount-example to this page?

########## File path: docs/content/docs/connectors/datastream/formats/mongodb.md ##########
@@ -0,0 +1,33 @@

+---
+title: "MongoDb"
+weight: 4
+type: docs
+aliases:
+- /dev/connectors/formats/mongodb.html
+- /apis/streaming/connectors/formats/mongodb.html
+---
+
+# MongoDB format

Review comment: Is MongoDB a format or a connector? I would expect the latter.
########## File path: docs/content/docs/connectors/datastream/formats/parquet.md ##########

+Flink has extensive built-in support for [Apache Parquet](http://parquet.apache.org/). This allows to easily read from Parquet files with Flink.

Review comment:
```suggestion
Flink has built-in support for [Apache Parquet](http://parquet.apache.org/). This allows to read and write Parquet data with Flink.
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
