This is an automated email from the ASF dual-hosted git repository. fpaul pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/flink.git
The following commit(s) were added to refs/heads/master by this push:
     new aeb3822  [FLINK-25927][docs][formats] Add DataStream documentation for CSV format

aeb3822 is described below

commit aeb3822ece887734dcaed5b2554f5583488d2dc0
Author: Alexander Fedulov <1492164+afedu...@users.noreply.github.com>
AuthorDate: Thu Feb 24 01:34:18 2022 +0100

    [FLINK-25927][docs][formats] Add DataStream documentation for CSV format
---
 .../docs/connectors/datastream/formats/csv.md | 60 ++++++++++++++++++++++
 1 file changed, 60 insertions(+)

diff --git a/docs/content/docs/connectors/datastream/formats/csv.md b/docs/content/docs/connectors/datastream/formats/csv.md
new file mode 100644
index 0000000..15d47ed
--- /dev/null
+++ b/docs/content/docs/connectors/datastream/formats/csv.md
@@ -0,0 +1,60 @@
+---
+title: "CSV"
+weight: 4
+type: docs
+aliases:
+- /dev/connectors/formats/csv.html
+- /apis/streaming/connectors/formats/csv.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->


# CSV format

To use the CSV format you need to add the Flink CSV dependency to your project:

```xml
<dependency>
	<groupId>org.apache.flink</groupId>
	<artifactId>flink-csv</artifactId>
	<version>{{< version >}}</version>
</dependency>
```

Flink supports reading CSV files using `CsvReaderFormat`. The reader uses the Jackson library and lets you pass the corresponding configuration for the CSV schema and the parsing options.

`CsvReaderFormat` can be initialized and used like this:
```java
CsvReaderFormat<SomePojo> csvFormat = CsvReaderFormat.forPojo(SomePojo.class);
// Path.fromLocalFile expects a java.io.File pointing to the CSV file or directory
FileSource<SomePojo> source =
        FileSource.forRecordStreamFormat(csvFormat, Path.fromLocalFile(...)).build();
```

In this case, the schema for CSV parsing is derived automatically from the fields of the `SomePojo` class using the `Jackson` library. (Note: you might need to add the `@JsonPropertyOrder({"field1", "field2", ...})` annotation to your class definition, with the field order exactly matching the order of the CSV file columns.)

If you need more fine-grained control over the CSV schema or the parsing options, use the lower-level `forSchema` static factory method of `CsvReaderFormat`:

```java
public static <T> CsvReaderFormat<T> forSchema(CsvMapper mapper,
                                               CsvSchema schema,
                                               TypeInformation<T> typeInformation)
```

Like `TextLineInputFormat`, `CsvReaderFormat` can be used in both continuous and batch modes (see [TextLineInputFormat]({{< ref "docs/connectors/datastream/formats/text_files" >}}) for examples).
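The `forPojo` snippet above can be expanded into a small bounded job. The following is a minimal sketch, not taken from the Flink codebase. It assumes Flink 1.15-era APIs with `flink-streaming-java`, `flink-clients`, `flink-connector-files` (which provides `FileSource`) and `flink-csv` on the classpath. The `Person` class, the `readPeople` helper, the headerless `name,age` layout and the collect limit of 1000 are all hypothetical. It also assumes that `flink-csv` parses with Flink's shaded Jackson, so `@JsonPropertyOrder` is imported from the shaded `org.apache.flink.shaded.jackson2` package rather than from `com.fasterxml.jackson`.

```java
import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.csv.CsvReaderFormat;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
// Assumption: flink-csv uses Flink's shaded Jackson, so the annotation comes from the
// shaded package; the unshaded com.fasterxml annotation would not be seen by its CsvMapper.
import org.apache.flink.shaded.jackson2.com.fasterxml.jackson.annotation.JsonPropertyOrder;

import java.io.File;
import java.util.List;

public class CsvPojoExample {

    /** Hypothetical record type for a headerless CSV file with the columns name,age. */
    @JsonPropertyOrder({"name", "age"})
    public static class Person {
        public String name;
        public int age;

        public Person() {} // Flink's POJO rules require a public no-arg constructor
    }

    /** Reads every row of a local CSV file as a bounded (batch) job and returns the rows. */
    public static List<Person> readPeople(File csvFile) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setRuntimeMode(RuntimeExecutionMode.BATCH);

        // The CSV schema is derived from the Person fields, in @JsonPropertyOrder order.
        CsvReaderFormat<Person> csvFormat = CsvReaderFormat.forPojo(Person.class);
        FileSource<Person> source =
                FileSource.forRecordStreamFormat(csvFormat, Path.fromLocalFile(csvFile)).build();

        // executeAndCollect(limit) runs the job and returns at most `limit` records;
        // 1000 is an arbitrary cap chosen for this sketch.
        return env.fromSource(source, WatermarkStrategy.noWatermarks(), "csv-pojo-source")
                .executeAndCollect(1000);
    }
}
```

Because the schema is derived from the class, the annotation is what pins the column order; without it the mapping between CSV columns and fields is not something the sketch relies on.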
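The lower-level `forSchema` factory and the continuous mode mentioned above can be combined roughly as follows. This is a hedged sketch, assuming Flink 1.15-era APIs, the same classpath as a regular DataStream job using `FileSource`, and the shaded Jackson CSV classes under `org.apache.flink.shaded.jackson2`. The `Person` type, the `personSchema` and `continuousSource` helpers, the `;` separator and the 10-second discovery interval are hypothetical choices for illustration.

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.csv.CsvReaderFormat;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
// Assumption: flink-csv exposes Jackson's CSV classes through Flink's shaded package.
import org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.CsvMapper;
import org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.CsvSchema;

import java.time.Duration;

public class CsvSchemaExample {

    /** Hypothetical record type; columns are bound to fields by the names in the schema. */
    public static class Person {
        public String name;
        public int age;

        public Person() {}
    }

    /** Explicit schema: two columns separated by ';' instead of the default ','. */
    public static CsvSchema personSchema() {
        return CsvSchema.builder()
                .addColumn("name", CsvSchema.ColumnType.STRING)
                .addColumn("age", CsvSchema.ColumnType.NUMBER)
                .setColumnSeparator(';')
                .build();
    }

    /** Source that keeps watching `dir` and picks up new files every 10 seconds. */
    public static FileSource<Person> continuousSource(String dir) {
        CsvReaderFormat<Person> csvFormat =
                CsvReaderFormat.forSchema(
                        new CsvMapper(), personSchema(), TypeInformation.of(Person.class));
        return FileSource.forRecordStreamFormat(csvFormat, new Path(dir))
                // Drop this call to get a bounded source for batch mode instead.
                .monitorContinuously(Duration.ofSeconds(10))
                .build();
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<Person> people =
                env.fromSource(
                        continuousSource(args[0]),
                        WatermarkStrategy.noWatermarks(),
                        "csv-schema-source");
        people.print();
        env.execute("csv-continuous-example"); // runs until cancelled
    }
}
```

The only difference between the continuous and the batch variant in this sketch is the `monitorContinuously` call on the builder; the `CsvReaderFormat` itself is unchanged.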