fapaul commented on a change in pull request #18288: URL: https://github.com/apache/flink/pull/18288#discussion_r786117367
########## File path: docs/content/docs/connectors/datastream/filesystem.md ##########

```diff
@@ -25,12 +27,227 @@ specific language governing permissions and limitations
 under the License.
 -->
-# File Sink
+# FileSystem
-This connector provides a unified Sink for `BATCH` and `STREAMING` that writes partitioned files to filesystems
+This connector provides a unified Source and Sink for `BATCH` and `STREAMING` that reads or writes (partitioned) files to filesystems
 supported by the [Flink `FileSystem` abstraction]({{< ref "docs/deployment/filesystems/overview" >}}). This filesystem
-connector provides the same guarantees for both `BATCH` and `STREAMING` and it is an evolution of the
-existing [Streaming File Sink]({{< ref "docs/connectors/datastream/streamfile_sink" >}}) which was designed for providing exactly-once semantics for `STREAMING` execution.
+connector provides the same guarantees for both `BATCH` and `STREAMING` and is designed for providing exactly-once semantics for `STREAMING` execution.
+
+The connector supports reading and writing a set of files from any (distributed) file system (e.g. POSIX, S3, HDFS)
+with a [format]({{< ref "docs/connectors/datastream/formats/overview" >}}) (e.g., Avro, CSV, Parquet),
+producing a stream or records.
+
+## File Source
+
+The `File Source` is based on the [Source API]({{< ref "docs/dev/datastream/sources" >}}#the-data-source-api),
+a unified data source that reads files - both in batch and in streaming mode.
+It is divided into the following two parts: File SplitEnumerator and File SourceReader.
```

Review comment:
I think we are mixing several technical concepts here. The interface for building generally applicable enumerators for the FLIP-27 Source API is called `SplitEnumerator`, and we have a separate abstraction, the `FileEnumerator`. `FileEnumerator` is probably not a good name; something like `FileEnumerationStrategy` would describe it better. I agree that seeing both terms side by side is very confusing.
I would lean toward using `SplitEnumerator` wherever possible and perhaps adding a dedicated section about customizing the file enumeration strategy.
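To make the layering concrete, here is a minimal, self-contained sketch of the two roles. All names in it (`FileEnumerationStrategy`, `NonSplittingStrategy`, `SimpleSplitEnumerator`) are hypothetical and simplified; they are not Flink's actual interfaces. In particular, Flink's real `SplitEnumerator.handleSplitRequest` returns `void` and assigns splits through an enumerator context, while this sketch just returns the next split. The point is only that the split enumerator coordinates assignment to readers, while a pluggable strategy decides which files/splits exist.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical, simplified model of the two layers; NOT Flink's real interfaces.
public class EnumeratorLayering {

    /** Decides WHICH splits exist for the given paths (the role played by Flink's FileEnumerator). */
    interface FileEnumerationStrategy {
        List<String> enumerateSplits(List<String> paths);
    }

    /** Hypothetical strategy that turns every path into exactly one split. */
    static class NonSplittingStrategy implements FileEnumerationStrategy {
        @Override
        public List<String> enumerateSplits(List<String> paths) {
            return new ArrayList<>(paths);
        }
    }

    /** Coordinates handing splits to readers (the role played by a SplitEnumerator). */
    static class SimpleSplitEnumerator {
        private final Deque<String> pending;

        SimpleSplitEnumerator(FileEnumerationStrategy strategy, List<String> paths) {
            // File discovery is delegated entirely to the pluggable strategy.
            this.pending = new ArrayDeque<>(strategy.enumerateSplits(paths));
        }

        /** Returns the next pending split, or null when none are left; readerId is ignored here. */
        String handleSplitRequest(int readerId) {
            return pending.poll();
        }
    }

    public static void main(String[] args) {
        SimpleSplitEnumerator enumerator = new SimpleSplitEnumerator(
                new NonSplittingStrategy(), List.of("/data/a.csv", "/data/b.csv"));
        System.out.println(enumerator.handleSplitRequest(0));
        System.out.println(enumerator.handleSplitRequest(1));
        System.out.println(enumerator.handleSplitRequest(0));
    }
}
```

Swapping `NonSplittingStrategy` for a different strategy changes only how files become splits; the enumerator's assignment logic stays untouched, which is why documenting the strategy in its own section seems cleaner.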