davidradl commented on code in PR #27937:
URL: https://github.com/apache/flink/pull/27937#discussion_r3092340942
##########
docs/content/docs/deployment/filesystems/s3.md:
##########
@@ -64,94 +64,288 @@ env.configure(config);
Note that these examples are *not* exhaustive and you can use S3 in other
places as well, including your [high availability setup]({{< ref
"docs/deployment/ha/overview" >}}) or the [EmbeddedRocksDBStateBackend]({{< ref
"docs/ops/state/state_backends" >}}#the-rocksdbstatebackend); everywhere that
Flink expects a FileSystem URI (unless otherwise stated).
-For most use cases, you may use one of our `flink-s3-fs-hadoop` and
`flink-s3-fs-presto` S3 filesystem plugins which are self-contained and easy to
set up.
-For some cases, however, e.g., for using S3 as YARN's resource storage dir, it
may be necessary to set up a specific Hadoop S3 filesystem implementation.
+## S3 FileSystem Implementations
-### Hadoop/Presto S3 File Systems plugins
+Flink provides three independent S3 filesystem implementations, each with
different trade-offs:
+
+- **Native S3 FileSystem** (`flink-s3-fs-native`): Built directly on AWS SDK
v2 with async I/O and parallel transfers removing the dependency from hadoop
entirely. This implementation supports both checkpointing and the FileSystem
sink. The Native S3 FileSystem aims to provide integrated support for
checkpointing as well as FileSystem sink, removing the need to use Presto S3
FileSystem for checkpointing and Hadoop S3 FileSystem for the FileSystem sink.
[Benchmarks](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=406620396)
show ~2x higher checkpoint throughput (~200 MB/s vs ~90 MB/s) compared to the
Presto implementation at state sizes up to 15 GB. **Experimental** in Flink 2.3.
+- **Presto S3 FileSystem** (`flink-s3-fs-presto`): Based on Presto project
code, recommended for checkpointing.
Review Comment:
I am curious why this is recommended for checkpointing when the new native
implementation also supports checkpointing. What are the pros and cons of
checkpointing with each of these two implementations, and why is Presto the
recommended one?