Re: [PR] [FLINK-39118][docs] Add documentation for Native s3 FileSystem [flink]

via GitHub Fri, 17 Apr 2026 00:58:03 -0700


davidradl commented on code in PR #27937:
URL: https://github.com/apache/flink/pull/27937#discussion_r3098713035



##########
docs/content/docs/deployment/filesystems/s3.md:
##########
@@ -64,94 +64,208 @@ env.configure(config);
 
 Note that these examples are *not* exhaustive and you can use S3 in other places as well, including your [high availability setup]({{< ref "docs/deployment/ha/overview" >}}) or the [EmbeddedRocksDBStateBackend]({{< ref "docs/ops/state/state_backends" >}}#the-rocksdbstatebackend); everywhere that Flink expects a FileSystem URI (unless otherwise stated).
 
-For most use cases, you may use one of our `flink-s3-fs-hadoop` and `flink-s3-fs-presto` S3 filesystem plugins which are self-contained and easy to set up.
-For some cases, however, e.g., for using S3 as YARN's resource storage dir, it may be necessary to set up a specific Hadoop S3 filesystem implementation.
+## S3 FileSystem Implementations
 
-### Hadoop/Presto S3 File Systems plugins
+Flink provides three independent S3 filesystem implementations:
 
-{{< hint info >}}
-You don't have to configure this manually if you are running [Flink on EMR](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-flink.html).
-{{< /hint >}}
+| Implementation | Checkpointing | FileSink | Notes |
+|---------------|:---:|:---:|-------|
+| **Native S3** (`flink-s3-fs-native`) | ✓ | ✓ | **Experimental** in Flink 2.3. Built on AWS SDK v2; no Hadoop dependency. |
+| **Presto S3** (`flink-s3-fs-presto`) | ✓ | x | Production-proven for checkpointing. |
+| **Hadoop S3** (`flink-s3-fs-hadoop`) | ✓ | ✓ | Mature; the only stable implementation that provides `RecoverableWriter` for the FileSink. |
 
-Flink provides two file systems to talk to Amazon S3, `flink-s3-fs-presto` and `flink-s3-fs-hadoop`.
-Both implementations are self-contained with no dependency footprint, so there is no need to add Hadoop to the classpath to use them.
+Previously, users had to choose between Presto (recommended for checkpointing throughput) and Hadoop (the only implementation with `RecoverableWriter`, required by the [FileSink]({{< ref "docs/connectors/datastream/filesystem" >}})). The Native S3 implementation unifies both capabilities in a single plugin and [benchmarks](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=406620396) show ~2x higher checkpoint throughput (~200 MB/s vs ~90 MB/s) compared to Presto at state sizes up to 15 GB.

Review Comment:
   If we mention benchmarks, I suggest we make clear that these results will
not hold in every situation. We could say something like "in our test
environment, we have shown ...". That saves embarrassment if a particular
scenario turns out not to perform well.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
