Re: [PR] [FLINK-39118][docs] Add documentation for Native s3 FileSystem [flink]

via GitHub Thu, 16 Apr 2026 07:18:40 -0700


Samrat002 commented on code in PR #27937:
URL: https://github.com/apache/flink/pull/27937#discussion_r3093867384



##########
docs/content/docs/deployment/filesystems/s3.md:
##########
@@ -64,94 +64,288 @@ env.configure(config);
 
 Note that these examples are *not* exhaustive and you can use S3 in other 
places as well, including your [high availability setup]({{< ref 
"docs/deployment/ha/overview" >}}) or the [EmbeddedRocksDBStateBackend]({{< ref 
"docs/ops/state/state_backends" >}}#the-rocksdbstatebackend); everywhere that 
Flink expects a FileSystem URI (unless otherwise stated).
 
-For most use cases, you may use one of our `flink-s3-fs-hadoop` and 
`flink-s3-fs-presto` S3 filesystem plugins which are self-contained and easy to 
set up.
-For some cases, however, e.g., for using S3 as YARN's resource storage dir, it 
may be necessary to set up a specific Hadoop S3 filesystem implementation.
+## S3 FileSystem Implementations
 
-### Hadoop/Presto S3 File Systems plugins
+Flink provides three independent S3 filesystem implementations, each with 
different trade-offs:
+
+- **Native S3 FileSystem** (`flink-s3-fs-native`): Built directly on AWS SDK 
v2 with async I/O and parallel transfers removing the dependency from hadoop 
entirely. This implementation supports both checkpointing and the FileSystem 
sink. The Native S3 FileSystem aims to provide integrated support for 
checkpointing as well as FileSystem sink, removing the need to use Presto S3 
FileSystem for checkpointing and Hadoop S3 FileSystem for the FileSystem sink. 
[Benchmarks](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=406620396)
 show ~2x higher checkpoint throughput (~200 MB/s vs ~90 MB/s) compared to the 
Presto implementation at state sizes up to 15 GB. **Experimental** in Flink 2.3.

Review Comment:
   I've consolidated those two sentences into one.
   "checkpointing" and "FileSink" refer to two distinct ways Flink uses the 
filesystem, not two types of sinks. 
   
   Historically, the Presto plugin was recommended for checkpointing (better 
streaming write performance) while the Hadoop plugin was needed for FileSink 
(it implements RecoverableWriter). The Native S3 implementation supports both 
in a single plugin.
   
   This is not about sink types; it is about which S3 plugin you need for which 
Flink feature.
   
   I've updated the text to make this clearer:
   
   > "Supports both checkpointing and the FileSink in a single plugin, removing
   > the need to choose between Presto (checkpointing) and Hadoop (FileSink)."
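
   To make the feature-to-plugin mapping concrete, here is a minimal sketch in Java. It is illustration only: the `FlinkFeature` enum and the `S3PluginChoice` class are hypothetical names that do not exist in Flink, and only the three plugin names are taken from the docs under review.

```java
// Hypothetical illustration only: these types are NOT part of Flink.
// They encode the plugin-per-feature mapping described in the comment above.
enum FlinkFeature { CHECKPOINTING, FILE_SINK }

final class S3PluginChoice {
    private S3PluginChoice() {}

    // Historical recommendation: Presto for checkpointing,
    // Hadoop for FileSink (it implements RecoverableWriter).
    static String historicalPlugin(FlinkFeature feature) {
        switch (feature) {
            case CHECKPOINTING:
                return "flink-s3-fs-presto";
            case FILE_SINK:
                return "flink-s3-fs-hadoop";
            default:
                throw new IllegalArgumentException("Unknown feature: " + feature);
        }
    }

    // With the Native S3 FileSystem, the feature no longer affects the choice.
    static String nativePlugin(FlinkFeature feature) {
        return "flink-s3-fs-native";
    }

    public static void main(String[] args) {
        for (FlinkFeature f : FlinkFeature.values()) {
            System.out.println(f + ": " + historicalPlugin(f) + " -> " + nativePlugin(f));
        }
    }
}
```

   The point of the sketch is that `historicalPlugin` has to branch on the feature, while `nativePlugin` does not. That is the "single plugin" claim in the updated sentence.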



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
