Re: [PR] [FLINK-39118][docs] Add documentation for Native s3 FileSystem [flink]

via GitHub Thu, 16 Apr 2026 03:08:08 -0700


davidradl commented on code in PR #27937:
URL: https://github.com/apache/flink/pull/27937#discussion_r3092385061



##########
docs/content/docs/deployment/filesystems/s3.md:
##########
@@ -64,94 +64,288 @@ env.configure(config);
 
 Note that these examples are *not* exhaustive and you can use S3 in other 
places as well, including your [high availability setup]({{< ref 
"docs/deployment/ha/overview" >}}) or the [EmbeddedRocksDBStateBackend]({{< ref 
"docs/ops/state/state_backends" >}}#the-rocksdbstatebackend); everywhere that 
Flink expects a FileSystem URI (unless otherwise stated).
 
-For most use cases, you may use one of our `flink-s3-fs-hadoop` and 
`flink-s3-fs-presto` S3 filesystem plugins which are self-contained and easy to 
set up.
-For some cases, however, e.g., for using S3 as YARN's resource storage dir, it 
may be necessary to set up a specific Hadoop S3 filesystem implementation.
+## S3 FileSystem Implementations
 
-### Hadoop/Presto S3 File Systems plugins
+Flink provides three independent S3 filesystem implementations, each with 
different trade-offs:
+
+- **Native S3 FileSystem** (`flink-s3-fs-native`): Built directly on AWS SDK 
v2 with async I/O and parallel transfers removing the dependency from hadoop 
entirely. This implementation supports both checkpointing and the FileSystem 
sink. The Native S3 FileSystem aims to provide integrated support for 
checkpointing as well as FileSystem sink, removing the need to use Presto S3 
FileSystem for checkpointing and Hadoop S3 FileSystem for the FileSystem sink. 
[Benchmarks](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=406620396)
 show ~2x higher checkpoint throughput (~200 MB/s vs ~90 MB/s) compared to the 
Presto implementation at state sizes up to 15 GB. **Experimental** in Flink 2.3.
+- **Presto S3 FileSystem** (`flink-s3-fs-presto`): Based on Presto project 
code, recommended for checkpointing.
+- **Hadoop S3 FileSystem** (`flink-s3-fs-hadoop`): Based on Hadoop project 
code, has FileSystem sink support.
+
+All three are self-contained with no dependency footprint, so there is no need 
to add Hadoop to the classpath to use them.
+
+## Common Configuration
+
+### Configure Access Credentials
+
+After setting up the S3 FileSystem implementation, you need to make sure that 
Flink is allowed to access your S3 buckets.
+
+#### Identity and Access Management (IAM) (Recommended)
+
+The recommended way of setting up credentials on AWS is via [Identity and 
Access Management 
(IAM)](http://docs.aws.amazon.com/IAM/latest/UserGuide/introduction.html). You 
can use IAM features to securely give Flink instances the credentials that they 
need to access S3 buckets. Details about how to do this are beyond the scope of 
this documentation. Please refer to the AWS user guide. What you are looking 
for are [IAM 
Roles](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html).
+
+If you set this up correctly, you can manage access to S3 within AWS and don't 
need to distribute any access keys to Flink.
+
+#### Delegation Tokens
+
+[Delegation tokens]({{< ref 
"docs/deployment/security/security-delegation-token" >}}) provide time-bounded, 
automatically negotiated credentials. The JobManager uses long-lived 
credentials to call AWS STS and obtain short-lived session tokens, which are 
then automatically distributed to TaskManagers.

Review Comment:
   I think that it would be worth while detailing what the long lived tokens 
are, I assume these are the  access-key and  secret-key. And I assume the 
delegation tokens are the short lived token. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] [FLINK-39118][docs] Add documentation for Native s3 FileSystem [flink]

Reply via email to