gaborgsomogyi commented on code in PR #27841:
URL: https://github.com/apache/flink/pull/27841#discussion_r3091376098
##########
docs/content.zh/docs/deployment/filesystems/s3.md:
##########
@@ -50,127 +48,321 @@ env.fromSource(
"s3-input"
);
-// 写入 S3 bucket
+// Write to S3 bucket
stream.sinkTo(
- FileSink.forRowFormat(
- new Path("s3://<bucket>/<endpoint>"), new SimpleStringEncoder<>()
- ).build()
+ FileSink.forRowFormat(
+ new Path("s3://<bucket>/<endpoint>"), new SimpleStringEncoder<>()
+ ).build()
);
-
-// 使用 S3 作为 checkpoint storage
+// Use S3 as checkpoint storage
Configuration config = new Configuration();
config.set(CheckpointingOptions.CHECKPOINT_STORAGE, "filesystem");
config.set(CheckpointingOptions.CHECKPOINTS_DIRECTORY,
"s3://<your-bucket>/<endpoint>");
env.configure(config);
```
-注意这些例子并*不详尽*,S3 同样可以用在其他场景,包括 [JobManager 高可用配置]({{< ref
"docs/deployment/ha/overview" >}}) 或 [RocksDBStateBackend]({{< ref
"docs/ops/state/state_backends" >}}#the-rocksdbstatebackend),以及所有 Flink
需要使用文件系统 URI 的位置。
+Note that these examples are *not* exhaustive and you can use S3 in other
places as well, including your [high availability setup]({{< ref
"docs/deployment/ha/overview" >}}) or the [EmbeddedRocksDBStateBackend]({{< ref
"docs/ops/state/state_backends" >}}#the-rocksdbstatebackend); everywhere that
Flink expects a FileSystem URI (unless otherwise stated).
+
+## S3 FileSystem Implementations
+
+Flink provides three independent S3 filesystem implementations, each with
different trade-offs:
+
+- **Native S3 FileSystem** (`flink-s3-fs-native`): Built directly on AWS SDK
v2 with async I/O and parallel transfers removing the dependency from hadoop
entirely. This implementation supports both checkpointing and the FileSystem
sink. The Native S3 FileSystem aims to provide integrated support for
checkpointing as well as FileSystem sink, removing the need to use Presto S3
FileSystem for checkpointing and Hadoop S3 FileSystem for the FileSystem sink.
[Benchmarks](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=406620396)
show ~2x higher checkpoint throughput (~200 MB/s vs ~90 MB/s) compared to the
Presto implementation at state sizes up to 15 GB. **Experimental** in Flink 2.3.
+- **Presto S3 FileSystem** (`flink-s3-fs-presto`): Based on Presto project
code, recommended for checkpointing.
+- **Hadoop S3 FileSystem** (`flink-s3-fs-hadoop`): Based on Hadoop project
code, has FileSystem sink support.
+
+All three are self-contained with no dependency footprint, so there is no need
to add Hadoop to the classpath to use them.
+
+## Common Configuration
+
+### Configure Access Credentials
+
+After setting up the S3 FileSystem implementation, you need to make sure that
Flink is allowed to access your S3 buckets.
+
+#### Identity and Access Management (IAM) (Recommended)
+
+The recommended way of setting up credentials on AWS is via [Identity and
Access Management
(IAM)](http://docs.aws.amazon.com/IAM/latest/UserGuide/introduction.html). You
can use IAM features to securely give Flink instances the credentials that they
need to access S3 buckets. Details about how to do this are beyond the scope of
this documentation. Please refer to the AWS user guide. What you are looking
for are [IAM
Roles](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html).
+
+If you set this up correctly, you can manage access to S3 within AWS and don't
need to distribute any access keys to Flink.
+
+#### Delegation Tokens
+
+[Delegation tokens]({{< ref
"docs/deployment/security/security-delegation-token" >}}) provide time-bounded,
automatically negotiated credentials. They are more secure than static access
keys since tokens are temporary and don't require distributing long-lived
secrets. Configure the credentials provider for your S3 implementation:
+
+```yaml
+# For Native S3 implementation
+fs.s3.aws.credentials.provider: org.apache.flink.fs.s3native.token.DynamicTemporaryAWSCredentialsProvider
+# For Hadoop implementation
+fs.s3a.aws.credentials.provider: org.apache.flink.fs.s3.common.token.DynamicTemporaryAWSCredentialsProvider
+# For Presto implementation
+presto.s3.credentials-provider: org.apache.flink.fs.s3.common.token.DynamicTemporaryAWSCredentialsProvider
+```
Review Comment:
I think this setting alone does not supply any credentials; long-lived
credentials must still be configured.