Shruti Singhania created FLINK-37932:
----------------------------------------
Summary: Set a Default User Agent for GCS FileSystem Connector for Better Observability
Key: FLINK-37932
URL: https://issues.apache.org/jira/browse/FLINK-37932
Project: Flink
Issue Type: Improvement
Components: Connectors / Hadoop Compatibility, FileSystems
Reporter: Shruti Singhania
*1. Problem Statement*
When Apache Flink interacts with Google Cloud Storage (GCS) through the
{{flink-gs-fs-hadoop}} connector, the requests sent to the GCS API carry no
Flink-specific identifier by default. Users can configure a user agent suffix
manually via the {{fs.gs.user.agent.suffix}} property in {{core-site.xml}},
but in practice most deployments leave it unset.
Without a default identifier, it is difficult for users and cloud
administrators to distinguish Flink-originated GCS traffic from other
applications' traffic in GCP Cloud Audit Logs. This complicates monitoring,
debugging, and cost attribution for Flink jobs running on Google Cloud.
*2. Proposed Solution*
This issue proposes programmatically setting a default user agent suffix in the
Flink GCS filesystem factory. The suffix would be added _only if_ the user has
not already provided one in their configuration.
The proposed default user agent suffix will include the Flink version, for
example: {{Apache Flink 1.19.0}}.
This provides a "sensible default" that enhances the experience for users
running Flink on Google Cloud, while fully respecting any custom configuration.
*3. Implementation Details*
The change would be implemented in the {{flink-gs-fs-hadoop}} module.
* *Module:* {{flink-gs-fs-hadoop}}
* *Class:* {{org.apache.flink.fs.gs.GSFileSystemFactory}}
* *Method:* {{create(URI fsUri)}}
The implementation logic is as follows:
# In the {{create}} method, after loading the Hadoop configuration, check whether the {{fs.gs.user.agent.suffix}} key is already set.
# If the key is not set (i.e., its value is {{null}}), set it programmatically on the {{Configuration}} object. Generate the value dynamically with {{EnvironmentInformation.getVersion()}} so that it always matches the running Flink version.
# If the key is already set, leave it unchanged, preserving the user's configuration.
# Proceed with {{FileSystem}} instantiation using the {{Configuration}} object, in which the key is now guaranteed to be populated.
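A minimal, JDK-only sketch of the steps above. It assumes a plain {{Map<String, String>}} as a stand-in for Hadoop's {{Configuration}} and takes the Flink version as an argument instead of calling {{EnvironmentInformation.getVersion()}}; the class {{GcsUserAgentDefaults}} and the method {{applyDefaultUserAgent}} are hypothetical names, not existing Flink APIs.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of the "set a default only if absent" step proposed for
 * GSFileSystemFactory. A Map stands in for Hadoop's Configuration, and the
 * flinkVersion argument stands in for EnvironmentInformation.getVersion().
 */
final class GcsUserAgentDefaults {
    /** Configuration key named in this issue. */
    static final String USER_AGENT_SUFFIX_KEY = "fs.gs.user.agent.suffix";

    private GcsUserAgentDefaults() {}

    /**
     * Step 2: if the key is unset (null), fill in "Apache Flink <version>".
     * Step 3: if the user already set a value, leave it untouched.
     * Returns the same map so it can be handed on to FileSystem creation (step 4).
     */
    static Map<String, String> applyDefaultUserAgent(Map<String, String> conf, String flinkVersion) {
        if (conf.get(USER_AGENT_SUFFIX_KEY) == null) {
            conf.put(USER_AGENT_SUFFIX_KEY, "Apache Flink " + flinkVersion);
        }
        return conf;
    }

    public static void main(String[] args) {
        // Nothing configured: the default suffix is filled in.
        Map<String, String> unset = applyDefaultUserAgent(new HashMap<>(), "1.19.0");
        System.out.println(unset.get(USER_AGENT_SUFFIX_KEY)); // Apache Flink 1.19.0

        // User-provided suffix: it is preserved as-is.
        Map<String, String> custom = new HashMap<>();
        custom.put(USER_AGENT_SUFFIX_KEY, "my-team-etl");
        applyDefaultUserAgent(custom, "1.19.0");
        System.out.println(custom.get(USER_AGENT_SUFFIX_KEY)); // my-team-etl
    }
}
```

The explicit {{null}} check mirrors step 2 one-to-one. In the real factory the same check would run against Hadoop's {{Configuration}}, whose {{setIfUnset(name, value)}} method expresses the same "set only if absent" rule.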
*4. Impact on Users*
* This is a *non-breaking change*.
* Users who have already configured a custom {{fs.gs.user.agent.suffix}} will
see no difference in behavior.
* Users who have _not_ configured this property will automatically gain
improved observability in their GCP logs without needing to make any changes.
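The compatibility claims above lend themselves to a small regression check. The sketch below is hypothetical: {{UserAgentCompatibilityCheck}}, {{withDefault}}, and the {{example.unrelated.key}} property are illustrative names, and a {{Map}} stands in for Hadoop's {{Configuration}}. Besides the two user groups, it pins down that applying the default more than once never changes an already-populated value and never touches unrelated properties.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical regression check for the "non-breaking" guarantees of this issue. */
final class UserAgentCompatibilityCheck {
    static final String KEY = "fs.gs.user.agent.suffix";

    private UserAgentCompatibilityCheck() {}

    // Stand-in for the factory's defaulting step: writes only if the key is missing or null.
    static Map<String, String> withDefault(Map<String, String> conf, String version) {
        conf.putIfAbsent(KEY, "Apache Flink " + version);
        return conf;
    }

    /** Returns null if every guarantee holds, otherwise a description of the first failure. */
    static String check(String version) {
        // Users who configured a custom suffix see no difference.
        Map<String, String> custom = new HashMap<>();
        custom.put(KEY, "my-custom-suffix");
        if (!"my-custom-suffix".equals(withDefault(custom, version).get(KEY))) {
            return "custom suffix was overwritten";
        }

        // Users who did not configure it get the versioned default.
        String expected = "Apache Flink " + version;
        Map<String, String> fresh = new HashMap<>();
        fresh.put("example.unrelated.key", "value");
        withDefault(fresh, version);
        if (!expected.equals(fresh.get(KEY))) {
            return "default suffix missing";
        }

        // Unrelated properties are left alone.
        if (!"value".equals(fresh.get("example.unrelated.key"))) {
            return "unrelated property changed";
        }

        // Re-applying (even with another version) keeps the existing value.
        if (!expected.equals(withDefault(fresh, "9.9.9").get(KEY))) {
            return "populated suffix was replaced on re-application";
        }
        return null;
    }
}
```

{{Map.putIfAbsent}} fits here because, like the proposed check, it only writes when the key is missing or mapped to {{null}}.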