shrutisinghania commented on code in PR #52027:
URL: https://github.com/apache/spark/pull/52027#discussion_r2282186422
##########
core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala:
##########
@@ -521,6 +521,17 @@ private[spark] object SparkHadoopUtil extends Logging {
SOURCE_SPARK_HADOOP)
}
val setBySpark = SET_TO_DEFAULT_VALUES
+ // The GCS connector allows appending a custom suffix to the user-agent string.
+ // To prepend Spark's application information, we read the existing suffix,
+ // add our prefix, and set the result back as the new suffix.
+ val sparkGcsPrefix = s"apache_spark/${org.apache.spark.SPARK_VERSION} (GPN:apache_spark)"
Review Comment:
There is no strong opinion on the string format, so I changed it to the more conventional "apache-spark" form.
The `(GPN:apache_spark)` suffix is a convention used to identify Google Cloud Storage (GCS) API requests that originate from Spark applications.
An example of another ASF repository that already has it integrated: [Apache Beam](https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/Transport.java#L102)
##########
core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala:
##########
@@ -521,6 +521,17 @@ private[spark] object SparkHadoopUtil extends Logging {
SOURCE_SPARK_HADOOP)
}
val setBySpark = SET_TO_DEFAULT_VALUES
+ // The GCS connector allows appending a custom suffix to the user-agent string.
+ // To prepend Spark's application information, we read the existing suffix,
+ // add our prefix, and set the result back as the new suffix.
+ val sparkGcsPrefix = s"apache_spark/${org.apache.spark.SPARK_VERSION} (GPN:apache_spark)"
+ val userGcsSuffix = hadoopConf.get("fs.gs.application.name.suffix")
Review Comment:
This is a property specific to the Google Cloud Storage (GCS) connector, which Hadoop loads to interact with `gs://` paths. Here is the official documentation for this property: [GCS connector Configuration](https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/master/gcs/CONFIGURATION.md?plain=1#L386)
Ambari, a tool for managing Hadoop clusters, also uses this hard-coded property name: [ASF project repository](https://github.com/apache/ambari/pull/2510)
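The read-prepend-set logic described in the diff comments can be sketched as follows. This is a minimal, self-contained sketch: a plain `mutable.Map` stands in for Hadoop's `Configuration`, and the helper name `prependSparkInfo` and the single-space join between prefix and existing suffix are assumptions for illustration, not the PR's actual code:

```scala
import scala.collection.mutable

object GcsSuffixSketch {
  // Property name taken from the GCS connector's configuration docs.
  val SuffixKey = "fs.gs.application.name.suffix"

  // Prepend Spark's identifier while preserving any user-configured suffix.
  // The mutable.Map stands in for Hadoop's Configuration so the sketch
  // stays runnable without the Hadoop dependency.
  def prependSparkInfo(conf: mutable.Map[String, String], sparkVersion: String): Unit = {
    val sparkGcsPrefix = s"apache_spark/$sparkVersion (GPN:apache_spark)"
    val combined = conf.get(SuffixKey) match {
      case Some(existing) if existing.nonEmpty => s"$sparkGcsPrefix $existing"
      case _                                   => sparkGcsPrefix
    }
    conf(SuffixKey) = combined
  }
}
```

With this shape, a user's pre-existing suffix ends up after Spark's identifier, so GCS request logs show both the Spark version and the user's application tag.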
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]