shrutisinghania commented on code in PR #52027:
URL: https://github.com/apache/spark/pull/52027#discussion_r2282186422


##########
core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala:
##########
@@ -521,6 +521,17 @@ private[spark] object SparkHadoopUtil extends Logging {
         SOURCE_SPARK_HADOOP)
     }
     val setBySpark = SET_TO_DEFAULT_VALUES
+    // The GCS connector allows appending a custom suffix to the user-agent string.
+    // To prepend Spark's application information, we read the existing suffix,
+    // add our prefix, and set the result back as the new suffix.
+    val sparkGcsPrefix = s"apache_spark/${org.apache.spark.SPARK_VERSION} (GPN:apache_spark)"

Review Comment:
   There is no strong opinion on the string format, so I changed it to the more conventional "apache-spark".
   
   The `(GPN:apache_spark)` postfix is a convention used to identify Google Cloud Storage (GCS) API requests that originate from Spark applications.
   E.g. another ASF repository that already integrates it: [Apache Beam](https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/Transport.java#L102)



##########
core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala:
##########
@@ -521,6 +521,17 @@ private[spark] object SparkHadoopUtil extends Logging {
         SOURCE_SPARK_HADOOP)
     }
     val setBySpark = SET_TO_DEFAULT_VALUES
+    // The GCS connector allows appending a custom suffix to the user-agent string.
+    // To prepend Spark's application information, we read the existing suffix,
+    // add our prefix, and set the result back as the new suffix.
+    val sparkGcsPrefix = s"apache_spark/${org.apache.spark.SPARK_VERSION} (GPN:apache_spark)"
+    val userGcsSuffix = hadoopConf.get("fs.gs.application.name.suffix")

Review Comment:
   This is a property specific to the Google Cloud Storage (GCS) connector, which Hadoop loads to interact with `gs://` paths. Here is the official documentation for the property: [GCS connector configuration](https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/master/gcs/CONFIGURATION.md?plain=1#L386)
   Ambari, a tool for managing Hadoop clusters, also hardcodes this config key: [ASF project repository](https://github.com/apache/ambari/pull/2510)
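
   For context, the read-prepend-set flow the diff implements can be sketched roughly as below. This is a hedged illustration, not the PR's actual code: it uses a plain mutable `Map` as a stand-in for Hadoop's `Configuration`, a hard-coded version string in place of `org.apache.spark.SPARK_VERSION`, and a hypothetical user-set suffix value.
   
   ```scala
   import scala.collection.mutable
   
   // Stand-in for Hadoop's Configuration; the real code would call
   // hadoopConf.get(...) / hadoopConf.set(...) instead.
   val hadoopConf = mutable.Map("fs.gs.application.name.suffix" -> "my-etl-job")
   
   // Placeholder version; the diff interpolates org.apache.spark.SPARK_VERSION.
   val sparkVersion = "4.1.0"
   val sparkGcsPrefix = s"apache_spark/$sparkVersion (GPN:apache_spark)"
   
   // Read the user's existing suffix (may be absent), prepend Spark's
   // identifier, and write the combined value back.
   val userGcsSuffix = hadoopConf.getOrElse("fs.gs.application.name.suffix", "")
   val newSuffix =
     if (userGcsSuffix.isEmpty) sparkGcsPrefix
     else s"$sparkGcsPrefix $userGcsSuffix"
   hadoopConf("fs.gs.application.name.suffix") = newSuffix
   ```
   
   With this shape, a user-supplied suffix survives after Spark's identifier rather than being overwritten, which matches the intent described in the diff's comment.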



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
