shrutisinghania commented on PR #52027: URL: https://github.com/apache/spark/pull/52027#issuecomment-3234065578
> First, this could be a regression for non-GoogleCloud users because it injects a new configuration redundantly and which will not be used at any chance.

While it's true that this change adds a new configuration to the Hadoop configuration for all users, it is only read and used by the GCS connector. For other storage systems like HDFS or S3, this configuration will simply be ignored. Therefore, we believe the risk of any regression is minimal, as it's a harmless, unused property for non-GCS users.

> Second, GCS users can add this configuration manually if this is really useful for them.

While it is true that users can add this configuration manually, there are several benefits to making this the default behavior in Spark:

- Improved Observability by Default: The `fs.gs.application.name.suffix` property is very useful for identifying which application is performing GCS operations, especially in environments where multiple Spark applications interact with GCS concurrently. Many users may not be aware of this specific GCS property, so this change provides them with valuable diagnostic information out-of-the-box.
- User Convenience: The implementation is designed to be convenient. If a user has already set a custom suffix, the Spark identifier is simply prepended to their existing value, so they don't lose their custom identifiers and don't need to take any action to get the benefit of the Spark-specific tag.
- Standardization and Consistency: By having Spark automatically prepend a standard identifier, we ensure that all Spark jobs are uniformly identifiable in GCS logs and metrics. This simplifies monitoring and debugging across an organization, as it doesn't rely on users remembering to manually configure this property.

Given that the change is harmless for non-GCS users, the advantages of providing consistent, out-of-the-box observability for GCS users make this a worthwhile improvement.

--
This is an automated message from the Apache Git Service.
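To make the "prepend, don't overwrite" behavior described above concrete, here is a minimal sketch of the merge logic. It is not the code from the PR. A plain dict stands in for Hadoop's `Configuration`, and `prepend_spark_suffix` and the `"-spark"` tag are hypothetical names chosen for illustration; the actual identifier Spark prepends, and whether any separator is inserted, are not stated in this comment. The sketch assumes simple string concatenation.

```python
from typing import MutableMapping, Optional

# Property read by the GCS connector; HDFS, S3 and other filesystems ignore it.
GCS_SUFFIX_KEY = "fs.gs.application.name.suffix"


def prepend_spark_suffix(hadoop_conf: MutableMapping[str, str], spark_tag: str) -> str:
    """Store spark_tag in front of any user-supplied suffix and return the result.

    hadoop_conf: a dict used here as a stand-in for Hadoop's Configuration.
    spark_tag: hypothetical Spark identifier (the PR's real value is not shown).
    Assumption: the tag is concatenated directly, with no separator added.
    """
    existing: Optional[str] = hadoop_conf.get(GCS_SUFFIX_KEY)
    if existing:
        # Keep the user's custom suffix and put the Spark tag in front of it.
        merged = spark_tag + existing
    else:
        # Nothing set (or an empty value): the Spark tag becomes the suffix.
        merged = spark_tag
    hadoop_conf[GCS_SUFFIX_KEY] = merged
    return merged


if __name__ == "__main__":
    conf = {GCS_SUFFIX_KEY: "-etl-nightly"}
    print(prepend_spark_suffix(conf, "-spark"))
```

Running the module prints `-spark-etl-nightly`: the user's suffix survives and the Spark tag is added in front, while unrelated keys in the configuration are left untouched. Independently of this change, users can already set the property themselves through Spark's `spark.hadoop.*` prefix (for example `spark.hadoop.fs.gs.application.name.suffix`), which Spark copies into the Hadoop configuration.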
