dcoliversun commented on code in PR #35886: URL: https://github.com/apache/spark/pull/35886#discussion_r843447694
##########
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala:
##########

```diff
@@ -74,16 +74,8 @@ private[spark] class BasicDriverFeatureStep(conf: KubernetesDriverConf)
   private val driverMemoryWithOverheadMiB = driverMemoryMiB + memoryOverheadMiB

   override def configurePod(pod: SparkPod): SparkPod = {
-    val driverCustomEnvs = (Seq(
-      (ENV_APPLICATION_ID, conf.appId)
-    ) ++ conf.environment)
-      .map { env =>
-        new EnvVarBuilder()
-          .withName(env._1)
-          .withValue(env._2)
-          .build()
-      }
-
+    val driverCustomEnvs = KubernetesUtils.buildEnvVars(
+      conf.environment + (ENV_APPLICATION_ID -> conf.appId))
```

Review Comment:
Your concern is valid. After the latest commit, I don't think this refactoring will introduce regressions:

1. When a pod declares two environment variables with the same key, the last value takes effect. I applied a pod with this YAML:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: static-web
  labels:
    role: myrole
spec:
  containers:
    - name: web
      image: nginx
      env:
        - name: DEMO_GREETING
          value: "Hello from the environment"
        - name: DEMO_GREETING
          value: "Such a sweet sorrow"
      ports:
        - name: web
          containerPort: 80
          protocol: TCP
```

Echoing the variable shows that the value of `DEMO_GREETING` is `Such a sweet sorrow`:

```shell
$ kubectl exec -ti static-web -n default -- /bin/bash
$ echo $DEMO_GREETING
Such a sweet sorrow
```

2. The `++` operation on a Scala `Map` appends entries and overwrites duplicate keys. As long as the order of our `++` matches the previous `env` order, the values of the environment variables in the pod will not change.
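The overwrite behavior of `++` described in point 2 can be sketched in plain Scala. The map values below are illustrative placeholders, not the real Spark defaults:

```scala
object EnvMergeDemo extends App {
  // Default env the submission client would set (placeholder value)
  val defaults = Map("ENV_APPLICATION_ID" -> "spark-XXXXXXX")

  // Env the user supplied via spark.kubernetes.driverEnv.* (placeholder value)
  val userEnv = Map("ENV_APPLICATION_ID" -> "user_specified_app_id")

  // On a key collision, ++ keeps the value from the right-hand operand,
  // so the user-specified value wins when the user env comes last.
  val merged = defaults ++ userEnv

  assert(merged("ENV_APPLICATION_ID") == "user_specified_app_id")
  assert(merged.size == 1) // the duplicate key collapses into one entry

  println(merged("ENV_APPLICATION_ID"))
}
```

This mirrors the Kubernetes behavior from point 1 (last declaration wins), which is why keeping the same merge order preserves the effective env values.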
For example, a user specifies `ENV_APPLICATION_ID` in the Spark conf:

```plain
spark.kubernetes.driverEnv.ENV_APPLICATION_ID=user_specified_app_id
```

With the previous logic, two envs with the same key are generated:

```yaml
env:
  - name: ENV_APPLICATION_ID
    value: spark-XXXXXXX # https://github.com/apache/spark/blob/b852645f69b3b7a0a2140a732c4c03b302f8795a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala#L222
  - name: ENV_APPLICATION_ID
    value: user_specified_app_id
```

Per the first point, the effective value of `ENV_APPLICATION_ID` is `user_specified_app_id`. Replacing the `Seq` with a `Map` produces this env instead:

```yaml
env:
  - name: ENV_APPLICATION_ID
    value: user_specified_app_id
```

Because `++` on a `Map` overwrites duplicate keys, the default value of `ENV_APPLICATION_ID` is overwritten, and the value in the pod is still `user_specified_app_id`. I agree that this refactoring does not change any env values and removes an unnecessary duplicate entry.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org