dcoliversun commented on code in PR #35886: URL: https://github.com/apache/spark/pull/35886#discussion_r843447694
##########
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala:
##########

```diff
@@ -74,16 +74,8 @@ private[spark] class BasicDriverFeatureStep(conf: KubernetesDriverConf)
   private val driverMemoryWithOverheadMiB = driverMemoryMiB + memoryOverheadMiB

   override def configurePod(pod: SparkPod): SparkPod = {
-    val driverCustomEnvs = (Seq(
-      (ENV_APPLICATION_ID, conf.appId)
-    ) ++ conf.environment)
-      .map { env =>
-        new EnvVarBuilder()
-          .withName(env._1)
-          .withValue(env._2)
-          .build()
-      }
-
+    val driverCustomEnvs = KubernetesUtils.buildEnvVars(
+      conf.environment + (ENV_APPLICATION_ID -> conf.appId))
```

Review Comment:
Your concern is valid. After the latest commit, I don't think this refactoring will introduce regressions:

1. When a pod declares two environment variables with the same key, the last value takes effect. I applied a pod with this YAML:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: static-web
  labels:
    role: myrole
spec:
  containers:
    - name: web
      image: nginx
      env:
        - name: DEMO_GREETING
          value: "Hello from the environment"
        - name: DEMO_GREETING
          value: "Such a sweet sorrow"
      ports:
        - name: web
          containerPort: 80
          protocol: TCP
```

Echoing the variable shows that the value of `DEMO_GREETING` is `Such a sweet sorrow`:

```shell
$ kubectl exec -ti static-web -n default -- /bin/bash
$ echo $DEMO_GREETING
Such a sweet sorrow
```

2. The `++` operation on a Scala `Map` appends entries and overwrites duplicate keys. As long as the order of our `++` matches the previous `env` order, the values of the environment variables in the pod will not change.
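The overwrite behavior of `++` described in point 2 can be sketched in plain Scala. The map values below are illustrative placeholders, not the real Spark defaults:

```scala
object EnvMergeDemo extends App {
  // Default env the submission client would set (placeholder value)
  val defaults = Map("ENV_APPLICATION_ID" -> "spark-XXXXXXX")

  // Env the user supplied via spark.kubernetes.driverEnv.* (placeholder value)
  val userEnv = Map("ENV_APPLICATION_ID" -> "user_specified_app_id")

  // On a key collision, ++ keeps the value from the right-hand operand,
  // so the user-specified value wins when the user env comes last.
  val merged = defaults ++ userEnv

  assert(merged("ENV_APPLICATION_ID") == "user_specified_app_id")
  assert(merged.size == 1) // the duplicate key collapses into one entry

  println(merged("ENV_APPLICATION_ID"))
}
```

This mirrors the Kubernetes behavior from point 1 (last declaration wins), which is why keeping the same merge order preserves the effective env values.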
For example, a user specifies `ENV_APPLICATION_ID` in the Spark conf:

```plain
spark.kubernetes.driverEnv.ENV_APPLICATION_ID=user_specified_app_id
```

With the previous logic, two envs with the same key are generated:

```yaml
env:
  - name: ENV_APPLICATION_ID
    value: spark-XXXXXXX # https://github.com/apache/spark/blob/b852645f69b3b7a0a2140a732c4c03b302f8795a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala#L222
  - name: ENV_APPLICATION_ID
    value: user_specified_app_id
```

Per the first point, the effective value of `ENV_APPLICATION_ID` is `user_specified_app_id`. Replacing the `Seq` with a `Map` produces this env instead:

```yaml
env:
  - name: ENV_APPLICATION_ID
    value: user_specified_app_id
```

Because `++` on a `Map` overwrites duplicate keys, the default value of `ENV_APPLICATION_ID` is overwritten, and the value in the pod is still `user_specified_app_id`. I agree that this refactoring does not change any env values and removes an unnecessary duplicate entry.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org