[ https://issues.apache.org/jira/browse/SPARK-32067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17208224#comment-17208224 ]
James Yu edited comment on SPARK-32067 at 10/5/20, 6:29 PM:
------------------------------------------------------------

Hey, [~dongjoon], I noticed that you added 3.1.0 to the `Affects Version/s` of this JIRA, but 3.1.0 has not been released yet. Did you mean to set `Fix Version/s` to 3.1.0, and it was just a typo? Or do you expect that this fix will not go into 3.1.0, so the bug will still affect 3.1.0?

I hope this bug can be fixed and released as early as possible; otherwise, as [~sdehaes] said above, the pod template feature is useless to us.

> Use unique ConfigMap name for executor pod template
> ---------------------------------------------------
>
>                 Key: SPARK-32067
>                 URL: https://issues.apache.org/jira/browse/SPARK-32067
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Kubernetes
>    Affects Versions: 2.4.7, 3.0.1, 3.1.0
>            Reporter: James Yu
>            Priority: Major
>
> THE BUG:
> The bug can be reproduced by using spark-submit to submit two different apps
> (app1 and app2) with different executor pod templates (e.g., different
> labels) to K8s one after the other, launching app2 while app1 is still in
> the middle of ramping up its executor pods. The unwanted result is that some
> of app1's executor pods end up with app2's executor pod template applied to
> them.
> The root cause appears to be that app1's podspec-configmap is overwritten by
> app2 during the overlapping launch periods, because both apps use the same
> ConfigMap name.
> As a result, app1 executor pods that are created after app2 is launched are
> inadvertently launched with app2's pod template.
> The issue can be seen as follows.
> First, after submitting app1, you get these ConfigMaps:
> {code:java}
> NAMESPACE   NAME                                    DATA   AGE
> default     app1-1111111111111111-driver-conf-map   1      9m46s
> default     podspec-configmap                       1      12m
> {code}
> Then submit app2 while app1 is still ramping up its executors. The
> podspec-configmap is modified by app2:
> {code:java}
> NAMESPACE   NAME                                    DATA   AGE
> default     app1-1111111111111111-driver-conf-map   1      11m43s
> default     app2-2222222222222222-driver-conf-map   1      10s
> default     podspec-configmap                       1      13m57s
> {code}
>
> PROPOSED SOLUTION:
> Give each submitted app its own podspec-configmap by prefixing its name,
> ideally the same way as the driver ConfigMap:
> {code:java}
> NAMESPACE   NAME                                      DATA   AGE
> default     app1-1111111111111111-driver-conf-map     1      11m43s
> default     app1-1111111111111111-podspec-configmap   1      13m57s
> default     app2-2222222222222222-driver-conf-map     1      10s
> default     app2-2222222222222222-podspec-configmap   1      3m
> {code}
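To make the race and the proposed naming concrete, here is a toy Python model. It is a sketch, not Spark code. It assumes a namespace can be treated as a dict from ConfigMap name to contents, that each submission creates or replaces its pod-template ConfigMap, and that an executor reads the ConfigMap at the moment it is created. `podspec_configmap_name`, `submit` and `executor_template` are hypothetical helpers, not Spark or Kubernetes APIs, and the prefixes are the placeholder IDs from the listings above.

```python
from typing import Dict, Optional

# Hypothetical model of one Kubernetes namespace: ConfigMap name -> contents.
Namespace = Dict[str, str]

BASE_NAME = "podspec-configmap"


def podspec_configmap_name(prefix: Optional[str]) -> str:
    """Name of the executor pod-template ConfigMap (hypothetical helper).

    prefix=None reproduces the single shared name from the bug report;
    a per-app prefix mirrors how the driver ConfigMap is named.
    """
    return f"{prefix}-{BASE_NAME}" if prefix else BASE_NAME


def submit(ns: Namespace, prefix: Optional[str], template: str) -> str:
    """Create or replace the app's pod-template ConfigMap; return its name."""
    name = podspec_configmap_name(prefix)
    ns[name] = template  # a second write to the same name overwrites the first
    return name


def executor_template(ns: Namespace, name: str) -> str:
    """An executor created later reads whatever the ConfigMap holds at that moment."""
    return ns[name]


# Shared name: app2 is submitted while app1 is still ramping up.
shared: Namespace = {}
shared_app1 = submit(shared, None, "labels: app=app1")
shared_app2 = submit(shared, None, "labels: app=app2")
print(executor_template(shared, shared_app1))  # labels: app=app2 (the bug)

# Per-app prefix: each app keeps its own template.
prefixed: Namespace = {}
prefixed_app1 = submit(prefixed, "app1-1111111111111111", "labels: app=app1")
prefixed_app2 = submit(prefixed, "app2-2222222222222222", "labels: app=app2")
print(executor_template(prefixed, prefixed_app1))  # labels: app=app1
print(sorted(prefixed))
```

With the shared name, app2's write replaces app1's template, so any app1 executor created afterwards picks up app2's labels. With per-app prefixes the two names never collide, which is the behavior the proposed solution asks for.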