[ 
https://issues.apache.org/jira/browse/SPARK-32067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17208224#comment-17208224
 ] 

James Yu edited comment on SPARK-32067 at 10/5/20, 6:29 PM:
------------------------------------------------------------

Hey, [~dongjoon], I noticed that you added 3.1.0 to the `Affects Version/s`
of this JIRA, but at this point 3.1.0 is not released yet. Did you mean to
set the `Fix Version/s` to 3.1.0, and it was just a typo? Or did you expect
that this fix will not go into 3.1.0, so the bug will still affect 3.1.0? I hope
this bug can be fixed and released as early as possible; otherwise, as
[~sdehaes] said above, the pod template feature is useless to us.


was (Author: james...@ymail.com):
Hey, [~dongjoon], I noticed that you added 3.1.0 to the `Affects Version/s`
of this JIRA, but at this point 3.1.0 is not released yet. Did you mean to
set the `Fix Version/s` to 3.1.0, and it was just a typo? Or did you expect
that this fix will not go into 3.1.0? I hope this bug can be fixed and released
as early as possible; otherwise, as [~sdehaes] said above, the pod template
feature is useless to us.

> Use unique ConfigMap name for executor pod template
> ---------------------------------------------------
>
>                 Key: SPARK-32067
>                 URL: https://issues.apache.org/jira/browse/SPARK-32067
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Kubernetes
>    Affects Versions: 2.4.7, 3.0.1, 3.1.0
>            Reporter: James Yu
>            Priority: Major
>
> THE BUG:
> The bug is reproducible by using spark-submit to submit two different apps
> (app1 and app2) with different executor pod templates (e.g., different
> labels) to K8s sequentially, with app2 launching while app1 is still in the
> middle of ramping up all its executor pods. The unwanted result is that some
> launched executor pods of app1 end up having app2's executor pod template
> applied to them.
> The root cause appears to be that app1's podspec-configmap is overwritten by
> app2 during the overlapping launch periods because both apps use the same
> ConfigMap name. As a result, some of app1's executor pods that are ramped up
> after app2 is launched are inadvertently launched with app2's pod template.
> The issue can be seen as follows:
> First, after submitting app1, you get these configmaps:
> {code:java}
> NAMESPACE    NAME                                       DATA    AGE
> default      app1-1111111111111111-driver-conf-map      1       9m46s
> default      podspec-configmap                          1       12m{code}
> Then submit app2 while app1 is still ramping up its executors. The
> podspec-configmap is modified by app2.
> {code:java}
> NAMESPACE    NAME                                       DATA    AGE
> default      app1-1111111111111111-driver-conf-map      1       11m43s
> default      app2-2222222222222222-driver-conf-map      1       10s
> default      podspec-configmap                          1       13m57s{code}
>  
> PROPOSED SOLUTION:
> Give each submitted app its own podspec-configmap by prefixing the name,
> ideally the same way as the driver configmap:
> {code:java}
> NAMESPACE    NAME                                       DATA    AGE
> default      app1-1111111111111111-driver-conf-map      1       11m43s
> default      app1-1111111111111111-podspec-configmap    1       13m57s
> default      app2-2222222222222222-driver-conf-map      1       10s 
> default      app2-2222222222222222-podspec-configmap    1       3m{code}
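
The collision described in the quoted report, and the effect of the proposed prefix, can be sketched with a toy model. This is only an illustration, not Spark's code: the namespace is a plain dictionary, and `submit`, `launch_executor`, and `run` are hypothetical helpers. It assumes that submitting an app creates or replaces the ConfigMap, which is how the report describes the overwrite.

```python
# Toy model of a Kubernetes namespace: ConfigMap name -> data.
# Illustration only; the helper names below are hypothetical, not Spark APIs.

SHARED_NAME = "podspec-configmap"  # fixed name seen in the listings above


def submit(namespace, app_id, pod_template, unique_names):
    """Create or replace the app's executor pod template ConfigMap.

    Assumption: submission overwrites an existing ConfigMap of the same name.
    Returns the ConfigMap name the app will read its template from.
    """
    name = f"{app_id}-{SHARED_NAME}" if unique_names else SHARED_NAME
    namespace[name] = pod_template
    return name


def launch_executor(namespace, configmap_name):
    """An executor launched now gets whatever the ConfigMap currently holds."""
    return namespace[configmap_name]


def run(unique_names):
    """Submit app1, then app2 while app1 is still launching executors."""
    namespace = {}
    cm1 = submit(namespace, "app1-1111111111111111", {"label": "app1"}, unique_names)
    early = launch_executor(namespace, cm1)  # app1 executor before app2 is submitted
    submit(namespace, "app2-2222222222222222", {"label": "app2"}, unique_names)
    late = launch_executor(namespace, cm1)   # app1 executor after app2 is submitted
    return early["label"], late["label"]


print(run(unique_names=False))  # shared name: the late app1 executor sees app2's template
print(run(unique_names=True))   # prefixed names: both app1 executors see app1's template
```

With the shared name, the second app1 executor picks up app2's label; with per-app names, each app only ever reads its own entry.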



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

