[ 
https://issues.apache.org/jira/browse/SPARK-32067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Yu updated SPARK-32067:
-----------------------------
    Description: 
THE BUG:

The bug can be reproduced by using spark-submit to submit two different apps 
(app1 and app2) with different executor pod templates (e.g., different labels) 
to K8s sequentially, so that app2 launches while app1 is still ramping up its 
executor pods. The unwanted result is that some of app1's launched executor 
pods appear to have app2's pod template applied.

The root cause is that app1's podspec-configmap is overwritten by app2 during 
the overlapping launch period, because both apps use the same configmap name. 
As a result, any app1 executor pods that are ramped up after app2 is launched 
are inadvertently created with app2's pod template. The issue can be seen as 
follows:

First, after submitting app1, you get these configmaps:
{code:java}
NAMESPACE    NAME                                       DATA    AGE
default      app1-1111111111111111-driver-conf-map      1       9m46s
default      podspec-configmap                          1       12m{code}
Then submit app2 while app1 is still ramping up its executors. The 
podspec-configmap is modified by app2.
{code:java}
NAMESPACE    NAME                                       DATA    AGE
default      app1-1111111111111111-driver-conf-map      1       11m43s
default      app2-2222222222222222-driver-conf-map      1       10s
default      podspec-configmap                          1       13m57s{code}
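
The overwrite described above can be illustrated with a toy model. This is only a sketch, not Spark's actual submission code: the namespace is modeled as a plain dict keyed by configmap name, and the helpers submit and launch_executor, as well as the template strings, are hypothetical names introduced for illustration.

```python
# Toy model of the collision. The namespace's configmaps are a dict keyed
# by configmap name. All helpers here are hypothetical, not Spark APIs.
from typing import Dict

SHARED_NAME = "podspec-configmap"  # the same fixed name is used by every app


def submit(namespace: Dict[str, str], app: str, template: str) -> None:
    """Create or replace the pod-template configmap on behalf of `app`."""
    namespace[SHARED_NAME] = template


def launch_executor(namespace: Dict[str, str], app: str) -> str:
    """Return the template that an executor pod of `app` is built from."""
    return namespace[SHARED_NAME]


namespace: Dict[str, str] = {}

submit(namespace, "app1", "labels: {team: a}")
early_app1 = launch_executor(namespace, "app1")  # before app2 is submitted

submit(namespace, "app2", "labels: {team: b}")   # replaces app1's template
late_app1 = launch_executor(namespace, "app1")   # after app2 is submitted

print("early app1 executor:", early_app1)
print("late app1 executor: ", late_app1)
```

In this model the early app1 executor sees app1's template, while the late one sees app2's, which mirrors the mixed executor pods observed in the cluster.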
 

PROPOSED SOLUTION:

Prefix the podspec-configmap name with a per-app identifier, as is already 
done for the driver-conf-map, so that each submitted app gets its own 
configmap:
{code:java}
NAMESPACE    NAME                                       DATA    AGE
default      app1-1111111111111111-driver-conf-map      1       11m43s
default      app2-2222222222222222-driver-conf-map      1       10s
default      app1-1111111111111111-podspec-configmap    1       13m57s
default      app2-2222222222222222-podspec-configmap    1       13m57s{code}
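
A minimal sketch of the proposed naming scheme, using the same toy dict model of a namespace. The function podspec_configmap_name and the other helpers are hypothetical, and the prefixes are simply the ones shown in the listing above; how Spark would actually derive the prefix is not specified here.

```python
# Sketch of per-app configmap names. Helper names are hypothetical, and the
# prefixes are copied from the example listing, not generated by Spark.
from typing import Dict


def podspec_configmap_name(app_prefix: str) -> str:
    """Build a configmap name that is unique per submitted app."""
    return f"{app_prefix}-podspec-configmap"


def submit(namespace: Dict[str, str], app_prefix: str, template: str) -> None:
    """Store the app's pod template under its own configmap name."""
    namespace[podspec_configmap_name(app_prefix)] = template


def launch_executor(namespace: Dict[str, str], app_prefix: str) -> str:
    """Return the template an executor pod of this app is built from."""
    return namespace[podspec_configmap_name(app_prefix)]


namespace: Dict[str, str] = {}
submit(namespace, "app1-1111111111111111", "labels: {team: a}")
submit(namespace, "app2-2222222222222222", "labels: {team: b}")

# An app1 executor launched after app2's submission still gets app1's template.
late_app1 = launch_executor(namespace, "app1-1111111111111111")

print(sorted(namespace))
print(late_app1)
```

Because the two apps no longer share a key, submitting app2 cannot replace the template that app1's remaining executors read.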

  was:
THE BUG:

The bug can be reproduced by using spark-submit to submit two different apps 
(app1 and app2) with different executor pod templates (e.g., different labels) 
to K8s sequentially, so that app2 launches while app1 is still ramping up its 
executor pods. The unwanted result is that some of app1's launched executor 
pods appear to have app2's pod template applied.

The root cause is that app1's podspec-configmap is overwritten by app2 during 
the overlapping launch period, because both apps use the same configmap name. 
As a result, any app1 executor pods that are ramped up after app2 is launched 
are inadvertently created with app2's pod template.

First, submit app1:
{code:java}
NAMESPACE    NAME                                       DATA    AGE
default      app1-1111111111111111-driver-conf-map      1       9m46s
default      podspec-configmap                          1       12m{code}
Then submit app2 while app1 is still ramping up its executors:
{code:java}
NAMESPACE    NAME                                       DATA    AGE
default      app1-1111111111111111-driver-conf-map      1       11m43s
default      app2-2222222222222222-driver-conf-map      1       10s
default      podspec-configmap                          1       13m57s{code}
 

PROPOSED SOLUTION:

Properly prefix the podspec-configmap for each submitted app.
{code:java}
NAMESPACE    NAME                                       DATA    AGE
default      app1-1111111111111111-driver-conf-map      1       11m43s
default      app2-2222222222222222-driver-conf-map      1       10s
default      app1-1111111111111111-podspec-configmap    1       13m57s
default      app2-2222222222222222-podspec-configmap    1       13m57s{code}


> [K8s] Pod template from subsequently submission inadvertently applies to 
> ongoing submission
> -------------------------------------------------------------------------------------------
>
>                 Key: SPARK-32067
>                 URL: https://issues.apache.org/jira/browse/SPARK-32067
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes
>    Affects Versions: 2.4.6, 3.0.0
>            Reporter: James Yu
>            Priority: Minor


