[ 
https://issues.apache.org/jira/browse/SPARK-27574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829292#comment-16829292
 ] 

Udbhav Agrawal commented on SPARK-27574:
----------------------------------------

Hey [~zyfo2], can you share the driver pod logs obtained by running:

kubectl logs <driver-pod-name> -n <namespace>
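
In the watcher output quoted below, the driver container ID changes between the 13:37:04 and 13:37:29 updates, so the plain logs command may only show the most recent container. Here is a minimal sketch that also tries the previous container instance and prints the pod events. The function name collect_driver_diagnostics is hypothetical, and --previous only returns output if the kubelet still has the earlier container's logs:

```shell
# Sketch: gather driver pod diagnostics for the issue report.
# Usage: collect_driver_diagnostics <driver-pod-name> [namespace]
collect_driver_diagnostics() {
  # $1 = driver pod name, $2 = namespace (assumed to be "default" if omitted)
  # Logs of the current (or most recently terminated) driver container.
  kubectl logs "$1" -n "${2:-default}"
  # Logs of the previous container instance; may fail if none are retained.
  kubectl logs "$1" -n "${2:-default}" --previous || true
  # Pod events and per-container state, useful for spotting why a container was recreated.
  kubectl describe pod "$1" -n "${2:-default}"
}
```

For the pod in this report that would be: collect_driver_diagnostics com-xxxx-cloud-mf-trainer-submit-1556199419847-driver default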

> spark on kubernetes driver pod phase changed from running to pending and 
> starts another container in pod
> --------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-27574
>                 URL: https://issues.apache.org/jira/browse/SPARK-27574
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes
>    Affects Versions: 2.4.0
>         Environment: Kubernetes version (use kubectl version):
> v1.10.0
> OS (e.g: cat /etc/os-release):
> CentOS-7
> Kernel (e.g. uname -a):
> 4.17.11-1.el7.elrepo.x86_64
> Spark-2.4.0
>            Reporter: Will Zhang
>            Priority: Major
>
> I'm using spark-on-kubernetes to submit Spark applications to Kubernetes.
> Most of the time they run smoothly.
> Sometimes, however, the logs after submission show the driver pod phase changing 
> from Running to Pending and another container starting in the pod, even though 
> the first container exited successfully.
> I use the standard spark-submit to Kubernetes, like:
> /opt/spark/spark-2.4.0-bin-hadoop2.7/bin/spark-submit --deploy-mode cluster 
> --class xxx ...
>  
> The log is below:
>  
>  
> 2019-04-25 13:37:01 INFO LoggingPodStatusWatcherImpl:54 - State changed, new 
> state:
> pod name: com-xxxx-cloud-mf-trainer-submit-1556199419847-driver
> namespace: default
> labels: DagTask_ID -> 5fd12b90-fbbb-41f0-41ad-7bc5bd0abfe0, 
> spark-app-selector -> spark-3c8350a62ab44c139ce073d654fddebb, spark-role -> 
> driver
> pod uid: 348cdcf5-675f-11e9-ae72-e8611f1fbb2a
> creation time: 2019-04-25T13:37:01Z
> service account name: default
> volumes: spark-local-dir-1, spark-conf-volume, default-token-q7drh
> node name: N/A
> start time: N/A
> container images: N/A
> phase: Pending
> status: []
> 2019-04-25 13:37:01 INFO LoggingPodStatusWatcherImpl:54 - State changed, new 
> state:
> pod name: com-xxxx-cloud-mf-trainer-submit-1556199419847-driver
> namespace: default
> labels: DagTask_ID -> 5fd12b90-fbbb-41f0-41ad-7bc5bd0abfe0, 
> spark-app-selector -> spark-3c8350a62ab44c139ce073d654fddebb, spark-role -> 
> driver
> pod uid: 348cdcf5-675f-11e9-ae72-e8611f1fbb2a
> creation time: 2019-04-25T13:37:01Z
> service account name: default
> volumes: spark-local-dir-1, spark-conf-volume, default-token-q7drh
> node name: yq01-m12-ai2b-service02.yq01.xxxx.com
> start time: N/A
> container images: N/A
> phase: Pending
> status: []
> 2019-04-25 13:37:01 INFO Client:54 - Waiting for application 
> com.xxxx.cloud.mf.trainer.Submit to finish...
> 2019-04-25 13:37:01 INFO LoggingPodStatusWatcherImpl:54 - State changed, new 
> state:
> pod name: com-xxxx-cloud-mf-trainer-submit-1556199419847-driver
> namespace: default
> labels: DagTask_ID -> 5fd12b90-fbbb-41f0-41ad-7bc5bd0abfe0, 
> spark-app-selector -> spark-3c8350a62ab44c139ce073d654fddebb, spark-role -> 
> driver
> pod uid: 348cdcf5-675f-11e9-ae72-e8611f1fbb2a
> creation time: 2019-04-25T13:37:01Z
> service account name: default
> volumes: spark-local-dir-1, spark-conf-volume, default-token-q7drh
> node name: yq01-m12-ai2b-service02.yq01.xxxx.com
> start time: 2019-04-25T13:37:01Z
> container images: 10.96.0.100:5000/spark:spark-2.4.0
> phase: Pending
> status: [ContainerStatus(containerID=null, 
> image=10.96.0.100:5000/spark:spark-2.4.0, imageID=, 
> lastState=ContainerState(running=null, terminated=null, waiting=null, 
> additionalProperties={}), name=spark-kubernetes-driver, ready=false, 
> restartCount=0, state=ContainerState(running=null, terminated=null, 
> waiting=ContainerStateWaiting(message=null, reason=ContainerCreating, 
> additionalProperties={}), additionalProperties={}), additionalProperties={})]
> 2019-04-25 13:37:04 INFO LoggingPodStatusWatcherImpl:54 - State changed, new 
> state:
> pod name: com-xxxx-cloud-mf-trainer-submit-1556199419847-driver
> namespace: default
> labels: DagTask_ID -> 5fd12b90-fbbb-41f0-41ad-7bc5bd0abfe0, 
> spark-app-selector -> spark-3c8350a62ab44c139ce073d654fddebb, spark-role -> 
> driver
> pod uid: 348cdcf5-675f-11e9-ae72-e8611f1fbb2a
> creation time: 2019-04-25T13:37:01Z
> service account name: default
> volumes: spark-local-dir-1, spark-conf-volume, default-token-q7drh
> node name: yq01-m12-ai2b-service02.yq01.xxxx.com
> start time: 2019-04-25T13:37:01Z
> container images: 10.96.0.100:5000/spark:spark-2.4.0
> phase: Running
> status: 
> [ContainerStatus(containerID=docker://120dbf8cb11cf8ef9b26cff3354e096a979beb35279de34be64b3c06e896b991,
>  image=10.96.0.100:5000/spark:spark-2.4.0, 
> imageID=docker-pullable://10.96.0.100:5000/spark@sha256:5b47e2a29aeb1c644fc3853933be2ad08f9cd233dec0977908803e9a1f870b0f,
>  lastState=ContainerState(running=null, terminated=null, waiting=null, 
> additionalProperties={}), name=spark-kubernetes-driver, ready=true, 
> restartCount=0, 
> state=ContainerState(running=ContainerStateRunning(startedAt=Time(time=2019-04-25T13:37:03Z,
>  additionalProperties={}), additionalProperties={}), terminated=null, 
> waiting=null, additionalProperties={}), additionalProperties={})]
> 2019-04-25 13:37:27 INFO LoggingPodStatusWatcherImpl:54 - State changed, new 
> state:
> pod name: com-xxxx-cloud-mf-trainer-submit-1556199419847-driver
> namespace: default
> labels: DagTask_ID -> 5fd12b90-fbbb-41f0-41ad-7bc5bd0abfe0, 
> spark-app-selector -> spark-3c8350a62ab44c139ce073d654fddebb, spark-role -> 
> driver
> pod uid: 348cdcf5-675f-11e9-ae72-e8611f1fbb2a
> creation time: 2019-04-25T13:37:01Z
> service account name: default
> volumes: spark-local-dir-1, spark-conf-volume, default-token-q7drh
> node name: yq01-m12-ai2b-service02.yq01.xxxx.com
> start time: 2019-04-25T13:37:01Z
> container images: 10.96.0.100:5000/spark:spark-2.4.0
> phase: Pending
> status: [ContainerStatus(containerID=null, 
> image=10.96.0.100:5000/spark:spark-2.4.0, imageID=, 
> lastState=ContainerState(running=null, terminated=null, waiting=null, 
> additionalProperties={}), name=spark-kubernetes-driver, ready=false, 
> restartCount=0, state=ContainerState(running=null, terminated=null, 
> waiting=ContainerStateWaiting(message=null, reason=ContainerCreating, 
> additionalProperties={}), additionalProperties={}), additionalProperties={})]
> 2019-04-25 13:37:29 INFO LoggingPodStatusWatcherImpl:54 - State changed, new 
> state:
> pod name: com-xxxx-cloud-mf-trainer-submit-1556199419847-driver
> namespace: default
> labels: DagTask_ID -> 5fd12b90-fbbb-41f0-41ad-7bc5bd0abfe0, 
> spark-app-selector -> spark-3c8350a62ab44c139ce073d654fddebb, spark-role -> 
> driver
> pod uid: 348cdcf5-675f-11e9-ae72-e8611f1fbb2a
> creation time: 2019-04-25T13:37:01Z
> service account name: default
> volumes: spark-local-dir-1, spark-conf-volume, default-token-q7drh
> node name: yq01-m12-ai2b-service02.yq01.xxxx.com
> start time: 2019-04-25T13:37:01Z
> container images: 10.96.0.100:5000/spark:spark-2.4.0
> phase: Running
> status: 
> [ContainerStatus(containerID=docker://43753f5336c41eaec8cdcdfd271b34ac465de331aad2d612fe0c7ad1c3706aac,
>  image=10.96.0.100:5000/spark:spark-2.4.0, 
> imageID=docker-pullable://10.96.0.100:5000/spark@sha256:5b47e2a29aeb1c644fc3853933be2ad08f9cd233dec0977908803e9a1f870b0f,
>  lastState=ContainerState(running=null, terminated=null, waiting=null, 
> additionalProperties={}), name=spark-kubernetes-driver, ready=true, 
> restartCount=0, 
> state=ContainerState(running=ContainerStateRunning(startedAt=Time(time=2019-04-25T13:37:28Z,
>  additionalProperties={}), additionalProperties={}), terminated=null, 
> waiting=null, additionalProperties={}), additionalProperties={})]
> 2019-04-25 13:37:52 INFO LoggingPodStatusWatcherImpl:54 - State changed, new 
> state:
> pod name: com-xxxx-cloud-mf-trainer-submit-1556199419847-driver
> namespace: default
> labels: DagTask_ID -> 5fd12b90-fbbb-41f0-41ad-7bc5bd0abfe0, 
> spark-app-selector -> spark-3c8350a62ab44c139ce073d654fddebb, spark-role -> 
> driver
> pod uid: 348cdcf5-675f-11e9-ae72-e8611f1fbb2a
> creation time: 2019-04-25T13:37:01Z
> service account name: default
> volumes: spark-local-dir-1, spark-conf-volume, default-token-q7drh
> node name: yq01-m12-ai2b-service02.yq01.xxxx.com
> start time: 2019-04-25T13:37:01Z
> container images: 10.96.0.100:5000/spark:spark-2.4.0
> phase: Failed
> status: 
> [ContainerStatus(containerID=docker://43753f5336c41eaec8cdcdfd271b34ac465de331aad2d612fe0c7ad1c3706aac,
>  image=10.96.0.100:5000/spark:spark-2.4.0, 
> imageID=docker-pullable://10.96.0.100:5000/spark@sha256:5b47e2a29aeb1c644fc3853933be2ad08f9cd233dec0977908803e9a1f870b0f,
>  lastState=ContainerState(running=null, terminated=null, waiting=null, 
> additionalProperties={}), name=spark-kubernetes-driver, ready=false, 
> restartCount=0, state=ContainerState(running=null, 
> terminated=ContainerStateTerminated(containerID=docker://43753f5336c41eaec8cdcdfd271b34ac465de331aad2d612fe0c7ad1c3706aac,
>  exitCode=1, finishedAt=Time(time=2019-04-25T13:37:48Z, 
> additionalProperties={}), message=null, reason=Error, signal=null, 
> startedAt=Time(time=2019-04-25T13:37:28Z, additionalProperties={}), 
> additionalProperties={}), waiting=null, additionalProperties={}), 
> additionalProperties={})]
> 2019-04-25 13:37:52 INFO LoggingPodStatusWatcherImpl:54 - Container final 
> statuses:
> Container name: spark-kubernetes-driver
>  Container image: 10.96.0.100:5000/spark:spark-2.4.0
>  Container state: Terminated
>  Exit code: 1
> 2019-04-25 13:37:52 INFO Client:54 - Application 
> com.xxxx.cloud.mf.trainer.Submit finished.
> 2019-04-25 13:37:52 INFO ShutdownHookManager:54 - Shutdown hook called
> 2019-04-25 13:37:52 INFO ShutdownHookManager:54 - Deleting directory 
> /tmp/spark-84727675-4ced-491c-8993-22e8f3539bf3
> bash-4.4#
>  
>  
> Please let me know if I missed anything.
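
The phase sequence in the quoted log (Pending, Pending, Pending, Running, Pending, Running, Failed) is easier to follow when reduced to one line per update. Here is a rough sketch, assuming the spark-submit output has been saved to a file. The function name summarize_phases is hypothetical, and the pattern only matches the LoggingPodStatusWatcherImpl format shown above:

```shell
# Sketch: print "<timestamp> <phase>" for each pod state update in a saved
# spark-submit log. Also tolerates the "> " prefix of an email-quoted log.
summarize_phases() {
  awk '
    { sub(/^> ?/, "") }                                  # drop email quote marker, if any
    /LoggingPodStatusWatcherImpl.*State changed/ { ts = $1 " " $2 }
    /^[ \t]*phase:/ { print ts, $NF }                    # phase is the last field on the line
  ' "$1"
}
```

Running it on the log above should show the Running to Pending step at 13:37:27, which is the transition this issue is about.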



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)