[ https://issues.apache.org/jira/browse/SPARK-27574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Will Zhang updated SPARK-27574:
-------------------------------
Description:

I'm using spark-on-kubernetes to submit Spark applications to Kubernetes. Most of the time this runs smoothly, but sometimes I see the following in the logs after submitting: the driver pod phase changes from Running back to Pending, and a second container is started in the pod even though the first container exited successfully.

I use the standard spark-submit to Kubernetes, e.g.:

/opt/spark/spark-2.4.0-bin-hadoop2.7/bin/spark-submit --deploy-mode cluster --class xxx ...

The log is below:

19/04/19 09:38:40 INFO LineBufferedStream: stdout: 2019-04-19 09:38:40 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
19/04/19 09:38:40 INFO LineBufferedStream: stdout: pod name: com-xxxx-cloud-mf-trainer-submit-1555666719424-driver
19/04/19 09:38:40 INFO LineBufferedStream: stdout: namespace: default
19/04/19 09:38:40 INFO LineBufferedStream: stdout: labels: DagTask_ID -> 54f854e2-0bce-4bd6-50e7-57b521b216f7, spark-app-selector -> spark-4343fe80572c4240bd933246efd975da, spark-role -> driver
19/04/19 09:38:40 INFO LineBufferedStream: stdout: pod uid: ea4410d5-6286-11e9-ae72-e8611f1fbb2a
19/04/19 09:38:40 INFO LineBufferedStream: stdout: creation time: 2019-04-19T09:38:40Z
19/04/19 09:38:40 INFO LineBufferedStream: stdout: service account name: default
19/04/19 09:38:40 INFO LineBufferedStream: stdout: volumes: spark-local-dir-1, spark-conf-volume, default-token-q7drh
19/04/19 09:38:40 INFO LineBufferedStream: stdout: node name: N/A
19/04/19 09:38:40 INFO LineBufferedStream: stdout: start time: N/A
19/04/19 09:38:40 INFO LineBufferedStream: stdout: container images: N/A
19/04/19 09:38:40 INFO LineBufferedStream: stdout: phase: Pending
19/04/19 09:38:40 INFO LineBufferedStream: stdout: status: []
19/04/19 09:38:40 INFO LineBufferedStream: stdout: 2019-04-19 09:38:40 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
19/04/19 09:38:40 INFO LineBufferedStream: stdout: pod name: com-xxxx-cloud-mf-trainer-submit-1555666719424-driver
19/04/19 09:38:40 INFO LineBufferedStream: stdout: namespace: default
19/04/19 09:38:40 INFO LineBufferedStream: stdout: labels: DagTask_ID -> 54f854e2-0bce-4bd6-50e7-57b521b216f7, spark-app-selector -> spark-4343fe80572c4240bd933246efd975da, spark-role -> driver
19/04/19 09:38:40 INFO LineBufferedStream: stdout: pod uid: ea4410d5-6286-11e9-ae72-e8611f1fbb2a
19/04/19 09:38:40 INFO LineBufferedStream: stdout: creation time: 2019-04-19T09:38:40Z
19/04/19 09:38:40 INFO LineBufferedStream: stdout: service account name: default
19/04/19 09:38:40 INFO LineBufferedStream: stdout: volumes: spark-local-dir-1, spark-conf-volume, default-token-q7drh
19/04/19 09:38:40 INFO LineBufferedStream: stdout: node name: yq01-m12-ai2b-service02.yq01.xxxx.com
19/04/19 09:38:40 INFO LineBufferedStream: stdout: start time: N/A
19/04/19 09:38:40 INFO LineBufferedStream: stdout: container images: N/A
19/04/19 09:38:40 INFO LineBufferedStream: stdout: phase: Pending
19/04/19 09:38:40 INFO LineBufferedStream: stdout: status: []
19/04/19 09:38:41 INFO LineBufferedStream: stdout: 2019-04-19 09:38:41 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
19/04/19 09:38:41 INFO LineBufferedStream: stdout: pod name: com-xxxx-cloud-mf-trainer-submit-1555666719424-driver
19/04/19 09:38:41 INFO LineBufferedStream: stdout: namespace: default
19/04/19 09:38:41 INFO LineBufferedStream: stdout: labels: DagTask_ID -> 54f854e2-0bce-4bd6-50e7-57b521b216f7, spark-app-selector -> spark-4343fe80572c4240bd933246efd975da, spark-role -> driver
19/04/19 09:38:41 INFO LineBufferedStream: stdout: pod uid: ea4410d5-6286-11e9-ae72-e8611f1fbb2a
19/04/19 09:38:41 INFO LineBufferedStream: stdout: creation time: 2019-04-19T09:38:40Z
19/04/19 09:38:41 INFO LineBufferedStream: stdout: service account name: default
19/04/19 09:38:41 INFO LineBufferedStream: stdout: volumes: spark-local-dir-1, spark-conf-volume, default-token-q7drh
19/04/19 09:38:41 INFO LineBufferedStream: stdout: node name: yq01-m12-ai2b-service02.yq01.xxxx.com
19/04/19 09:38:41 INFO LineBufferedStream: stdout: start time: 2019-04-19T09:38:40Z
19/04/19 09:38:41 INFO LineBufferedStream: stdout: container images: 10.96.0.100:5000/spark:spark-2.4.0
19/04/19 09:38:41 INFO LineBufferedStream: stdout: phase: Pending
19/04/19 09:38:41 INFO LineBufferedStream: stdout: status: [ContainerStatus(containerID=null, image=10.96.0.100:5000/spark:spark-2.4.0, imageID=, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=ContainerCreating, additionalProperties={}), additionalProperties={}), additionalProperties={})]
19/04/19 09:38:45 INFO LineBufferedStream: stdout: 2019-04-19 09:38:45 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
19/04/19 09:38:45 INFO LineBufferedStream: stdout: pod name: com-xxxx-cloud-mf-trainer-submit-1555666719424-driver
19/04/19 09:38:45 INFO LineBufferedStream: stdout: namespace: default
19/04/19 09:38:45 INFO LineBufferedStream: stdout: labels: DagTask_ID -> 54f854e2-0bce-4bd6-50e7-57b521b216f7, spark-app-selector -> spark-4343fe80572c4240bd933246efd975da, spark-role -> driver
19/04/19 09:38:45 INFO LineBufferedStream: stdout: pod uid: ea4410d5-6286-11e9-ae72-e8611f1fbb2a
19/04/19 09:38:45 INFO LineBufferedStream: stdout: creation time: 2019-04-19T09:38:40Z
19/04/19 09:38:45 INFO LineBufferedStream: stdout: service account name: default
19/04/19 09:38:45 INFO LineBufferedStream: stdout: volumes: spark-local-dir-1, spark-conf-volume, default-token-q7drh
19/04/19 09:38:45 INFO LineBufferedStream: stdout: node name: yq01-m12-ai2b-service02.yq01.xxxx.com
19/04/19 09:38:45 INFO LineBufferedStream: stdout: start time: 2019-04-19T09:38:40Z
19/04/19 09:38:45 INFO LineBufferedStream: stdout: container images: 10.96.0.100:5000/spark-2.4.0:latest_7fdb0b75-0e7b-4587-42c7-b79a3dbd9f83
19/04/19 09:38:45 INFO LineBufferedStream: stdout: phase: Running
19/04/19 09:38:45 INFO LineBufferedStream: stdout: status: [ContainerStatus(containerID=docker://3d21a87775d016719d2f318739fe16dac62422e61fdc023cacdafaa7fce0f6ec, image=10.96.0.100:5000/spark-2.4.0:latest_7fdb0b75-0e7b-4587-42c7-b79a3dbd9f83, imageID=docker-pullable://10.96.0.100:5000/spark-2.4.0@sha256:5b47e2a29aeb1c644fc3853933be2ad08f9cd233dec0977908803e9a1f870b0f, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=true, restartCount=0, state=ContainerState(running=ContainerStateRunning(startedAt=Time(time=2019-04-19T09:38:44Z, additionalProperties={}), additionalProperties={}), terminated=null, waiting=null, additionalProperties={}), additionalProperties={})]
19/04/19 09:38:46 INFO BatchSession$: Creating batch session 211: [owner: null, request: [proxyUser: None, file: hdfs://yq01-m12-ai2b-service02.yq01.xxxx.com:9000/bdl-service/module/jar/module-0.1-jar-with-dependencies.jar, args: --mode,train,--graph,hdfs://yq01-m12-ai2b-service02.yq01.xxxx.com:9000/project/62247e3a-e322-4456-6387-a66e9490652e/exp/62c37ae9-12aa-43f7-671f-d187e1bf1f84/graph/08e1dfad-c272-45ca-4201-1a8bc691a56e/meta/node1555662130294/graph.json,--tracking_server_url,http://10.155.197.12:8080,--sk,56305f9f-b755-4b42-4218-592555f5c4a8,--ak,970f5e4c-7171-4c61-603e-f101b65a573b, driverMemory: 2048m, driverCores: 1, numExecutors: 2, conf: spark.kubernetes.driver.label.DagTask_ID -> 5fd12b90-fbbb-41f0-41ad-7bc5bd0abfe0,spark.kubernetes.driverEnv.xxxx_KUBERNETES_LOG_ENDPOINT -> yq01-m12-ai2b-service02.yq01.xxxx.com:8070,spark.hadoop.fs.defaultFS -> hdfs://yq01-m12-ai2b-service02.yq01.xxxx.com:9000,spark.executorEnv.xxxx_KUBERNETES_LOG_FLUSH_FREQUENCY -> 10s,spark.kubernetes.driverEnv.xxxx_KUBERNETES_LOG_PATH -> /project/62247e3a-e322-4456-6387-a66e9490652e/exp/62c37ae9-12aa-43f7-671f-d187e1bf1f84/graph/08e1dfad-c272-45ca-4201-1a8bc691a56e/log/driver,spark.kubernetes.container.image -> 10.96.0.100:5000/spark:spark-2.4.0,spark.executorEnv.xxxx_KUBERNETES_LOG_PATH -> /project/62247e3a-e322-4456-6387-a66e9490652e/exp/62c37ae9-12aa-43f7-671f-d187e1bf1f84/graph/08e1dfad-c272-45ca-4201-1a8bc691a56e/log/executor,spark.executorEnv.xxxx_KUBERNETES_LOG_ENDPOINT -> yq01-m12-ai2b-service02.yq01.xxxx.com:8070,spark.kubernetes.driverEnv.xxxx_KUBERNETES_LOG_FLUSH_FREQUENCY -> 10s]]
19/04/19 09:38:46 INFO SparkProcessBuilder: Running '/opt/spark/spark-2.4.0-bin-hadoop2.7/bin/spark-submit' '--deploy-mode' 'cluster' '--class' 'com.xxxx.cloud.mf.trainer.Submit' '--conf' 'spark.executorEnv.xxxx_KUBERNETES_LOG_PATH=/project/62247e3a-e322-4456-6387-a66e9490652e/exp/62c37ae9-12aa-43f7-671f-d187e1bf1f84/graph/08e1dfad-c272-45ca-4201-1a8bc691a56e/log/executor' '--conf' 'spark.driver.memory=2048m' '--conf' 'spark.executor.instances=2' '--conf' 'spark.kubernetes.driver.label.DagTask_ID=5fd12b90-fbbb-41f0-41ad-7bc5bd0abfe0' '--conf' 'spark.kubernetes.driverEnv.xxxx_KUBERNETES_LOG_FLUSH_FREQUENCY=10s' '--conf' 'spark.driver.cores=1' '--conf' 'spark.kubernetes.driverEnv.xxxx_KUBERNETES_LOG_PATH=/project/62247e3a-e322-4456-6387-a66e9490652e/exp/62c37ae9-12aa-43f7-671f-d187e1bf1f84/graph/08e1dfad-c272-45ca-4201-1a8bc691a56e/log/driver' '--conf' 'spark.executorEnv.xxxx_KUBERNETES_LOG_ENDPOINT=yq01-m12-ai2b-service02.yq01.xxxx.com:8070' '--conf' 'spark.submit.deployMode=cluster' '--conf' 'spark.hadoop.fs.defaultFS=hdfs://yq01-m12-ai2b-service02.yq01.xxxx.com:9000' '--conf' 'spark.kubernetes.driverEnv.xxxx_KUBERNETES_LOG_ENDPOINT=yq01-m12-ai2b-service02.yq01.xxxx.com:8070' '--conf' 'spark.kubernetes.container.image=10.96.0.100:5000/spark:spark-2.4.0' '--conf' 'spark.master=k8s://https://10.155.197.12:6443' '--conf' 'spark.executorEnv.xxxx_KUBERNETES_LOG_FLUSH_FREQUENCY=10s' 'hdfs://yq01-m12-ai2b-service02.yq01.xxxx.com:9000/bdl-service/module/jar/module-0.1-jar-with-dependencies.jar' '--mode' 'train' '--graph' 'hdfs://yq01-m12-ai2b-service02.yq01.xxxx.com:9000/project/62247e3a-e322-4456-6387-a66e9490652e/exp/62c37ae9-12aa-43f7-671f-d187e1bf1f84/graph/08e1dfad-c272-45ca-4201-1a8bc691a56e/meta/node1555662130294/graph.json' '--tracking_server_url' 'http://10.155.197.12:8080' '--sk' '56305f9f-b755-4b42-4218-592555f5c4a8' '--ak' '970f5e4c-7171-4c61-603e-f101b65a573b'
19/04/19 09:39:57 INFO LineBufferedStream: stdout: 2019-04-19 09:39:57 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
19/04/19 09:39:57 INFO LineBufferedStream: stdout: pod name: com-xxxx-cloud-mf-trainer-submit-1555666719424-driver
19/04/19 09:39:57 INFO LineBufferedStream: stdout: namespace: default
19/04/19 09:39:57 INFO LineBufferedStream: stdout: labels: DagTask_ID -> 54f854e2-0bce-4bd6-50e7-57b521b216f7, spark-app-selector -> spark-4343fe80572c4240bd933246efd975da, spark-role -> driver
19/04/19 09:39:57 INFO LineBufferedStream: stdout: pod uid: ea4410d5-6286-11e9-ae72-e8611f1fbb2a
19/04/19 09:39:57 INFO LineBufferedStream: stdout: creation time: 2019-04-19T09:38:40Z
19/04/19 09:39:57 INFO LineBufferedStream: stdout: service account name: default
19/04/19 09:39:57 INFO LineBufferedStream: stdout: volumes: spark-local-dir-1, spark-conf-volume, default-token-q7drh
19/04/19 09:39:57 INFO LineBufferedStream: stdout: node name: yq01-m12-ai2b-service02.yq01.xxxx.com
19/04/19 09:39:57 INFO LineBufferedStream: stdout: start time: 2019-04-19T09:38:40Z
19/04/19 09:39:57 INFO LineBufferedStream: stdout: container images: 10.96.0.100:5000/spark:spark-2.4.0
19/04/19 09:39:57 INFO LineBufferedStream: stdout: phase: Pending
19/04/19 09:39:57 INFO LineBufferedStream: stdout: status: [ContainerStatus(containerID=null, image=10.96.0.100:5000/spark:spark-2.4.0, imageID=, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=ContainerCreating, additionalProperties={}), additionalProperties={}), additionalProperties={})]
19/04/19 09:40:00 INFO LineBufferedStream: stdout: 2019-04-19 09:40:00 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
19/04/19 09:40:00 INFO LineBufferedStream: stdout: pod name: com-xxxx-cloud-mf-trainer-submit-1555666719424-driver
19/04/19 09:40:00 INFO LineBufferedStream: stdout: namespace: default
19/04/19 09:40:00 INFO LineBufferedStream: stdout: labels: DagTask_ID -> 54f854e2-0bce-4bd6-50e7-57b521b216f7, spark-app-selector -> spark-4343fe80572c4240bd933246efd975da, spark-role -> driver
19/04/19 09:40:00 INFO LineBufferedStream: stdout: pod uid: ea4410d5-6286-11e9-ae72-e8611f1fbb2a
19/04/19 09:40:00 INFO LineBufferedStream: stdout: creation time: 2019-04-19T09:38:40Z
19/04/19 09:40:00 INFO LineBufferedStream: stdout: service account name: default
19/04/19 09:40:00 INFO LineBufferedStream: stdout: volumes: spark-local-dir-1, spark-conf-volume, default-token-q7drh
19/04/19 09:40:00 INFO LineBufferedStream: stdout: node name: yq01-m12-ai2b-service02.yq01.xxxx.com
19/04/19 09:40:00 INFO LineBufferedStream: stdout: start time: 2019-04-19T09:38:40Z
19/04/19 09:40:00 INFO LineBufferedStream: stdout: container images: 10.96.0.100:5000/spark-2.4.0:latest_7fdb0b75-0e7b-4587-42c7-b79a3dbd9f83
19/04/19 09:40:00 INFO LineBufferedStream: stdout: phase: Running
19/04/19 09:40:00 INFO LineBufferedStream: stdout: status: [ContainerStatus(containerID=docker://23c9ea6767a274f8e8759da39dee90f403d9d28b1fec97c1fa4cd8746b41c8c3, image=10.96.0.100:5000/spark-2.4.0:latest_7fdb0b75-0e7b-4587-42c7-b79a3dbd9f83, imageID=docker-pullable://10.96.0.100:5000/spark-2.4.0@sha256:5b47e2a29aeb1c644fc3853933be2ad08f9cd233dec0977908803e9a1f870b0f, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=true, restartCount=0, state=ContainerState(running=ContainerStateRunning(startedAt=Time(time=2019-04-19T09:39:57Z, additionalProperties={}), additionalProperties={}), terminated=null, waiting=null, additionalProperties={}), additionalProperties={})]
19/04/19 09:40:51 INFO LineBufferedStream: stdout: 2019-04-19 09:40:51 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
19/04/19 09:40:51 INFO LineBufferedStream: stdout: pod name: com-xxxx-cloud-mf-trainer-submit-1555666719424-driver
19/04/19 09:40:51 INFO LineBufferedStream: stdout: namespace: default
19/04/19 09:40:51 INFO LineBufferedStream: stdout: labels: DagTask_ID -> 54f854e2-0bce-4bd6-50e7-57b521b216f7, spark-app-selector -> spark-4343fe80572c4240bd933246efd975da, spark-role -> driver
19/04/19 09:40:51 INFO LineBufferedStream: stdout: pod uid: ea4410d5-6286-11e9-ae72-e8611f1fbb2a
19/04/19 09:40:51 INFO LineBufferedStream: stdout: creation time: 2019-04-19T09:38:40Z
19/04/19 09:40:51 INFO LineBufferedStream: stdout: service account name: default
19/04/19 09:40:51 INFO LineBufferedStream: stdout: volumes: spark-local-dir-1, spark-conf-volume, default-token-q7drh
19/04/19 09:40:51 INFO LineBufferedStream: stdout: node name: yq01-m12-ai2b-service02.yq01.xxxx.com
19/04/19 09:40:51 INFO LineBufferedStream: stdout: start time: 2019-04-19T09:38:40Z
19/04/19 09:40:51 INFO LineBufferedStream: stdout: container images: 10.96.0.100:5000/spark-2.4.0:latest_7fdb0b75-0e7b-4587-42c7-b79a3dbd9f83
19/04/19 09:40:51 INFO LineBufferedStream: stdout: phase: Failed
19/04/19 09:40:51 INFO LineBufferedStream: stdout: status: [ContainerStatus(containerID=docker://23c9ea6767a274f8e8759da39dee90f403d9d28b1fec97c1fa4cd8746b41c8c3, image=10.96.0.100:5000/spark-2.4.0:latest_7fdb0b75-0e7b-4587-42c7-b79a3dbd9f83, imageID=docker-pullable://10.96.0.100:5000/spark-2.4.0@sha256:5b47e2a29aeb1c644fc3853933be2ad08f9cd233dec0977908803e9a1f870b0f, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=ContainerStateTerminated(containerID=docker://23c9ea6767a274f8e8759da39dee90f403d9d28b1fec97c1fa4cd8746b41c8c3, exitCode=1, finishedAt=Time(time=2019-04-19T09:40:48Z, additionalProperties={}), message=null, reason=Error, signal=null, startedAt=Time(time=2019-04-19T09:39:57Z, additionalProperties={}), additionalProperties={}), waiting=null, additionalProperties={}), additionalProperties={})]
19/04/19 09:40:51 INFO LineBufferedStream: stdout: 2019-04-19 09:40:51 INFO LoggingPodStatusWatcherImpl:54 - Container final statuses:

Please let me know if I miss anything. Any help appreciated.

was:

I'm using spark-on-kubernetes to submit spark app to kubernetes. Most of the time it runs smoothly, but sometimes I see logs after submitting: the driver pod phase changed from running to pending and starts another container in the pod though the first container exited successfully.

I use the standard spark-submit to kubernetes like:

/opt/spark/spark-2.4.0-bin-hadoop2.7/bin/spark-submit --deploy-mode cluster --class xxx ...
log is below:

2019-04-25 13:37:01 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
pod name: com-xxxx-cloud-mf-trainer-submit-1556199419847-driver
namespace: default
labels: DagTask_ID -> 5fd12b90-fbbb-41f0-41ad-7bc5bd0abfe0, spark-app-selector -> spark-3c8350a62ab44c139ce073d654fddebb, spark-role -> driver
pod uid: 348cdcf5-675f-11e9-ae72-e8611f1fbb2a
creation time: 2019-04-25T13:37:01Z
service account name: default
volumes: spark-local-dir-1, spark-conf-volume, default-token-q7drh
node name: N/A
start time: N/A
container images: N/A
phase: Pending
status: []
2019-04-25 13:37:01 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
pod name: com-xxxx-cloud-mf-trainer-submit-1556199419847-driver
namespace: default
labels: DagTask_ID -> 5fd12b90-fbbb-41f0-41ad-7bc5bd0abfe0, spark-app-selector -> spark-3c8350a62ab44c139ce073d654fddebb, spark-role -> driver
pod uid: 348cdcf5-675f-11e9-ae72-e8611f1fbb2a
creation time: 2019-04-25T13:37:01Z
service account name: default
volumes: spark-local-dir-1, spark-conf-volume, default-token-q7drh
node name: yq01-m12-ai2b-service02.yq01.xxxx.com
start time: N/A
container images: N/A
phase: Pending
status: []
2019-04-25 13:37:01 INFO Client:54 - Waiting for application com.xxxx.cloud.mf.trainer.Submit to finish...
2019-04-25 13:37:01 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
pod name: com-xxxx-cloud-mf-trainer-submit-1556199419847-driver
namespace: default
labels: DagTask_ID -> 5fd12b90-fbbb-41f0-41ad-7bc5bd0abfe0, spark-app-selector -> spark-3c8350a62ab44c139ce073d654fddebb, spark-role -> driver
pod uid: 348cdcf5-675f-11e9-ae72-e8611f1fbb2a
creation time: 2019-04-25T13:37:01Z
service account name: default
volumes: spark-local-dir-1, spark-conf-volume, default-token-q7drh
node name: yq01-m12-ai2b-service02.yq01.xxxx.com
start time: 2019-04-25T13:37:01Z
container images: 10.96.0.100:5000/spark:spark-2.4.0
phase: Pending
status: [ContainerStatus(containerID=null, image=10.96.0.100:5000/spark:spark-2.4.0, imageID=, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=ContainerCreating, additionalProperties={}), additionalProperties={}), additionalProperties={})]
2019-04-25 13:37:04 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
pod name: com-xxxx-cloud-mf-trainer-submit-1556199419847-driver
namespace: default
labels: DagTask_ID -> 5fd12b90-fbbb-41f0-41ad-7bc5bd0abfe0, spark-app-selector -> spark-3c8350a62ab44c139ce073d654fddebb, spark-role -> driver
pod uid: 348cdcf5-675f-11e9-ae72-e8611f1fbb2a
creation time: 2019-04-25T13:37:01Z
service account name: default
volumes: spark-local-dir-1, spark-conf-volume, default-token-q7drh
node name: yq01-m12-ai2b-service02.yq01.xxxx.com
start time: 2019-04-25T13:37:01Z
container images: 10.96.0.100:5000/spark:spark-2.4.0
phase: Running
status: [ContainerStatus(containerID=docker://120dbf8cb11cf8ef9b26cff3354e096a979beb35279de34be64b3c06e896b991, image=10.96.0.100:5000/spark:spark-2.4.0, imageID=docker-pullable://10.96.0.100:5000/spark@sha256:5b47e2a29aeb1c644fc3853933be2ad08f9cd233dec0977908803e9a1f870b0f, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=true, restartCount=0, state=ContainerState(running=ContainerStateRunning(startedAt=Time(time=2019-04-25T13:37:03Z, additionalProperties={}), additionalProperties={}), terminated=null, waiting=null, additionalProperties={}), additionalProperties={})]
2019-04-25 13:37:27 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
pod name: com-xxxx-cloud-mf-trainer-submit-1556199419847-driver
namespace: default
labels: DagTask_ID -> 5fd12b90-fbbb-41f0-41ad-7bc5bd0abfe0, spark-app-selector -> spark-3c8350a62ab44c139ce073d654fddebb, spark-role -> driver
pod uid: 348cdcf5-675f-11e9-ae72-e8611f1fbb2a
creation time: 2019-04-25T13:37:01Z
service account name: default
volumes: spark-local-dir-1, spark-conf-volume, default-token-q7drh
node name: yq01-m12-ai2b-service02.yq01.xxxx.com
start time: 2019-04-25T13:37:01Z
container images: 10.96.0.100:5000/spark:spark-2.4.0
phase: Pending
status: [ContainerStatus(containerID=null, image=10.96.0.100:5000/spark:spark-2.4.0, imageID=, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=ContainerCreating, additionalProperties={}), additionalProperties={}), additionalProperties={})]
2019-04-25 13:37:29 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
pod name: com-xxxx-cloud-mf-trainer-submit-1556199419847-driver
namespace: default
labels: DagTask_ID -> 5fd12b90-fbbb-41f0-41ad-7bc5bd0abfe0, spark-app-selector -> spark-3c8350a62ab44c139ce073d654fddebb, spark-role -> driver
pod uid: 348cdcf5-675f-11e9-ae72-e8611f1fbb2a
creation time: 2019-04-25T13:37:01Z
service account name: default
volumes: spark-local-dir-1, spark-conf-volume, default-token-q7drh
node name: yq01-m12-ai2b-service02.yq01.xxxx.com
start time: 2019-04-25T13:37:01Z
container images: 10.96.0.100:5000/spark:spark-2.4.0
phase: Running
status: [ContainerStatus(containerID=docker://43753f5336c41eaec8cdcdfd271b34ac465de331aad2d612fe0c7ad1c3706aac, image=10.96.0.100:5000/spark:spark-2.4.0, imageID=docker-pullable://10.96.0.100:5000/spark@sha256:5b47e2a29aeb1c644fc3853933be2ad08f9cd233dec0977908803e9a1f870b0f, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=true, restartCount=0, state=ContainerState(running=ContainerStateRunning(startedAt=Time(time=2019-04-25T13:37:28Z, additionalProperties={}), additionalProperties={}), terminated=null, waiting=null, additionalProperties={}), additionalProperties={})]
2019-04-25 13:37:52 INFO LoggingPodStatusWatcherImpl:54 - State changed, new state:
pod name: com-xxxx-cloud-mf-trainer-submit-1556199419847-driver
namespace: default
labels: DagTask_ID -> 5fd12b90-fbbb-41f0-41ad-7bc5bd0abfe0, spark-app-selector -> spark-3c8350a62ab44c139ce073d654fddebb, spark-role -> driver
pod uid: 348cdcf5-675f-11e9-ae72-e8611f1fbb2a
creation time: 2019-04-25T13:37:01Z
service account name: default
volumes: spark-local-dir-1, spark-conf-volume, default-token-q7drh
node name: yq01-m12-ai2b-service02.yq01.xxxx.com
start time: 2019-04-25T13:37:01Z
container images: 10.96.0.100:5000/spark:spark-2.4.0
phase: Failed
status: [ContainerStatus(containerID=docker://43753f5336c41eaec8cdcdfd271b34ac465de331aad2d612fe0c7ad1c3706aac, image=10.96.0.100:5000/spark:spark-2.4.0, imageID=docker-pullable://10.96.0.100:5000/spark@sha256:5b47e2a29aeb1c644fc3853933be2ad08f9cd233dec0977908803e9a1f870b0f, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=ContainerStateTerminated(containerID=docker://43753f5336c41eaec8cdcdfd271b34ac465de331aad2d612fe0c7ad1c3706aac, exitCode=1, finishedAt=Time(time=2019-04-25T13:37:48Z, additionalProperties={}), message=null, reason=Error, signal=null, startedAt=Time(time=2019-04-25T13:37:28Z, additionalProperties={}), additionalProperties={}), waiting=null, additionalProperties={}), additionalProperties={})]
2019-04-25 13:37:52 INFO LoggingPodStatusWatcherImpl:54 - Container final statuses:
Container name: spark-kubernetes-driver
Container image: 10.96.0.100:5000/spark:spark-2.4.0
Container state: Terminated
Exit code: 1
2019-04-25 13:37:52 INFO Client:54 - Application com.xxxx.cloud.mf.trainer.Submit finished.
2019-04-25 13:37:52 INFO ShutdownHookManager:54 - Shutdown hook called
2019-04-25 13:37:52 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-84727675-4ced-491c-8993-22e8f3539bf3

Please let me know if I miss anything.

> spark on kubernetes driver pod phase changed from running to pending and starts another container in pod
> --------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-27574
>                 URL: https://issues.apache.org/jira/browse/SPARK-27574
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes
>    Affects Versions: 2.4.0
>        Environment: Kubernetes version (use kubectl version): v1.10.0
> OS (e.g: cat /etc/os-release): CentOS-7
> Kernel (e.g. uname -a): 4.17.11-1.el7.elrepo.x86_64
> Spark-2.4.0
>            Reporter: Will Zhang
>            Priority: Major
>        Attachments: driver-pod-logs.zip
'hdfs://yq01-m12-ai2b-service02.yq01.xxxx.com:9000/project/62247e3a-e322-4456-6387-a66e9490652e/exp/62c37ae9-12aa-43f7-671f-d187e1bf1f84/graph/08e1dfad-c272-45ca-4201-1a8bc691a56e/meta/node1555662130294/graph.json' > '--tracking_server_url' 'http://10.155.197.12:8080' '--sk' > '56305f9f-b755-4b42-4218-592555f5c4a8' '--ak' > '970f5e4c-7171-4c61-603e-f101b65a573b' > 19/04/19 09:39:57 INFO LineBufferedStream: stdout: 2019-04-19 09:39:57 INFO > LoggingPodStatusWatcherImpl:54 - State changed, new state: > 19/04/19 09:39:57 INFO LineBufferedStream: stdout: pod name: > com-xxxx-cloud-mf-trainer-submit-1555666719424-driver > 19/04/19 09:39:57 INFO LineBufferedStream: stdout: namespace: default > 19/04/19 09:39:57 INFO LineBufferedStream: stdout: labels: DagTask_ID -> > 54f854e2-0bce-4bd6-50e7-57b521b216f7, spark-app-selector -> > spark-4343fe80572c4240bd933246efd975da, spark-role -> driver > 19/04/19 09:39:57 INFO LineBufferedStream: stdout: pod uid: > ea4410d5-6286-11e9-ae72-e8611f1fbb2a > 19/04/19 09:39:57 INFO LineBufferedStream: stdout: creation time: > 2019-04-19T09:38:40Z > 19/04/19 09:39:57 INFO LineBufferedStream: stdout: service account name: > default > 19/04/19 09:39:57 INFO LineBufferedStream: stdout: volumes: > spark-local-dir-1, spark-conf-volume, default-token-q7drh > 19/04/19 09:39:57 INFO LineBufferedStream: stdout: node name: > yq01-m12-ai2b-service02.yq01.xxxx.com > 19/04/19 09:39:57 INFO LineBufferedStream: stdout: start time: > 2019-04-19T09:38:40Z > 19/04/19 09:39:57 INFO LineBufferedStream: stdout: container images: > 10.96.0.100:5000/spark:spark-2.4.0 > 19/04/19 09:39:57 INFO LineBufferedStream: stdout: phase: Pending > 19/04/19 09:39:57 INFO LineBufferedStream: stdout: status: > [ContainerStatus(containerID=null, image=10.96.0.100:5000/spark:spark-2.4.0, > imageID=, lastState=ContainerState(running=null, terminated=null, > waiting=null, additionalProperties={}), name=spark-kubernetes-driver, > ready=false, restartCount=0, 
state=ContainerState(running=null, > terminated=null, waiting=ContainerStateWaiting(message=null, > reason=ContainerCreating, additionalProperties={}), additionalProperties={}), > additionalProperties={})] > 19/04/19 09:40:00 INFO LineBufferedStream: stdout: 2019-04-19 09:40:00 INFO > LoggingPodStatusWatcherImpl:54 - State changed, new state: > 19/04/19 09:40:00 INFO LineBufferedStream: stdout: pod name: > com-xxxx-cloud-mf-trainer-submit-1555666719424-driver > 19/04/19 09:40:00 INFO LineBufferedStream: stdout: namespace: default > 19/04/19 09:40:00 INFO LineBufferedStream: stdout: labels: DagTask_ID -> > 54f854e2-0bce-4bd6-50e7-57b521b216f7, spark-app-selector -> > spark-4343fe80572c4240bd933246efd975da, spark-role -> driver > 19/04/19 09:40:00 INFO LineBufferedStream: stdout: pod uid: > ea4410d5-6286-11e9-ae72-e8611f1fbb2a > 19/04/19 09:40:00 INFO LineBufferedStream: stdout: creation time: > 2019-04-19T09:38:40Z > 19/04/19 09:40:00 INFO LineBufferedStream: stdout: service account name: > default > 19/04/19 09:40:00 INFO LineBufferedStream: stdout: volumes: > spark-local-dir-1, spark-conf-volume, default-token-q7drh > 19/04/19 09:40:00 INFO LineBufferedStream: stdout: node name: > yq01-m12-ai2b-service02.yq01.xxxx.com > 19/04/19 09:40:00 INFO LineBufferedStream: stdout: start time: > 2019-04-19T09:38:40Z > 19/04/19 09:40:00 INFO LineBufferedStream: stdout: container images: > 10.96.0.100:5000/spark-2.4.0:latest_7fdb0b75-0e7b-4587-42c7-b79a3dbd9f83 > 19/04/19 09:40:00 INFO LineBufferedStream: stdout: phase: Running > 19/04/19 09:40:00 INFO LineBufferedStream: stdout: status: > [ContainerStatus(containerID=docker://23c9ea6767a274f8e8759da39dee90f403d9d28b1fec97c1fa4cd8746b41c8c3, > > image=10.96.0.100:5000/spark-2.4.0:latest_7fdb0b75-0e7b-4587-42c7-b79a3dbd9f83, > > imageID=docker-pullable://10.96.0.100:5000/spark-2.4.0@sha256:5b47e2a29aeb1c644fc3853933be2ad08f9cd233dec0977908803e9a1f870b0f, > lastState=ContainerState(running=null, terminated=null, waiting=null, > 
additionalProperties={}), name=spark-kubernetes-driver, ready=true, > restartCount=0, > state=ContainerState(running=ContainerStateRunning(startedAt=Time(time=2019-04-19T09:39:57Z, > additionalProperties={}), additionalProperties={}), terminated=null, > waiting=null, additionalProperties={}), additionalProperties={})] > 19/04/19 09:40:51 INFO LineBufferedStream: stdout: 2019-04-19 09:40:51 INFO > LoggingPodStatusWatcherImpl:54 - State changed, new state: > 19/04/19 09:40:51 INFO LineBufferedStream: stdout: pod name: > com-xxxx-cloud-mf-trainer-submit-1555666719424-driver > 19/04/19 09:40:51 INFO LineBufferedStream: stdout: namespace: default > 19/04/19 09:40:51 INFO LineBufferedStream: stdout: labels: DagTask_ID -> > 54f854e2-0bce-4bd6-50e7-57b521b216f7, spark-app-selector -> > spark-4343fe80572c4240bd933246efd975da, spark-role -> driver > 19/04/19 09:40:51 INFO LineBufferedStream: stdout: pod uid: > ea4410d5-6286-11e9-ae72-e8611f1fbb2a > 19/04/19 09:40:51 INFO LineBufferedStream: stdout: creation time: > 2019-04-19T09:38:40Z > 19/04/19 09:40:51 INFO LineBufferedStream: stdout: service account name: > default > 19/04/19 09:40:51 INFO LineBufferedStream: stdout: volumes: > spark-local-dir-1, spark-conf-volume, default-token-q7drh > 19/04/19 09:40:51 INFO LineBufferedStream: stdout: node name: > yq01-m12-ai2b-service02.yq01.xxxx.com > 19/04/19 09:40:51 INFO LineBufferedStream: stdout: start time: > 2019-04-19T09:38:40Z > 19/04/19 09:40:51 INFO LineBufferedStream: stdout: container images: > 10.96.0.100:5000/spark-2.4.0:latest_7fdb0b75-0e7b-4587-42c7-b79a3dbd9f83 > 19/04/19 09:40:51 INFO LineBufferedStream: stdout: phase: Failed > 19/04/19 09:40:51 INFO LineBufferedStream: stdout: status: > [ContainerStatus(containerID=docker://23c9ea6767a274f8e8759da39dee90f403d9d28b1fec97c1fa4cd8746b41c8c3, > > image=10.96.0.100:5000/spark-2.4.0:latest_7fdb0b75-0e7b-4587-42c7-b79a3dbd9f83, > > 
imageID=docker-pullable://10.96.0.100:5000/spark-2.4.0@sha256:5b47e2a29aeb1c644fc3853933be2ad08f9cd233dec0977908803e9a1f870b0f, > lastState=ContainerState(running=null, terminated=null, waiting=null, > additionalProperties={}), name=spark-kubernetes-driver, ready=false, > restartCount=0, state=ContainerState(running=null, > terminated=ContainerStateTerminated(containerID=docker://23c9ea6767a274f8e8759da39dee90f403d9d28b1fec97c1fa4cd8746b41c8c3, > exitCode=1, finishedAt=Time(time=2019-04-19T09:40:48Z, > additionalProperties={}), message=null, reason=Error, signal=null, > startedAt=Time(time=2019-04-19T09:39:57Z, additionalProperties={}), > additionalProperties={}), waiting=null, additionalProperties={}), > additionalProperties={})] > 19/04/19 09:40:51 INFO LineBufferedStream: stdout: 2019-04-19 09:40:51 INFO > LoggingPodStatusWatcherImpl:54 - Container final statuses: > > > > Please let me know if I miss anything. Any help appreciated. -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
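One detail in the log worth making explicit: the two `phase: Running` snapshots report different `containerID` values for the same pod uid, yet `restartCount` is 0 in both, and the second snapshot follows a transition back to `Pending`. The sketch below hard-codes the values copied verbatim from the `ContainerStatus` dumps above (no cluster access needed) to confirm that a second container was created rather than the first one being restarted:

```python
# ContainerStatus values copied from the two "phase: Running" dumps in the log.
first_running = {
    "containerID": "docker://3d21a87775d016719d2f318739fe16dac62422e61fdc023cacdafaa7fce0f6ec",
    "startedAt": "2019-04-19T09:38:44Z",
    "restartCount": 0,
}
second_running = {
    "containerID": "docker://23c9ea6767a274f8e8759da39dee90f403d9d28b1fec97c1fa4cd8746b41c8c3",
    "startedAt": "2019-04-19T09:39:57Z",
    "restartCount": 0,
}

# Different container IDs with restartCount still 0: the kubelet did not
# restart the original driver container in place; a new container appeared
# under the same pod uid, matching the Running -> Pending -> Running sequence.
assert first_running["containerID"] != second_running["containerID"]
assert first_running["restartCount"] == second_running["restartCount"] == 0
print("new container created; restartCount was not incremented")
```

This only restates what the log already shows; it does not explain why the pod object was reset, which is the behavior being reported.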