imtzer commented on PR #45911:
URL: https://github.com/apache/spark/pull/45911#issuecomment-2129285181

   > > same problem when using spark operator, it's weird why the code does not throw anything when configmap is not created
   > 
   > When using spark-submit, there is no error output in the console, and the client will show that the driver pod is always in the ContainerCreating state and will never end.
   
   Maybe I hit another issue. I use spark-submit inside the Spark Operator pod in k8s mode, but sometimes the driver pod gets stuck in the ContainerCreating state because of a missing ConfigMap, and the console output shows 'Killed'. I added some logging in KubernetesClientApplication.scala like this:
   ```scala
   logInfo("before pod create, " + driverPodName)

   var watch: Watch = null
   var createdDriverPod: Pod = null
   try {
     createdDriverPod =
       kubernetesClient.pods().inNamespace(conf.namespace).resource(resolvedDriverPod).create()
   } catch {...}

   logInfo("before pre resource refresh, " + driverPodName)

   // Refresh all pre-resources' owner references
   try {
     addOwnerReference(createdDriverPod, preKubernetesResources)
     kubernetesClient.resourceList(preKubernetesResources: _*).forceConflicts().serverSideApply()
   } catch {...}

   logInfo("before other resource, " + driverPodName)

   // setup resources after pod creation, and refresh all resources' owner references
   try {
     val otherKubernetesResources = resolvedDriverSpec.driverKubernetesResources ++ Seq(configMap)
     addOwnerReference(createdDriverPod, otherKubernetesResources)
     kubernetesClient.resourceList(otherKubernetesResources: _*).createOrReplace()
     logInfo("after other resource, " + driverPodName)
   } catch {...}
   ```
   When spark-submit was 'Killed', the log never printed `after other resource`, which means the ConfigMap the driver pod needs had not been successfully created. I also checked for OOM using `dmesg`, and a process in the Spark Operator pod had indeed been killed by the OOM killer.
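   For anyone hitting the same symptom, the two findings above can be checked from the command line. This is a minimal sketch; `<namespace>`, `<driver-pod-name>`, and `<driver-configmap-name>` are placeholders you must substitute for your deployment:
   ```shell
   # Did the driver's ConfigMap actually get created? (names are placeholders)
   kubectl -n <namespace> get configmap <driver-configmap-name>

   # Why is the driver pod stuck in ContainerCreating? Events at the bottom
   # will typically show a "configmap ... not found" mount failure.
   kubectl -n <namespace> describe pod <driver-pod-name>

   # Was a process (e.g. the spark-submit JVM) killed by the kernel OOM killer?
   dmesg | grep -iE "out of memory|killed process"
   ```
   If the ConfigMap is missing and `dmesg` shows an OOM kill around the same time, spark-submit likely died between creating the driver pod and creating its ConfigMap, matching the missing `after other resource` log line.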


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
