[ https://issues.apache.org/jira/browse/SPARK-24028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16444902#comment-16444902 ]
Anirudh Ramanathan (foxish) edited comment on SPARK-24028 at 4/19/18 10:22 PM:
----------------------------------------------------------------------

My suspicion is that this is a timing issue. An easy way to check would be to add a sleep() of a few seconds during driver pod startup and see whether the issue goes away.

It looks like there may have been a race condition in the storage mounting logic in the past, but if you're seeing this fresh on 1.9.4, we should file a bug against upstream Kubernetes. All the recent runs of https://k8s-testgrid.appspot.com/sig-big-data#spark-periodic-latest-gke on v1.9.6 have been green. Any ideas on how we can reproduce this?

> [K8s] Creating secrets and config maps before creating the driver pod has
> unpredictable behavior
> ------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-24028
>                 URL: https://issues.apache.org/jira/browse/SPARK-24028
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes
>    Affects Versions: 2.3.0
>            Reporter: Matt Cheah
>            Priority: Critical
>
> Currently we create the Kubernetes resources the driver depends on - such as
> the properties config map and secrets to mount into the pod - only after we
> create the driver pod. This is because we want these extra objects to
> immediately have an owner reference tying them to the driver pod.
> On our Kubernetes 1.9.4 cluster, we're seeing that sometimes this works
> fine, but other times the driver is started with empty volumes instead of
> volumes holding the contents of the secrets we expect. As a result, the
> driver sometimes starts without these files mounted, which leads to various
> failures if the driver requires the files to be present early on in its
> code. Missing the properties file config map, for example, would mean
> spark-submit doesn't have a properties file to read at all. See the warning
> on [https://kubernetes.io/docs/concepts/storage/volumes/#secret].
> Unfortunately we cannot link owner references to non-existent objects, so we
> have to do this instead:
> 1. Create the auxiliary resources without any owner references.
> 2. Create the driver pod mounting these resources into volumes, as before.
> 3. If step 2 fails, clean up the resources created in step 1.
> 4. Edit the auxiliary resources to have an owner reference for the driver pod.
> The multi-step approach leaves a small chance for us to leak resources - for
> example, if we fail to make the resource edits in step 4 for some reason.
> This also changes the permissions required for spark-submit - credentials
> provided to spark-submit need to be able to edit resources in addition to
> creating them.
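
The four-step sequence proposed in the issue description above can be sketched roughly as follows. This is only an illustration in Python against an in-memory stand-in: `FakeClient`, its methods, and `submit_driver` are hypothetical names invented for this sketch, not the Kubernetes client API or Spark's submission code. It assumes pod creation either returns a UID or raises.

```python
import uuid


class FakeClient:
    """Hypothetical in-memory stand-in for a cluster API (not a real client)."""

    def __init__(self, fail_pod_create=False):
        self.resources = {}  # resource name -> {"owner": pod UID or None}
        self.pods = {}       # pod name -> pod UID
        self.fail_pod_create = fail_pod_create

    def create_resource(self, name):
        self.resources[name] = {"owner": None}

    def delete_resource(self, name):
        self.resources.pop(name, None)

    def create_pod(self, name, volumes):
        if self.fail_pod_create:
            raise RuntimeError("pod creation failed")
        missing = [v for v in volumes if v not in self.resources]
        if missing:
            raise RuntimeError(f"volumes reference missing resources: {missing}")
        uid = str(uuid.uuid4())
        self.pods[name] = uid
        return uid

    def set_owner(self, name, owner_uid):
        self.resources[name]["owner"] = owner_uid


def submit_driver(client, pod_name, aux_names):
    created = []
    try:
        # Step 1: create the auxiliary resources without owner references.
        for name in aux_names:
            client.create_resource(name)
            created.append(name)
        # Step 2: create the driver pod, mounting those resources.
        pod_uid = client.create_pod(pod_name, aux_names)
    except Exception:
        # Step 3: on failure, delete whatever step 1 managed to create.
        for name in created:
            client.delete_resource(name)
        raise
    # Step 4: point each auxiliary resource at the driver pod as its owner.
    # If this loop fails partway, the remaining resources are left ownerless,
    # which is the leak the description mentions.
    for name in aux_names:
        client.set_owner(name, pod_uid)
    return pod_uid
```

For example, `submit_driver(FakeClient(), "driver", ["props", "secret"])` leaves both resources owned by the new pod, while the same call on `FakeClient(fail_pod_create=True)` raises and leaves no resources behind.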