[ https://issues.apache.org/jira/browse/SPARK-24248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16471280#comment-16471280 ]
Matt Cheah commented on SPARK-24248:
------------------------------------

I thought about it a bit more, and I believe we can do most if not all of our actions in the Watcher directly. In other words, can we drive the entire lifecycle of all executors solely from watch events? This would be a pretty big rewrite, but it brings a large number of benefits - namely, it removes any need for synchronization or local state management at all. {{KubernetesClusterSchedulerBackend}} becomes effectively stateless, apart from the parent class's fields. This would imply at least the following changes, though I'm sure I'm missing some:

* When the watcher receives a modified or error event, check the status of the executor, construct the exit reason, and call {{RemoveExecutor}} directly.
* The watcher keeps a running count of active executors and itself triggers rounds of creating new executors (instead of the periodic polling).
* {{KubernetesDriverEndpoint::onDisconnected}} is a tricky one. What I'm thinking is that we can just disable the executor but not remove it, counting on the Watch to receive an event that would actually trigger removing the executor. The idea here is that the status of the pods as reported by the Watch should be fully reliable - e.g. whenever any error occurs in the executor such that it becomes unusable, the Kubernetes API should report that state. We could perhaps make the API's representation of the world more accurate by attaching liveness probes to the executor pod.
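The event-driven design above can be sketched in miniature. This is not Spark's actual code: the class, the {{PodEvent}} shape, and the two callbacks standing in for the {{RemoveExecutor}} RPC and an allocation round are all illustrative assumptions. The point it demonstrates is that the only mutable state left is a running count maintained from watch events.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical sketch of a watch-driven executor lifecycle; all names here
// are illustrative, not the real KubernetesClusterSchedulerBackend API.
public class WatchDrivenLifecycle {

    public enum Action { ADDED, MODIFIED, ERROR, DELETED }

    /** Minimal stand-in for a pod status event delivered by the Watch. */
    public record PodEvent(String executorId, Action action, boolean failed, String reason) {}

    private final int targetExecutors;
    // The only mutable state: a running count of live executors,
    // maintained purely from watch events (no periodic polling).
    private final AtomicInteger activeExecutors = new AtomicInteger(0);
    private final Consumer<String> removeExecutor;    // stands in for the RemoveExecutor RPC
    private final Consumer<Integer> requestExecutors; // stands in for a round of pod creation

    public WatchDrivenLifecycle(int target, Consumer<String> remove, Consumer<Integer> request) {
        this.targetExecutors = target;
        this.removeExecutor = remove;
        this.requestExecutors = request;
    }

    /** Single entry point: every lifecycle decision is made from the event itself. */
    public void onEvent(PodEvent event) {
        switch (event.action()) {
            case ADDED -> activeExecutors.incrementAndGet();
            case MODIFIED, ERROR -> {
                if (event.failed()) {
                    // Construct the exit reason from pod status and remove directly.
                    removeExecutor.accept(event.executorId() + ": " + event.reason());
                    replenish(activeExecutors.decrementAndGet());
                }
            }
            case DELETED -> replenish(activeExecutors.decrementAndGet());
        }
    }

    // The watcher itself triggers new rounds of executor creation.
    private void replenish(int current) {
        int deficit = targetExecutors - current;
        if (deficit > 0) {
            requestExecutors.accept(deficit);
        }
    }

    public int active() {
        return activeExecutors.get();
    }
}
```

Because every transition is derived from the event payload, there is nothing to reconcile against a separately maintained in-memory view - which is exactly the failure mode the issue describes.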
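On the last point, a liveness probe would let the kubelet itself detect an unusable executor and surface that through the Watch. A hypothetical pod fragment, where the probed port and all values are assumptions rather than Spark's actual configuration:

```yaml
# Illustrative executor pod fragment; the probed port is hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: spark-executor-1
spec:
  containers:
    - name: executor
      image: spark-executor:latest
      livenessProbe:
        tcpSocket:
          port: 7078          # executor RPC port (assumed)
        initialDelaySeconds: 10
        periodSeconds: 15
        failureThreshold: 3   # after 3 failures the kubelet restarts the
                              # container, which the Watch observes as a
                              # MODIFIED event on the pod
```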
> [K8S] Use the Kubernetes cluster as the backing store for the state of pods
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-24248
>                 URL: https://issues.apache.org/jira/browse/SPARK-24248
>             Project: Spark
>          Issue Type: Improvement
>          Components: Kubernetes
>    Affects Versions: 2.3.0
>            Reporter: Matt Cheah
>            Priority: Major
>
> We have a number of places in KubernetesClusterSchedulerBackend right now
> that maintain the state of pods in memory. However, the Kubernetes API can
> always give us the most up-to-date and correct view of what our executors
> are doing. We should consider moving away from in-memory state as much as
> we can, in favor of using the Kubernetes cluster as the source of truth for
> pod status. Maintaining less state in memory lowers the chance that we
> accidentally miss updating one of these data structures and break the
> lifecycle of executors.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)