Github user mccheah commented on the issue: https://github.com/apache/spark/pull/19954

> It's nice that there is some documentation somewhere, but that documentation doesn't really seem to address my comments. For one example, it only explicitly talks about the driver - which sort of makes sense because the document is about submission. But why aren't orchestrators used when starting executors too? It seems there's similar code baked into another class instead.

> What I'm asking is for this to be documented properly so that someone who didn't write the code has enough information to know that it's working as it should. Right now I don't see what some of these abstractions are for at all - for example, as far as I can see, the orchestrator can be replaced by a method call instead of being a completely separate type; it's not really abstracting anything. Look at where it's used:

@vanzin Thanks for this feedback. I might suggest that documentation isn't the main issue, but rather that the abstractions themselves may not be self-documenting. We're open to ideas for alternative representations of the submission logic that are easier for an outside reader to parse, and this dialogue is very productive.

> But why aren't orchestrators used when starting executors too? It seems there's similar code baked into another class instead.

As with any abstraction, we could inline the orchestrator into the submission client, but then we should consider whether that makes the `Client` do too many things, and what the unit tests would look like as a result. I like that we can test the orchestrator and its selection of which steps to apply independently from testing that the `Client` takes the API objects and creates them accordingly, particularly with the owner references hooked up. A goal we had with this design is that every class has as few responsibilities as possible; ideally every module is responsible for exactly one thing.
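To make the intended separation concrete, here is a minimal sketch of the pattern being described, with hypothetical stand-in types - the real classes live under `org.apache.spark.deploy.k8s.submit` and carry far more state. The orchestrator only *selects* steps; the `Client` only *runs* them:

```scala
// Hypothetical, simplified driver spec; the real KubernetesDriverSpec holds
// a full pod, additional Kubernetes resources, and a driver SparkConf.
case class DriverSpec(podLabels: Map[String, String], resources: List[String])

// Each submodule ("step") configures exactly one aspect of the driver spec.
trait DriverConfigurationStep {
  def configure(spec: DriverSpec): DriverSpec
}

class BaseStep extends DriverConfigurationStep {
  override def configure(spec: DriverSpec): DriverSpec =
    spec.copy(podLabels = spec.podLabels + ("spark-role" -> "driver"))
}

class InitContainerStep extends DriverConfigurationStep {
  override def configure(spec: DriverSpec): DriverSpec =
    spec.copy(resources = "init-container-config-map" :: spec.resources)
}

// The orchestrator's single responsibility: decide which steps apply.
// This decision can be unit-tested without touching the API server.
class Orchestrator(needsInitContainer: Boolean) {
  def steps: List[DriverConfigurationStep] =
    List(new BaseStep) ++
      (if (needsInitContainer) List(new InitContainerStep) else Nil)
}

// The Client's single responsibility: run the steps and (in the real code)
// create the resulting API objects with owner references set up.
object Client {
  def run(steps: List[DriverConfigurationStep]): DriverSpec =
    steps.foldLeft(DriverSpec(Map.empty, Nil))((spec, step) => step.configure(spec))
}
```

The point of keeping the two types separate is visible in the tests: step selection and object creation can each be verified in isolation.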
Though the long constructor argument list for the orchestrator does suggest that the orchestrator is tied to the `Client` pretty tightly.

> But why aren't orchestrators used when starting executors too? It seems there's similar code baked into another class instead.

A common situation is that the submission client needs to do strictly more work than the driver does: there is more work in setting up a Spark application and creating its objects than in bootstrapping the executor pods with objects that already exist. For example, the submission client has to create the config map that sets up the init-container, as well as mount the config map volume into the driver; the driver does not create the config map - it uses the pre-existing one - but it still has to do the same mounting into the executor pods that the submission client does. These semantics show up in the code as follows: we first have a submodule that attaches the config map volumes to an arbitrary pod (the "bootstrap"); the driver then uses that submodule in isolation, while the submission client wraps that submodule in a step that both applies it and creates the API objects in the API server.

It would be neat to explore a step-based system that is shared between creating the executors and creating the driver, but we have to think carefully about a proper abstraction. For example, `KubernetesDriverSpec` has the notions of `additionalKubernetesResources` and the `driverSparkConf`, which are constructs that do not apply when creating the executor pods. This illustrates what we observe above - constructing the application is strictly more work than creating the executors. So then, would the submission client have two orchestrators - one for configuring the driver pod itself, and one for configuring everything else in the application, e.g. creating config maps?
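The bootstrap layering described above can be sketched roughly as follows - all names and types here are illustrative simplifications, not the real fabric8-backed classes:

```scala
// Hypothetical, simplified pod model: just a list of mounted volume names.
case class Pod(volumes: List[String])
case class ConfigMap(name: String)

// Shared submodule ("bootstrap"): mounts the init-container config map
// volume into an arbitrary pod. The driver uses this as-is when it
// builds executor pods, against a config map that already exists.
class InitContainerBootstrap(configMapName: String) {
  def bootstrap(pod: Pod): Pod =
    pod.copy(volumes = configMapName :: pod.volumes)
}

// Submission-client step: strictly more work. It reuses the same bootstrap
// to mount the volume into the driver pod, and additionally creates the
// config map itself as a new API object.
class InitContainerSubmissionStep(bootstrap: InitContainerBootstrap, configMapName: String) {
  def run(driverPod: Pod): (Pod, List[ConfigMap]) =
    (bootstrap.bootstrap(driverPod), List(ConfigMap(configMapName)))
}
```

This is the sense in which the submission client "wraps" the shared submodule: the mounting logic lives in one place, and only the submission side adds object creation on top of it.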
Then the orchestration and step selection for the pod-configuration steps could be shared between the submission client and the driver, while the orchestrator that strictly creates Kubernetes API objects and builds the driver Spark conf would be used only by the submission client. There are some directions we can explore here, and we are open to ideas.

> So aside from not really being able to infer the structure of how these things work, the current abstraction seems to be creating a lot of constructors and methods with long lists of arguments, which is another thing that hurts the readability of the code.

I think that regardless of the abstractions we choose, we're eventually going to end up with many classes with unique constructors, simply due to the complexity of the submission client itself. Creating a Spark application in Kubernetes has many steps, and we want to put each step into a submodule - regardless of whether the submodules share a trait like they do now, or are independent classes with independent APIs. Each submodule will have an argument list befitting the properties it requires to configure the driver. See https://github.com/apache-spark-on-k8s/spark/tree/branch-2.2-kubernetes/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/submitsteps for all the different submodules the submission client will eventually need.

I do think that we should remove the `InitContainer` level of steps and orchestration, as having two layers of steps and orchestration is pretty confusing. There's only one code branch in the init-container steps orchestrator, which we apply only when we are also submitting local files - and that isn't even done in this PR - so it shouldn't be too big a burden to inline that logic into the top-level step when the time comes.

Again, the feedback is very much appreciated. What are some ways we can improve and move forward here?
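The two-orchestrator split floated above might look roughly like this - again a hypothetical sketch with illustrative names, not code from this PR:

```scala
// Hypothetical, simplified models for the two concerns being separated.
case class PodSpec(containers: List[String])
case class AppResources(kubernetesResources: List[String],
                        sparkConf: Map[String, String])

// Pod-level configuration steps: this layer could be shared between the
// submission client (driver pod) and the driver (executor pods).
trait PodConfigurationStep { def configure(pod: PodSpec): PodSpec }

class PodOrchestrator(steps: List[PodConfigurationStep]) {
  def configure(pod: PodSpec): PodSpec =
    steps.foldLeft(pod)((p, s) => s.configure(p))
}

// Application-level steps: extra Kubernetes resources (e.g. config maps)
// and the driver SparkConf. Only the submission client would use this one,
// since these constructs do not apply when creating executor pods.
trait AppResourceStep { def configure(res: AppResources): AppResources }

class AppResourceOrchestrator(steps: List[AppResourceStep]) {
  def configure(res: AppResources): AppResources =
    steps.foldLeft(res)((r, s) => s.configure(r))
}
```

Whether this split pays for itself depends on how much pod-configuration logic the driver and submission client actually end up sharing.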