Github user vanzin commented on the issue: https://github.com/apache/spark/pull/19954 > I think regardless of the abstractions we choose, we're eventually going to end up with many classes with unique constructors simply due to the complexity of the submission client itself. I disagree. For example, just to pick one example I spotted while glancing at the code quickly. There are many steps that take a `kubernetesResourceNamePrefix` as a constructor argument. Then the code that instantiates those steps needs to know each of those steps, and that they take that prefix as an argument to their constructor. I don't see that as abstracting much. Instead, for example, if you encapsulate the state of a "kubernetes submission" into some helper type that wraps SparkConf and other common things needed to submit an app, those steps could potentially just take that helper type as an argument to the abstract method that does the processing (e.g. `configureDriver`). There's less step-specific logic needed everywhere, and the list of arguments to constructors and methods shrinks a lot. So, for example, bad abstraction: ``` val step1 = new Step1(/* lots of arguments */) val step2 = new Step2(/* lots of arguments, many repeated with step 1 */) val step3 = new Step3(/* lots of arguments, many repeated with step 1 */) val steps = Seq(step1, step2, step3) val something = steps.something { s => s.doStep(someObject) } val client = new ClientBuilder(/* yet another long list of things with repeated arguments */) .createClient(/* even more arguments */) ``` Better abstraction: ``` val steps = Seq(new Step1(), new Step2(), new Step3()) val finalState = steps.something { s => s.doStep(someObjectThatTracksAllSubmissionState) } val client = new Client(finalState) ``` You may not be able to clean up everything in the argument list, but you can get into a much better place. The current code is just full of repetition that could easily be avoided. > As with any abstraction, we can inline the orchestrator into the submission client, and then consider if that makes the Client do too many things, and also, what would the unit tests look like as a result? I like that we can test the orchestrator and its selection of which steps to apply independently That's the thing. That's not an abstraction, that's just code that lives in a separate method. You can test the "method that creates a list of steps" separately too without having to stash it in a completely separate type and create more coupling in your code. Abstraction is meant to reduce coupling, not increase it. An abstraction for an orchestrator would mean that there is a code path that can take a different orchestrator to achieve different things. For example, if you had code that started containers, and given a different orchestrator, could start either a driver or an executor, or even be used by tests to mock some behavior you want to verify. But that's not what you have. You just have code living in separate classes with the goal of making testing easier, but you can achieve the same thing without the code living in separate classes, and things would probably be simpler overall. The same argument can actually be made about steps. You could potentially have one big class that has each step as a separate method, and one method that calls all of the needed methods in the right order (that would be the "orchestrator" method). You could test each "step" individually because they wouldn't be calling each other. (Now I'm not against the "step" abstraction per se, I just think there's a lot of room for improvement in the current code.)
--- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org