Github user mccheah commented on the issue: https://github.com/apache/spark/pull/19954 > I disagree. For example, just to pick one example I spotted while glancing at the code quickly. There are many steps that take a kubernetesResourceNamePrefix as a constructor argument. Then the code that instantiates those steps needs to know each of those steps, and that they take that prefix as an argument to their constructor. I don't see that as abstracting much. > Instead, for example, if you encapsulate the state of a "kubernetes submission" into some helper type that wraps SparkConf and other common things needed to submit an app, those steps could potentially just take that helper type as an argument to the abstract method that does the processing (e.g. configureDriver). There's less step-specific logic needed everywhere, and the list of arguments to constructors and methods shrinks a lot. @vanzin An excellent idea - this is very much in line with the e.g. `SQLConf` that's in Spark-SQL. @liyinan926 it would be great to introduce this and swap out our parameters in various places. Our original methodology was to pass parameters down that would normally need to be repeatedly retrieved by the same config key from the `SparkConf` over and over again, but we can make an abstraction for that. This basically moves us from parameter primitives to parameter objects, an elegant solution. > That's the thing. That's not an abstraction, that's just code that lives in a separate method. You can test the "method that creates a list of steps" separately too without having to stash it in a completely separate type and create more coupling in your code. Abstraction is meant to reduce coupling, not increase it ... > ... You just have code living in separate classes with the goal of making testing easier, but you can achieve the same thing without the code living in separate classes, and things would probably be simpler overall. A design we aimed for here was to make every class in the code do exactly one thing, and the test for that class thus can focus solely on testing that single thing. Having the `@VisibleForTesting` annotation with public methods or something similar to test multiple methods in the same class makes that single unit test class difficult to read. Or we would have to split up the unit test for the given class to multiple unit test classes, but then it becomes difficult to track down all of the classes that come together to test that class. So although I agree that we're not creating abstractions per se in the sense that we're not creating generic APIs that have multiple implementations, we are creating helper objects and submodules to decompose the submission process such that we have smaller chunks to test at a time. Therefore a key hallmark for why the code is decomposed the way it is, is to make the tests as readable as possible, that each test class is honed in on covering all code paths in a single unit. Again - open to ideas of how to make that more readable in the main code. > An abstraction for an orchestrator would mean that there is a code path that can take a different orchestrator to achieve different things. For example, if you had code that started containers, and given a different orchestrator, could start either a driver or an executor, or even be used by tests to mock some behavior you want to verify. I'm interested in exploring this idea - will have to think a bit more on what that would specifically look like. > I think the biggest thing that makes reviewing this PR challenging is the way the code is structured, not the amount of changes. This PR is actually both shorter and conceptually simpler than the first two big ones, IMO. Unless we do a thorough refactoring of the code, I personally don't think splitting into smaller ones will help much on reducing the confusion. Splitting also risks making half baked feature into 2.3, which I personally don't think is a good idea. @liyinan926 the code structure is the main issue, yes, but we can tackle the code structure more effectively by having a better decomposition of the review process as well. We have these three distinct components which are relatively independent. We can therefore separate out the three pieces and consider the architecture for each of them individually. Whereas here we may want to tackle how to change the submission and the executor pod configuration, but what if in our focus on those we miss crucial issues in the design of the init container? Breaking up the PR will allow us to make each PR focus on the architectural considerations of each part more thoroughly, and it makes it easier to follow the discussion surrounding each part.
--- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org