Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/19954
  
    > I think regardless of the abstractions we choose, we're eventually going 
to end up with many classes with unique constructors simply due to the 
complexity of the submission client itself.
    
    I disagree. To pick one example I spotted while glancing quickly at the 
code: there are many steps that take a `kubernetesResourceNamePrefix` as a 
constructor argument. The code that instantiates those steps then needs to know 
each of those steps, and that each takes the prefix as a constructor argument. 
I don't see that as abstracting much.
    
    Instead, for example, if you encapsulate the state of a "kubernetes 
submission" into some helper type that wraps SparkConf and other common things 
needed to submit an app, those steps could potentially just take that helper 
type as an argument to the abstract method that does the processing (e.g. 
`configureDriver`). There's less step-specific logic needed everywhere, and the 
list of arguments to constructors and methods shrinks a lot.
    
    So, for example, bad abstraction:
    
    ```
    val step1 = new Step1(/* lots of arguments */)
    val step2 = new Step2(/* lots of arguments, many repeated with step 1 */)
    val step3 = new Step3(/* lots of arguments, many repeated with step 1 */)
    val steps = Seq(step1, step2, step3)
    
    val something = steps.something { s => s.doStep(someObject) }
    val client = new ClientBuilder(/* yet another long list of things with repeated arguments */)
      .createClient(/* even more arguments */)
    ```
    
    Better abstraction:
    
    ```
    val steps = Seq(new Step1(), new Step2(), new Step3())
    
    val finalState = steps.something { s => s.doStep(someObjectThatTracksAllSubmissionState) }
    val client = new Client(finalState)
    ```
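    A minimal sketch of that shape, with purely illustrative names (none of 
these types are from the PR): a single state object carries everything the 
steps share, each step transforms that state, and the driver code just folds 
over the steps.
    
    ```scala
    // Hypothetical sketch: shared submission state lives in one object,
    // so steps need no repeated constructor arguments.
    case class SubmissionState(
        resourceNamePrefix: String,
        labels: Map[String, String] = Map.empty)
    
    trait SubmissionStep {
      // Each step takes the accumulated state and returns an updated copy.
      def doStep(state: SubmissionState): SubmissionState
    }
    
    class AddLabelStep extends SubmissionStep {
      override def doStep(state: SubmissionState): SubmissionState =
        state.copy(labels = state.labels + ("prefix" -> state.resourceNamePrefix))
    }
    
    val steps = Seq(new AddLabelStep)
    val finalState = steps.foldLeft(SubmissionState("myapp")) { (state, step) =>
      step.doStep(state)
    }
    ```
    
    With this shape, adding a new step never widens any constructor; it only 
adds one element to the sequence.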
    
    You may not be able to clean up everything in the argument list, but you 
can get into a much better place. The current code is just full of repetition 
that could easily be avoided.
    
    > As with any abstraction, we can inline the orchestrator into the 
submission client, and then consider if that makes the Client do too many 
things, and also, what would the unit tests look like as a result? I like that 
we can test the orchestrator and its selection of which steps to apply 
independently
    
    That's the thing. That's not an abstraction, that's just code that lives in 
a separate method. You can test the "method that creates a list of steps" 
separately too without having to stash it in a completely separate type and 
create more coupling in your code. Abstraction is meant to reduce coupling, not 
increase it.
    
    An abstraction for an orchestrator would mean that there is a code path 
that can take a different orchestrator to achieve different things. For 
example, if you had code that started containers, then given a different 
orchestrator it could start either a driver or an executor, or even be used by 
tests to mock some behavior you want to verify. But that's not what you have. 
You just have code living in separate classes with the goal of making testing 
easier; you can achieve the same thing without splitting the code into separate 
classes, and things would probably be simpler overall.
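    Roughly what a real orchestrator abstraction would look like, again with 
hypothetical names: the launcher code is parameterized by the orchestrator, so 
swapping the orchestrator changes what gets started.
    
    ```scala
    // Hypothetical sketch: the same launch code path starts a driver,
    // an executor, or a test double, depending on the orchestrator given.
    trait Orchestrator {
      def containerCommand: Seq[String]
    }
    
    class DriverOrchestrator extends Orchestrator {
      override def containerCommand: Seq[String] = Seq("driver", "--main-class", "Foo")
    }
    
    class ExecutorOrchestrator extends Orchestrator {
      override def containerCommand: Seq[String] = Seq("executor")
    }
    
    class ContainerLauncher(orchestrator: Orchestrator) {
      // The launcher never knows which kind of container it is starting.
      def launch(): String = orchestrator.containerCommand.mkString(" ")
    }
    ```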
    
    The same argument can actually be made about steps. You could potentially 
have one big class that has each step as a separate method, and one method that 
calls all of the needed methods in the right order (that would be the 
"orchestrator" method). You could test each "step" individually because they 
wouldn't be calling each other. (Now I'm not against the "step" abstraction per 
se, I just think there's a lot of room for improvement in the current code.)
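    For illustration, the "steps as methods" version could be as small as this 
(hypothetical names again): each step is a method that doesn't call any other 
step, so each is testable on its own, and the "orchestrator" is just the one 
method that sequences them.
    
    ```scala
    // Hypothetical sketch: steps as independent methods on one class,
    // with a single "orchestrator" method that calls them in order.
    class DriverConfigurator(resourceNamePrefix: String) {
      // Individually testable steps; none of them call each other.
      def baseStep(labels: Map[String, String]): Map[String, String] =
        labels + ("app" -> resourceNamePrefix)
    
      def credentialsStep(labels: Map[String, String]): Map[String, String] =
        labels + ("hasCredentials" -> "true")
    
      // The "orchestrator" method: selects and sequences the steps.
      def configureDriver(): Map[String, String] =
        credentialsStep(baseStep(Map.empty))
    }
    ```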

