Re: Kubernetes: why use init containers?

Matt Cheah Wed, 10 Jan 2018 13:34:08 -0800

If we use spark-submit in client mode from the driver container, how do we 
handle needing to switch between a cluster-mode scheduler backend and a 
client-mode scheduler backend in the future?
Something else re: client mode accessibility – if we make client mode 
accessible to users even if it’s behind a flag, that’s a very different 
contract from needing to recompile spark-submit to support client mode. The 
amount of effort required from the user to get to client mode is very different 
between the two cases, and the contract is much clearer when client mode is 
forbidden in all circumstances, versus client mode being allowed with a 
specific flag. If we’re saying that we don’t support client mode, we should 
bias towards making client mode as difficult as possible to access, i.e. 
impossible with a standard Spark distribution.

-Matt Cheah

On 1/10/18, 1:24 PM, "Marcelo Vanzin" <van...@cloudera.com> wrote:

    On Wed, Jan 10, 2018 at 1:10 PM, Matt Cheah <mch...@palantir.com> wrote:
    > I’d imagine this is a reason why YARN hasn’t gone with using spark-submit
    > from the application master...
    
    I wouldn't use YARN as a template to follow when writing a new
    backend. A lot of the reason why the YARN backend works the way it
    does is because of backwards compatibility. IMO it would be much
    better to change the YARN backend to use spark-submit, because it
    would immensely simplify the code there. It was a nightmare to get
    YARN to reach feature parity with other backends because it has to
    pretty much reimplement everything.
    
    But doing that would break pretty much every Spark-on-YARN deployment,
    so it's not something we can do right now.
    
    For the other backends the situation is sort of similar; it probably
    wouldn't be hard to change standalone's DriverWrapper to also use
    spark-submit. But that brings potential side effects for existing
    users that don't exist with spark-on-k8s, because spark-on-k8s is new
    (the current fork aside).
    
    > But using init-containers makes it such that we don’t need to use
    > spark-submit at all
    
    Those are actually separate concerns. There are a whole bunch of
    things that spark-submit provides you that you'd have to replicate in
    the k8s backend if not using it. Things like properly handling special
    characters in arguments, native library paths, "userClassPathFirst",
    etc. You get them almost for free with spark-submit, and using an init
    container does not solve any of those for you.
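One item on that list, properly handling special characters in arguments, is easy to get wrong when a launcher assembles a command string by hand. The following is a minimal Python sketch that assumes a hypothetical launcher passing the driver command through a POSIX shell. It is not code from Spark or spark-submit, and the argument values and helper names are made up for illustration. The sketch contrasts naive space-joining with per-argument quoting:

```python
import shlex

# Hypothetical driver command line; the values are made up for illustration.
EXAMPLE_ARGV = ["java", "-cp", "app.jar", "Main",
                "--name", "my app", "--filter", "a'b", "--out", "$HOME/result"]


def naive_command(argv):
    """Join arguments with plain spaces and no quoting (the error-prone way)."""
    return " ".join(argv)


def quoted_command(argv):
    """Quote every argument so a POSIX shell re-tokenizes it as exactly one word."""
    return " ".join(shlex.quote(arg) for arg in argv)


if __name__ == "__main__":
    # shlex.split tokenizes a string the way a POSIX shell would (without expansion).
    try:
        print(shlex.split(naive_command(EXAMPLE_ARGV)))
    except ValueError as err:
        # The stray single quote in "a'b" leaves the naive string unparseable.
        print("naive command is broken:", err)
    # The quoted string round-trips to the original argument vector.
    print(shlex.split(quoted_command(EXAMPLE_ARGV)) == EXAMPLE_ARGV)
```

Even without the stray quote, the naive string would split "my app" into two words, and a real shell would expand `$HOME`. Python 3.8+ also provides `shlex.join`, which applies the same per-argument quoting in a single call.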
    
    I'd say that using spark-submit is really not up for discussion here;
    it saves you from re-implementing a whole bunch of code that you
    shouldn't even be trying to re-implement.
    
    Separately, if there is a legitimate need for an init container, then
    it can be added. But I don't see that legitimate need right now, so I
    don't see what it's bringing other than complexity.
    
    (And no, "the k8s documentation mentions that init containers are
    sometimes used to download dependencies" is not a legitimate need.)
    
    -- 
    Marcelo
    