If we use spark-submit in client mode from the driver container, how would we handle a future need to switch between a cluster-mode scheduler backend and a client-mode scheduler backend?
Something else re: client mode accessibility – if we make client mode accessible to users, even behind a flag, that’s a very different contract from requiring them to recompile spark-submit to get client mode. The effort a user needs to reach client mode is very different between the two cases, and the contract is much clearer when client mode is forbidden in all circumstances than when it is allowed with a specific flag. If we’re saying that we don’t support client mode, we should bias towards making it as difficult as possible to access, i.e. impossible with a standard Spark distribution.

-Matt Cheah

On 1/10/18, 1:24 PM, "Marcelo Vanzin" <van...@cloudera.com> wrote:

On Wed, Jan 10, 2018 at 1:10 PM, Matt Cheah <mch...@palantir.com> wrote:
> I’d imagine this is a reason why YARN hasn’t gone with using spark-submit from the application master...

I wouldn't use YARN as a template to follow when writing a new backend. A lot of the reason the YARN backend works the way it does is backwards compatibility. IMO it would be much better to change the YARN backend to use spark-submit, because it would immensely simplify the code there. It was a nightmare to get YARN to reach feature parity with other backends because it has to pretty much reimplement everything.

But doing that would break pretty much every Spark-on-YARN deployment, so it's not something we can do right now.

For the other backends the situation is sort of similar; it probably wouldn't be hard to change standalone's DriverWrapper to also use spark-submit. But that brings potential side effects for existing users that don't exist with spark-on-k8s, because spark-on-k8s is new (the current fork aside).

> But using init-containers makes it such that we don’t need to use spark-submit at all

Those are actually separate concerns. There are a whole bunch of things that spark-submit provides you that you'd have to replicate in the k8s backend if not using it.
Things like properly handling special characters in arguments, native library paths, "userClassPathFirst", etc. You get them almost for free with spark-submit, and using an init container does not solve any of those for you.

I'd say that using spark-submit is really not up for discussion here; it saves you from re-implementing a whole bunch of code that you shouldn't even be trying to re-implement.

Separately, if there is a legitimate need for an init container, then it can be added. But I don't see that legitimate need right now, so I don't see what it's bringing other than complexity. (And no, "the k8s documentation mentions that init containers are sometimes used to download dependencies" is not a legitimate need.)

--
Marcelo
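To make the "you get them almost for free" point concrete, here is a rough sketch of what a driver-container entrypoint might do if it simply delegated to spark-submit in client mode. This is not how the Spark k8s backend is implemented; the function name, its parameters, the in-cluster master URL and the example jar/class are hypothetical. Only the spark-submit options used (`--master`, `--deploy-mode`, `--class`, `--driver-library-path`, `--conf`) and the `spark.driver.userClassPathFirst` setting are real. The entrypoint's only job is to avoid mangling arguments, which it does by building an argv list instead of a shell string; native library paths and class path ordering are left to spark-submit.

```python
import shlex
from typing import Dict, List, Optional, Sequence


def build_spark_submit_argv(
    master: str,
    main_class: str,
    app_jar: str,
    app_args: Sequence[str],
    conf: Optional[Dict[str, str]] = None,
    driver_library_path: Optional[str] = None,
    spark_submit: str = "spark-submit",
) -> List[str]:
    """Return the argv a (hypothetical) driver-container entrypoint would exec.

    Every argument is its own list element, so when the list is handed to
    os.execvp or subprocess.run without a shell, quotes, spaces and "$" in
    application arguments reach spark-submit unchanged.
    """
    argv = [spark_submit, "--master", master, "--deploy-mode", "client",
            "--class", main_class]
    if driver_library_path:
        # spark-submit, not the entrypoint, decides how this reaches the JVM.
        argv += ["--driver-library-path", driver_library_path]
    # Sorted only to make the resulting command line deterministic.
    for key, value in sorted((conf or {}).items()):
        argv += ["--conf", f"{key}={value}"]
    argv.append(app_jar)
    argv.extend(app_args)
    return argv


if __name__ == "__main__":
    argv = build_spark_submit_argv(
        master="k8s://https://kubernetes.default.svc",  # assumed in-cluster API URL
        main_class="com.example.Main",                  # hypothetical
        app_jar="local:///opt/app/app.jar",             # hypothetical
        app_args=["--name", 'it\'s a "test" $HOME'],
        conf={"spark.driver.userClassPathFirst": "true"},
        driver_library_path="/opt/native",              # hypothetical
    )
    # shlex.join is only used to log a copy-pasteable command line.
    print(shlex.join(argv))
```

The design choice being illustrated is that the entrypoint stays trivial: everything spark-submit already knows how to do is passed through as options rather than re-implemented in the backend.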