One thing I forgot in my previous e-mail: if a resource is remote, I'm pretty sure (though I haven't double-checked the code) that executors will download it directly from the remote server, not from the driver. So there you have a distributed download without an init container.
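To make the point concrete, the decision of whether an executor can fetch a resource itself really only hinges on the URI scheme. This is an illustrative sketch, not actual Spark code; the class and method names are made up:

```java
import java.net.URI;

// Illustrative helper (made-up name): decides whether each executor can
// pull a resource on its own rather than going through the driver.
final class ResourceLocality {
    static boolean executorCanFetchDirectly(String resource) {
        String scheme = URI.create(resource).getScheme();
        // A bare path or a "file"/"local" URI refers to the submitting
        // machine or the image, so it can't be fetched remotely; any
        // remote scheme (http, https, hdfs, ...) can be downloaded by
        // each executor directly.
        return scheme != null
            && !scheme.equals("file")
            && !scheme.equals("local");
    }
}
```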
On Tue, Jan 9, 2018 at 7:15 PM, Yinan Li <liyinan...@gmail.com> wrote:
> The init-container is required for use with the resource staging server
> (https://github.com/apache-spark-on-k8s/userdocs/blob/master/src/jekyll/running-on-kubernetes.md#resource-staging-server).

If the staging server *requires* an init container, you already have a design problem right there.

> Additionally, the init-container is a Kubernetes native way of making
> sure that the dependencies are localized

Sorry, but the init container does not do anything by itself. You had to add a whole bunch of code to execute the existing Spark code in an init container, when not doing so would have achieved the exact same goal much more easily, in a way that is consistent with how Spark already does things.

Matt:
> the executors wouldn’t receive the jars on their class loader until
> after the executor starts

I actually consider that a benefit. It means a spark-on-k8s application will behave more like all the other backends, where that is also true (application jars live in a separate class loader).

> traditionally meant to prepare the environment for the application that
> is to be run

You guys are forcing this argument when it all depends on where you draw the line. Spark can be launched without downloading any of those dependencies, because Spark will download them for you. Forcing the "Kubernetes way" just means you're writing a lot more code, and breaking the Spark app initialization into multiple container invocations, to achieve the same thing.

> would make the SparkSubmit code inadvertently allow running client mode
> Kubernetes applications as well

Not necessarily. I have that in my patch; it doesn't allow client mode unless a property that only the cluster-mode submission code sets is present. If some users want to hack their way around that, more power to them; users can also compile their own Spark without the checks if they want to try out client mode in some way.
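The gating I'm describing is simple; something like the following sketch, where the property name and the class are made up for illustration (they are not the actual patch):

```java
import java.util.Map;

// Sketch only: gate client mode behind an internal property that only
// the cluster-mode submission path would set. The property name below
// is a hypothetical placeholder.
final class ClientModeGate {
    static final String IN_CLUSTER_PROP = "spark.kubernetes.submitInDriver";

    // Client mode is only allowed when the cluster-mode submission code
    // has already set the internal property.
    static boolean clientModeAllowed(Map<String, String> conf) {
        return "true".equals(conf.get(IN_CLUSTER_PROP));
    }

    static void validate(String master, String deployMode,
                         Map<String, String> conf) {
        if (master.startsWith("k8s://")
                && "client".equals(deployMode)
                && !clientModeAllowed(conf)) {
            throw new IllegalArgumentException(
                "Client mode is currently not supported for Kubernetes.");
        }
    }
}
```

A user who really wants client mode can still set the property by hand, which is the "more power to them" case above.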
Anirudh:
> Telling users that they must rebuild images ... every time seems less
> than convincing to me.

Sure, I'm not proposing people use the docker image approach all the time. It would be a hassle while developing an app, just as it is kind of a hassle today that the code doesn't upload local files to the k8s cluster. But it's perfectly reasonable for people to optimize a production app by bundling it into a pre-built docker image to avoid re-downloading resources every time, much like they'd probably place the jar and its dependencies on HDFS today with YARN to get the benefits of the YARN cache.

--
Marcelo

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org