Github user vanzin commented on the issue: https://github.com/apache/spark/pull/16695

Actually this can be considered kind of a continuation of SPARK-2669. The issue this is trying to solve is that the NMs don't necessarily have a client Hadoop configuration in place. (Or maybe I'm mistaken and that's a requirement?) Or, in case I'm wrong and they do, they might be different from the gateway's.

So the Spark app is started and it's using the configuration as defined by the user, and now on the actual NM a bunch of files that the user configuration expects to exist at certain paths do not. What do we do? Hadoop's configuration is not very flexible here... if it had a concept of "these paths are inside the config directory" or something like that, this wouldn't be needed. So this is the solution I thought about.

Or maybe a better approach is to try to be smarter and, instead of blindly loading the user's config, try to load the NM's config first, and then overlay the user's config on top of it. Then, for example, "final" configs in the NM's files wouldn't be overwritten by the user. How does that one sound?

I'm not a super big fan of the latter approach because then Spark exposes different functionality in client and cluster mode. In client mode the user has full control of the Hadoop configs, and in cluster mode they wouldn't anymore. But maybe it's an ok compromise.
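To make the overlay idea concrete, here is a minimal sketch of the proposed merge semantics in plain Python (standing in for Hadoop's `Configuration`; the `nm_final_keys` set models properties marked `<final>true</final>` in the NM's `*-site.xml` files, and all names and values are illustrative, not real cluster settings):

```python
def overlay_configs(nm_config, nm_final_keys, user_config):
    """Merge the user's Hadoop config on top of the NM's config.

    Keys the NM marked as 'final' keep the NM's value; everything
    else is overridden by the user's value. This mirrors Hadoop's
    own resource-loading rule that a later resource cannot override
    a final property set by an earlier one.
    """
    merged = dict(nm_config)
    for key, value in user_config.items():
        if key in nm_final_keys:
            continue  # NM's final value wins
        merged[key] = value
    return merged


# Illustrative example: one final key, one overridable key.
nm = {"fs.defaultFS": "hdfs://nm-nn:8020", "io.file.buffer.size": "65536"}
finals = {"fs.defaultFS"}
user = {"fs.defaultFS": "hdfs://other-nn:8020", "io.file.buffer.size": "131072"}

merged = overlay_configs(nm, finals, user)
print(merged["fs.defaultFS"])         # NM's final value is preserved
print(merged["io.file.buffer.size"])  # user's override is applied
```

This also shows why the approach differs between deploy modes: the merge only makes sense where an NM-side config exists to serve as the base layer, which is exactly the asymmetry the comment flags.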