Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/16695
  
    Actually this can be considered kind of a continuation of SPARK-2669.
    
    The issue this is trying to solve is that the NMs don't necessarily have a 
client Hadoop configuration in place. (Or maybe I'm mistaken and that's a 
requirement?) And even if they do have one, it might be different from the 
gateway's.
    
    So the Spark app is started using the configuration as defined by the user, 
and on the actual NM a bunch of files that the user's configuration expects to 
exist at certain paths do not. What do we do?
    
    Hadoop's configuration is not very flexible here... if it had a concept of 
"these paths are inside the config directory" or something like that, this 
wouldn't be needed. So this is the solution I thought about.
    
    Or maybe a better approach is to be smarter and, instead of blindly loading 
the user's config, load the NM's config first, and then overlay the user's 
config on top of it. Then, for example, "final" configs in the NM's files 
wouldn't be overwritten by the user. How does that one sound?
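    To make the overlay idea concrete, here's a minimal sketch of the merge 
semantics it implies. This is a simulation with plain dicts, not the real 
`org.apache.hadoop.conf.Configuration` API (which would do this via 
`addResource()` ordering); the helper name and sample keys are hypothetical. 
It mirrors Hadoop's rule that a property declared `<final>true</final>` in an 
earlier resource cannot be overridden by later resources:

    ```python
    def overlay_configs(nm_conf, nm_final_keys, user_conf):
        """Overlay the user's config on top of the NM's config.

        Keys marked 'final' in the NM's config win over the user's values,
        mirroring Hadoop's <final>true</final> semantics. All names here
        are hypothetical illustrations.
        """
        merged = dict(nm_conf)
        for key, value in user_conf.items():
            if key in nm_final_keys:
                continue  # NM declared this final; ignore the user's override
            merged[key] = value
        return merged

    # NM-side config, with fs.defaultFS marked final by the cluster admin.
    nm_conf = {"fs.defaultFS": "hdfs://nm-cluster", "io.file.buffer.size": "4096"}
    # User (gateway-side) config shipped with the app.
    user_conf = {"fs.defaultFS": "hdfs://user-cluster", "mapreduce.job.queuename": "etl"}

    merged = overlay_configs(nm_conf, {"fs.defaultFS"}, user_conf)
    # The final fs.defaultFS keeps the NM's value; the user's extra key is added.
    ```

    The real implementation would just control the order in which the NM's and 
the user's XML files are added as resources, and let Hadoop's own final-property 
handling do the rest.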
    
    I'm not a super big fan of the latter approach because then Spark exposes 
different functionality in client and cluster mode. In client mode the user has 
full control of the Hadoop configs, and in cluster mode they no longer would. 
But maybe it's an OK compromise.


