GitHub user zjffdu opened a pull request:

    https://github.com/apache/zeppelin/pull/1446

    ZEPPELIN-1263. Should specify zeppelin's spark configuration through --conf 
arguments of spark-submit

    ### What is this PR for?
    
    For now we set the Spark configuration at runtime rather than passing it 
through `--conf`, which causes several issues.
    * Some configuration has to be set through `--conf`; otherwise we need to 
duplicate code from SparkSubmit.scala (spark.yarn.keytab, spark.yarn.principal).
    * Some configuration can conflict with spark-defaults.conf. If you 
specify spark.master as yarn-client in spark-defaults.conf but specify 
spark.master as local on the Zeppelin side, the Spark interpreter fails 
to start because of this inconsistency. 
    * As described in ZEPPELIN-1460, it is hard to figure out what the 
effective configuration is. 
    * We cannot use yarn-cluster mode. It is not supported now, but I 
think it is necessary, as Zeppelin needs to support multiple users.
    
    So this PR passes all the Spark-related configuration to spark-submit 
through `--conf`, so that it is easy to see the effective configuration and to 
guarantee that the Zeppelin interpreter setting takes precedence over 
spark-defaults.conf. It is also good for maintenance, since upstream changes 
(any change to configuration handling in Spark) would not affect us. 
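
As a rough illustration of the idea (not the code in this PR), the sketch below turns interpreter properties into `--conf` arguments and models how they would override spark-defaults.conf. The function names, the `spark.` prefix filter, and all property values are assumptions made up for the example. The precedence rule it models (spark-submit flags win over spark-defaults.conf) is the one described in the Spark configuration documentation.

```python
import shlex


def build_conf_args(properties):
    """Turn interpreter properties into spark-submit --conf arguments.

    Only keys starting with "spark." are forwarded. That filter is an
    assumption for this sketch, not necessarily what the PR does.
    """
    args = []
    for key in sorted(properties):
        if key.startswith("spark."):
            args += ["--conf", f"{key}={properties[key]}"]
    return args


def effective_conf(defaults, conf_args):
    """Model the precedence: --conf values override spark-defaults.conf."""
    merged = dict(defaults)
    # conf_args alternates "--conf", "key=value", so take every second item.
    for pair in conf_args[1::2]:
        key, _, value = pair.partition("=")
        merged[key] = value
    return merged


# Hypothetical interpreter setting and spark-defaults.conf contents.
interpreter_settings = {
    "spark.master": "local[*]",
    "spark.app.name": "Zeppelin",
    "zeppelin.spark.maxResult": "1000",  # not a Spark property, so skipped
}
spark_defaults = {
    "spark.master": "yarn-client",
    "spark.executor.memory": "2g",
}

conf_args = build_conf_args(interpreter_settings)

# Print the command line that would be run, shell-quoted for readability.
print(shlex.join(["spark-submit"] + conf_args))
print(effective_conf(spark_defaults, conf_args))
```

With everything on the command line, the effective value of `spark.master` can be read directly from the spark-submit invocation instead of being reconstructed from two places.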
    
    
    ### What type of PR is it?
    [Improvement]
    
    ### Todos
    * [ ] - Task
    
    ### What is the Jira issue?
    * https://issues.apache.org/jira/browse/ZEPPELIN-1263
    
    ### How should this be tested?
    Tested Spark 1.6 and Spark 2.0 in both yarn-client mode and embedded mode. 
    
    ### Screenshots (if appropriate)
    
![image](https://cloud.githubusercontent.com/assets/164491/18702212/3e7b54d0-8013-11e6-95f7-502b3cf89d67.png)
    
    ### Questions:
    * Do the license files need an update? No
    * Are there breaking changes for older versions? No
    * Does this need documentation? No
    
    …

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zjffdu/zeppelin ZEPPELIN-1263

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/zeppelin/pull/1446.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1446
    
----
commit e36dfc1eb7cf06df4acb717a9701bf36f7a0afd5
Author: Jeff Zhang <zjf...@apache.org>
Date:   2016-08-03T04:50:04Z

    ZEPPELIN-1263. Should specify zeppelin's spark configuration through --conf 
arguments of spark-submit

----


