[ https://issues.apache.org/jira/browse/SPARK-19307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146280#comment-16146280 ]
Charlie Tsai edited comment on SPARK-19307 at 8/29/17 10:43 PM:
----------------------------------------------------------------

Hi, I am using 2.2.0 but find that command-line {{--conf}} arguments are still not available when the {{SparkConf()}} object is instantiated. As a result, I can't check in my driver what has already been set via the command-line {{--conf}} arguments and then set additional configuration with {{setIfMissing}}. Instead, {{setIfMissing}} effectively overwrites whatever is passed in through the CLI.

For example, if my job is:
{code}
# debug.py
import pyspark

if __name__ == '__main__':
    # `_jconf` is `None` here but should include the `--conf` arguments
    print(pyspark.SparkConf()._jconf)

    default_conf = {
        "spark.dynamicAllocation.maxExecutors": "36",
        "spark.yarn.executor.memoryOverhead": "1500",
    }

    # these are supposed to be set only if not provided by the CLI args
    spark_conf = pyspark.SparkConf()
    for (k, v) in default_conf.items():
        spark_conf.setIfMissing(k, v)
{code}

Running
{code}
spark-submit \
    --master yarn \
    --deploy-mode client \
    --conf spark.yarn.executor.memoryOverhead=2500 \
    --conf spark.dynamicAllocation.maxExecutors=128 \
    debug.py
{code}

In 1.6.2 the CLI args take precedence, whereas in 2.2.0 {{SparkConf().getAll()}} appears empty even though {{--conf}} args were already passed in.
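For reference, the precedence the comment above expects can be sketched with plain dicts. This is a mock of the intended behavior, not pyspark's actual implementation: CLI {{--conf}} values are visible when the conf object is created, and set-if-missing defaults only fill keys the CLI did not set.

```python
# Dict-based sketch (NOT Spark's real implementation) of the expected precedence.
cli_conf = {
    # what `spark-submit --conf` passed in
    "spark.yarn.executor.memoryOverhead": "2500",
    "spark.dynamicAllocation.maxExecutors": "128",
}
default_conf = {
    # driver-side defaults from the job script
    "spark.dynamicAllocation.maxExecutors": "36",
    "spark.yarn.executor.memoryOverhead": "1500",
}

conf = dict(cli_conf)  # in 1.6.2, SparkConf() already contains the CLI args
for k, v in default_conf.items():
    conf.setdefault(k, v)  # setIfMissing-style: existing CLI values win
```

Under this behavior, {{maxExecutors}} stays at the CLI value "128" rather than being overwritten by the driver default "36".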
> SPARK-17387 caused ignorance of conf object passed to SparkContext:
> -------------------------------------------------------------------
>
>                 Key: SPARK-19307
>                 URL: https://issues.apache.org/jira/browse/SPARK-19307
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.1.0
>            Reporter: yuriy_hupalo
>            Assignee: Marcelo Vanzin
>             Fix For: 2.1.1, 2.2.0
>
>         Attachments: SPARK-19307.patch
>
>
> After patch SPARK-17387 was applied, the SparkConf object is ignored when
> launching a SparkContext programmatically via Python from spark-submit:
> https://github.com/apache/spark/blob/master/python/pyspark/context.py#L128
> When running a Python SparkContext(conf=xxx) from spark-submit, conf is set
> but conf._jconf is None, so the conf object passed as an argument is ignored
> (and used only when launching the java_gateway).
> How to fix:
> {code:title=python/pyspark/context.py:132}
>         if conf is not None and conf._jconf is not None:
>             # conf has been initialized in JVM properly, so use conf directly. This represents the
>             # scenario that the JVM has been launched before SparkConf is created (e.g. SparkContext is
>             # created and then stopped, and we create a new SparkConf and new SparkContext again)
>             self._conf = conf
>         else:
>             self._conf = SparkConf(_jvm=SparkContext._jvm)
> +           if conf:
> +               for key, value in conf.getAll():
> +                   self._conf.set(key, value)
> +                   print(key, value)
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
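The patch quoted above copies every entry from the driver-side conf into the freshly created JVM-backed conf. The merge it performs can be sketched with a minimal mock class (these are stand-in classes for illustration, not the real pyspark API):

```python
# MockConf stands in for pyspark's SparkConf; merge_user_conf mirrors the
# patched branch: copy entries from the user's conf into a fresh conf so
# nothing the user set programmatically is lost.
class MockConf:
    def __init__(self, entries=None):
        self._entries = dict(entries or {})

    def set(self, key, value):
        self._entries[key] = value

    def getAll(self):
        return list(self._entries.items())


def merge_user_conf(user_conf):
    jvm_conf = MockConf()  # stands in for SparkConf(_jvm=SparkContext._jvm)
    if user_conf:          # mirrors the patched `if conf:` guard
        for key, value in user_conf.getAll():
            jvm_conf.set(key, value)
    return jvm_conf


merged = merge_user_conf(MockConf({"spark.app.name": "debug"}))
```

With no user conf, the guard leaves the fresh conf untouched; with one, every key/value pair survives the merge.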