[ https://issues.apache.org/jira/browse/SPARK-20369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15982489#comment-15982489 ]
Hyukjin Kwon edited comment on SPARK-20369 at 4/25/17 7:24 AM:
---------------------------------------------------------------

It looks like I can't reproduce this, as below:

{code}
from pyspark import SparkContext, SparkConf

conf = SparkConf().setAppName("spark-conf-test") \
    .setMaster("local[2]") \
    .set('spark.python.worker.memory', "1g") \
    .set('spark.executor.memory', "3g") \
    .set("spark.driver.maxResultSize", "2g")

print
print "Spark Config values in SparkConf:"
print conf.toDebugString()

sc = SparkContext(conf=conf)

print
print "Actual Spark Config values:"
print sc.getConf().toDebugString()

print conf.get("spark.python.worker.memory") == sc.getConf().get("spark.python.worker.memory")
print conf.get("spark.executor.memory") == sc.getConf().get("spark.executor.memory")
print conf.get("spark.driver.maxResultSize") == sc.getConf().get("spark.driver.maxResultSize")
{code}

{code}
Spark Config values in SparkConf:
spark.master=local[2]
spark.executor.memory=3g
spark.python.worker.memory=1g
spark.app.name=spark-conf-test
spark.driver.maxResultSize=2g
...

Actual Spark Config values:
...
spark.driver.maxResultSize=2g
spark.app.name=spark-conf-test
spark.executor.memory=3g
spark.master=local[2]
spark.python.worker.memory=1g
...
True
True
True
{code}

Could you check this against the current master?
was (Author: hyukjin.kwon):
It looks like I can't reproduce this, as below:

{code}
from pyspark import SparkContext, SparkConf

conf = SparkConf().setAppName("spark-conf-test") \
    .setMaster("local[2]") \
    .set('spark.python.worker.memory', "1g") \
    .set('spark.executor.memory', "3g") \
    .set("spark.driver.maxResultSize", "2g")

print
print "Spark Config values in SparkConf:"
print conf.toDebugString()

sc = SparkContext(conf=conf)

print
print "Actual Spark Config values:"
print sc.getConf().toDebugString()

print conf.get("spark.python.worker.memory") == sc.getConf().get("spark.python.worker.memory")
print conf.get("spark.executor.memory") == sc.getConf().get("spark.executor.memory")
print conf.get("spark.driver.maxResultSize") == sc.getConf().get("spark.driver.maxResultSize")
{code}

{code}
Spark Config values in SparkConf:
spark.master=local[2]
spark.executor.memory=3g
spark.python.worker.memory=1g
spark.app.name=spark-conf-test
spark.driver.maxResultSize=2g

Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/04/25 16:20:08 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Actual Spark Config values:
spark.app.id=local-1493104809510
spark.app.name=spark-conf-test
spark.driver.extraClassPath=/Users/hyukjinkwon/Desktop/workspace/local/forked/spark-xml/target/scala-2.11/spark-xml_2.11-0.4.0.jar
spark.driver.host=192.168.15.168
spark.driver.maxResultSize=2g
spark.driver.port=56783
spark.executor.extraClassPath=/Users/hyukjinkwon/Desktop/workspace/local/forked/spark-xml/target/scala-2.11/spark-xml_2.11-0.4.0.jar
spark.executor.id=driver
spark.executor.memory=3g
spark.master=local[2]
spark.python.worker.memory=1g
spark.rdd.compress=True
spark.serializer.objectStreamReset=100
spark.submit.deployMode=client
True
True
True
{code}

Could you check this against the current master?
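[Editor's note: not part of the original thread.] The equality checks in the comment above can be sketched without a Spark install by standing plain dicts in for {{SparkConf}} and {{sc.getConf()}}; the helper name {{conf_applied}} and the dict names are ours for illustration, not PySpark API:

```python
# Spark-free sketch of the check above: plain dicts stand in for
# the requested config (SparkConf) and the effective config (sc.getConf()).
requested = {
    "spark.python.worker.memory": "1g",
    "spark.executor.memory": "3g",
    "spark.driver.maxResultSize": "2g",
}

# The effective config also carries runtime-generated keys (spark.app.id,
# spark.driver.port, ...); extra keys do not affect the check.
effective = dict(requested, **{"spark.app.id": "local-0000000000000"})

def conf_applied(requested, effective):
    """True when every requested key/value survived into the effective config."""
    return all(effective.get(k) == v for k, v in requested.items())

print(conf_applied(requested, effective))  # True
```

When a value is overridden or dropped (as in the reporter's output, where {{spark.driver.maxResultSize}} came back as {{4g}} instead of {{2g}}), the same check returns {{False}}.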
> pyspark: Dynamic configuration with SparkConf does not work
> -----------------------------------------------------------
>
>                 Key: SPARK-20369
>                 URL: https://issues.apache.org/jira/browse/SPARK-20369
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.1.0
>        Environment: Ubuntu 14.04.1 LTS (GNU/Linux 3.13.0-40-generic x86_64)
>                     and Mac OS X 10.11.6
>            Reporter: Matthew McClain
>            Priority: Minor
>
> Setting Spark properties dynamically in pyspark using a SparkConf object does
> not work. Here is the code that shows the bug:
> ---
> from pyspark import SparkContext, SparkConf
>
> def main():
>     conf = SparkConf().setAppName("spark-conf-test") \
>         .setMaster("local[2]") \
>         .set('spark.python.worker.memory', "1g") \
>         .set('spark.executor.memory', "3g") \
>         .set("spark.driver.maxResultSize", "2g")
>     print "Spark Config values in SparkConf:"
>     print conf.toDebugString()
>     sc = SparkContext(conf=conf)
>     print "Actual Spark Config values:"
>     print sc.getConf().toDebugString()
>
> if __name__ == "__main__":
>     main()
> ---
> Here is the output; none of the config values set in SparkConf are used in
> the SparkContext configuration:
> Spark Config values in SparkConf:
> spark.master=local[2]
> spark.executor.memory=3g
> spark.python.worker.memory=1g
> spark.app.name=spark-conf-test
> spark.driver.maxResultSize=2g
> 17/04/18 10:21:24 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> Actual Spark Config values:
> spark.app.id=local-1492528885708
> spark.app.name=sandbox.py
> spark.driver.host=10.201.26.172
> spark.driver.maxResultSize=4g
> spark.driver.port=54657
> spark.executor.id=driver
> spark.files=file:/Users/matt.mcclain/dev/datascience-experiments/mmcclain/client_clusters/sandbox.py
> spark.master=local[*]
> spark.rdd.compress=True
> spark.serializer.objectStreamReset=100
> spark.submit.deployMode=client

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)