[ https://issues.apache.org/jira/browse/SPARK-17387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-17387:
------------------------------------

    Assignee:     (was: Apache Spark)

> Creating SparkContext() from python without spark-submit ignores user conf
> --------------------------------------------------------------------------
>
>                 Key: SPARK-17387
>                 URL: https://issues.apache.org/jira/browse/SPARK-17387
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.0.0
>            Reporter: Marcelo Vanzin
>            Priority: Minor
>
> Consider the following scenario: a user runs a Python application not through
> spark-submit, but by adding the pyspark module to PYTHONPATH and manually creating a
> SparkContext. Something like this:
> {noformat}
> $ SPARK_HOME=$PWD PYTHONPATH=python:python/lib/py4j-0.10.3-src.zip python
> Python 2.7.12 (default, Jul  1 2016, 15:12:24)
> [GCC 5.4.0 20160609] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> from pyspark import SparkContext
> >>> from pyspark import SparkConf
> >>> conf = SparkConf().set("spark.driver.memory", "4g")
> >>> sc = SparkContext(conf=conf)
> {noformat}
> If you look at the JVM launched by the pyspark code, it ignores the user's
> configuration:
> {noformat}
> $ ps ax | grep $(pgrep -f SparkSubmit)
> 12283 pts/2    Sl+   0:03 /apps/java7/bin/java -cp ... -Xmx1g
> -XX:MaxPermSize=256m org.apache.spark.deploy.SparkSubmit pyspark-shell
> {noformat}
> Note the "1g" of memory. If instead you use "pyspark", you get the correct
> "4g" in the JVM.
> This also affects other configs; for example, you can't really add jars to
> the driver's classpath using "spark.jars".
> You can work around this by setting the undocumented env variable Spark
> itself uses:
> {noformat}
> $ SPARK_HOME=$PWD PYTHONPATH=python:python/lib/py4j-0.10.3-src.zip python
> Python 2.7.12 (default, Jul  1 2016, 15:12:24)
> [GCC 5.4.0 20160609] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import os
> >>> os.environ['PYSPARK_SUBMIT_ARGS'] = "--conf spark.driver.memory=4g pyspark-shell"
> >>> from pyspark import SparkContext
> >>> sc = SparkContext()
> {noformat}
> But it would be nicer if the configs were automatically propagated.
> BTW the reason for this is that the {{launch_gateway}} function used to start
> the JVM does not take any parameters, and the only place where it reads
> arguments for Spark is that env variable.
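
As a sketch of how the propagation could work today from the user's side: since {{launch_gateway}} only reads PYSPARK_SUBMIT_ARGS, the desired confs can be turned into "--conf key=value" flags and placed in that variable before the first SparkContext is created. The helper below is a minimal illustration of that idea under those assumptions; it is not part of pyspark's API, and the name "set_submit_args" plus the example jar path are made up for the example.

{noformat}
import os

def set_submit_args(confs):
    # Hypothetical helper (not part of pyspark): turn each conf entry into a
    # "--conf key=value" flag. "pyspark-shell" must stay at the end so
    # spark-submit treats it as the primary resource rather than an app arg.
    flags = " ".join("--conf %s=%s" % (k, v) for k, v in confs.items())
    os.environ["PYSPARK_SUBMIT_ARGS"] = flags + " pyspark-shell"

# Must run before the first SparkContext is created, because that is when the
# gateway JVM (and hence its -Xmx and classpath) is launched.
set_submit_args({
    "spark.driver.memory": "4g",
    "spark.jars": "/path/to/extra.jar",  # example path only
})

from pyspark import SparkContext
sc = SparkContext()
{noformat}

The fix suggested by the description would do the equivalent inside pyspark itself, building those arguments from the SparkConf passed to SparkContext instead of requiring the user to set the env variable.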