replacement for SPARK_JAVA_OPTS
We were previously using SPARK_JAVA_OPTS to set java system properties via -D. This was used for properties that varied on a per-deployment-environment basis, but needed to be available in the spark shell and workers.

On upgrading to 1.0, we saw that SPARK_JAVA_OPTS had been deprecated and replaced by spark-defaults.conf and command line arguments to spark-submit or spark-shell. However, setting spark.driver.extraJavaOptions and spark.executor.extraJavaOptions in spark-defaults.conf is not a replacement for SPARK_JAVA_OPTS:

$ cat conf/spark-defaults.conf
spark.driver.extraJavaOptions=-Dfoo.bar.baz=23

$ ./bin/spark-shell
scala> System.getProperty("foo.bar.baz")
res0: String = null

$ ./bin/spark-shell --driver-java-options -Dfoo.bar.baz=23
scala> System.getProperty("foo.bar.baz")
res0: String = 23

Looking through the shell scripts for spark-submit and spark-class, I can see why this is; parsing spark-defaults.conf from bash could be brittle. But from an ergonomic point of view, it's a step back to go from a set-it-and-forget-it configuration in spark-env.sh to requiring command line arguments.

I can solve this with an ad-hoc script that wraps spark-shell with the appropriate arguments (a sketch follows below), but I wanted to bring the issue up to see if anyone else has run into it, or has any direction toward a general solution (beyond parsing java properties files from bash).
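For reference, the ad-hoc wrapper mentioned above could be as small as the following sketch. The install path and the -D flag are illustrative assumptions, not anything prescribed by Spark:

#!/bin/sh
# Hypothetical wrapper around spark-shell: bakes the per-environment
# system properties into every invocation so they are set-and-forget
# again. Adjust SPARK_HOME and the -D flags for your deployment.
SPARK_HOME="${SPARK_HOME:-/opt/spark}"
exec "$SPARK_HOME/bin/spark-shell" \
  --driver-java-options "-Dfoo.bar.baz=23" \
  "$@"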
Re: replacement for SPARK_JAVA_OPTS
Hi Cody,

Could you file a bug for this if there isn't one already?

For system properties, SparkSubmit should be able to read those settings and do the right thing, but that obviously won't work for other JVM options... The current code should work fine in cluster mode, though, since the driver is a different process. :-)

-- Marcelo
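To make the -D case concrete, here is a hedged sketch of what "read those settings and do the right thing" could look like in client mode; the method name and the sparkProps map are assumptions for illustration, not Spark's actual SparkSubmit code:

// Illustrative only -- not Spark's real SparkSubmit implementation.
// In client mode the driver JVM is already running, so -D pairs from
// spark.driver.extraJavaOptions can still be applied as system
// properties; other JVM options (heap sizes, GC flags) cannot be.
def applyDriverSystemProps(sparkProps: Map[String, String]): Unit = {
  val extra = sparkProps.getOrElse("spark.driver.extraJavaOptions", "")
  extra.split("\\s+").filter(_.startsWith("-D")).foreach { opt =>
    opt.stripPrefix("-D").split("=", 2) match {
      case Array(key, value) => sys.props(key) = value // -Dkey=value
      case Array(key)        => sys.props(key) = ""    // bare -Dkey
    }
  }
}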
Re: replacement for SPARK_JAVA_OPTS
Cody - in your example you are using the '=' character, but in our documentation and tests we use whitespace to separate the key and value in the defaults file.

docs: http://spark.apache.org/docs/latest/configuration.html

spark.driver.extraJavaOptions -Dfoo.bar.baz=23

I'm not sure if the java properties file parser will try to interpret the equals sign. If so you might need to do this:

spark.driver.extraJavaOptions "-Dfoo.bar.baz=23"

Do those work for you?
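If the defaults file is read with java.util.Properties, the parser accepts '=', ':', or whitespace as the key/value separator, and only the first separator splits the key from the value, so later equals signs stay in the value. A quick REPL check (illustrative, reusing the key from above):

import java.io.StringReader
import java.util.Properties

val p = new Properties()
// '=' as separator: only the first '=' splits key from value
p.load(new StringReader("spark.driver.extraJavaOptions=-Dfoo.bar.baz=23"))
p.getProperty("spark.driver.extraJavaOptions")  // "-Dfoo.bar.baz=23"

// whitespace as separator parses to the same value
p.load(new StringReader("spark.driver.extraJavaOptions -Dfoo.bar.baz=23"))
p.getProperty("spark.driver.extraJavaOptions")  // "-Dfoo.bar.baz=23"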
Re: replacement for SPARK_JAVA_OPTS
Either whitespace or equals sign are valid properties file formats. Here's an example:

$ cat conf/spark-defaults.conf
spark.driver.extraJavaOptions -Dfoo.bar.baz=23

$ ./bin/spark-shell -v
Using properties file: /opt/spark/conf/spark-defaults.conf
Adding default property: spark.driver.extraJavaOptions=-Dfoo.bar.baz=23
scala> System.getProperty("foo.bar.baz")
res0: String = null

If you add double quotes, the resulting string value will have double quotes:

$ cat conf/spark-defaults.conf
spark.driver.extraJavaOptions "-Dfoo.bar.baz=23"

$ ./bin/spark-shell -v
Using properties file: /opt/spark/conf/spark-defaults.conf
Adding default property: spark.driver.extraJavaOptions="-Dfoo.bar.baz=23"
scala> System.getProperty("foo.bar.baz")
res0: String = null

Neither one of those affects the issue; the underlying problem in my case seems to be that bin/spark-class uses the SPARK_SUBMIT_OPTS and SPARK_JAVA_OPTS environment variables, but nothing parses spark-defaults.conf before the java process is started.

Here's an example of the process running when only spark-defaults.conf is being used:

$ ps -ef | grep spark
514  5182  2058  0 21:05 pts/2 00:00:00 bash ./bin/spark-shell -v
514  5189  5182  4 21:05 pts/2 00:00:22 /usr/local/java/bin/java -cp ::/opt/spark/conf:/opt/spark/lib/spark-assembly-1.0.1-hadoop2.3.0-mr1-cdh5.0.2.jar:/etc/hadoop/conf-mx -XX:MaxPermSize=128m -Djava.library.path= -Xms512m -Xmx512m org.apache.spark.deploy.SparkSubmit spark-shell -v --class org.apache.spark.repl.Main

Here's an example when the command line --driver-java-options is used (and thus things work):

$ ps -ef | grep spark
514  5392  2058  0 21:15 pts/2 00:00:00 bash ./bin/spark-shell -v --driver-java-options -Dfoo.bar.baz=23
514  5399  5392 80 21:15 pts/2 00:00:06 /usr/local/java/bin/java -cp ::/opt/spark/conf:/opt/spark/lib/spark-assembly-1.0.1-hadoop2.3.0-mr1-cdh5.0.2.jar:/etc/hadoop/conf-mx -XX:MaxPermSize=128m -Dfoo.bar.baz=23 -Djava.library.path= -Xms512m -Xmx512m org.apache.spark.deploy.SparkSubmit spark-shell -v --driver-java-options -Dfoo.bar.baz=23 --class org.apache.spark.repl.Main
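Since bin/spark-class does read SPARK_SUBMIT_OPTS, one set-and-forget workaround in the spirit of the old spark-env.sh configuration would be a sketch like the following. It assumes spark-env.sh is sourced before the JVM is launched and that SPARK_SUBMIT_OPTS ends up on the SparkSubmit command line, as the ps output above suggests; the property value is illustrative:

# conf/spark-env.sh -- hypothetical workaround. Relies on
# bin/spark-class passing SPARK_SUBMIT_OPTS to the JVM that runs
# org.apache.spark.deploy.SparkSubmit.
export SPARK_SUBMIT_OPTS="$SPARK_SUBMIT_OPTS -Dfoo.bar.baz=23"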
Re: replacement for SPARK_JAVA_OPTS
In addition, spark.executor.extraJavaOptions does not seem to behave as I would expect; java arguments don't seem to be propagated to executors.

$ cat conf/spark-defaults.conf
spark.master mesos://zk://etl-01.mxstg:2181,etl-02.mxstg:2181,etl-03.mxstg:2181/masters
spark.executor.extraJavaOptions -Dfoo.bar.baz=23
spark.driver.extraJavaOptions -Dfoo.bar.baz=23

$ ./bin/spark-shell
scala> sc.getConf.get("spark.executor.extraJavaOptions")
res0: String = -Dfoo.bar.baz=23

scala> sc.parallelize(1 to 100).map{ i => (
     |   java.net.InetAddress.getLocalHost.getHostName,
     |   System.getProperty("foo.bar.baz")
     | )}.collect
res1: Array[(String, String)] = Array((dn-01.mxstg,null), (dn-01.mxstg,null), (dn-01.mxstg,null), (dn-01.mxstg,null), (dn-01.mxstg,null), (dn-01.mxstg,null), (dn-01.mxstg,null), (dn-01.mxstg,null), (dn-01.mxstg,null), (dn-01.mxstg,null), (dn-01.mxstg,null), (dn-01.mxstg,null), (dn-02.mxstg,null), (dn-02.mxstg,null), ...

Note that this is a mesos deployment, although I wouldn't expect that to affect the availability of spark.driver.extraJavaOptions in a local spark shell.
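For a standalone application (as opposed to the shell), the executor setting can also be supplied programmatically before the context is created. A minimal sketch, with the app name, master URL, and property value as assumptions; whether the options actually reach the executors still depends on the cluster manager honoring spark.executor.extraJavaOptions:

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical standalone app: set executor JVM options on the
// SparkConf directly instead of relying on spark-defaults.conf.
object ExtraOptsDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("extra-java-options-demo")
      .setMaster("local[2]")  // illustrative; use your real master URL
      .set("spark.executor.extraJavaOptions", "-Dfoo.bar.baz=23")
    val sc = new SparkContext(conf)
    // driver-side check that the setting was picked up
    println(sc.getConf.get("spark.executor.extraJavaOptions"))
    sc.stop()
  }
}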