Re: replacement for SPARK_JAVA_OPTS

2014-08-07 Thread Cody Koeninger

Re: replacement for SPARK_JAVA_OPTS

2014-08-07 Thread Marcelo Vanzin

Re: replacement for SPARK_JAVA_OPTS

2014-08-07 Thread Gary Malouf

Re: replacement for SPARK_JAVA_OPTS

2014-08-07 Thread Andrew Or

Re: replacement for SPARK_JAVA_OPTS

2014-08-07 Thread Andrew Or

Re: replacement for SPARK_JAVA_OPTS

2014-08-07 Thread Patrick Wendell

Re: replacement for SPARK_JAVA_OPTS

2014-08-07 Thread Andrew Or

Re: replacement for SPARK_JAVA_OPTS

2014-07-31 Thread Patrick Wendell

Re: replacement for SPARK_JAVA_OPTS

2014-07-31 Thread Patrick Wendell

Re: replacement for SPARK_JAVA_OPTS

2014-07-31 Thread Cody Koeninger

replacement for SPARK_JAVA_OPTS

2014-07-30 Thread Cody Koeninger
We were previously using SPARK_JAVA_OPTS to set java system properties via
-D.

This was used for properties that varied on a per-deployment-environment
basis, but needed to be available in the spark shell and workers.

On upgrading to 1.0, we saw that SPARK_JAVA_OPTS had been deprecated, and
replaced by spark-defaults.conf and command line arguments to spark-submit
or spark-shell.

However, setting spark.driver.extraJavaOptions and
spark.executor.extraJavaOptions in spark-defaults.conf is not a replacement
for SPARK_JAVA_OPTS:


$ cat conf/spark-defaults.conf
spark.driver.extraJavaOptions=-Dfoo.bar.baz=23

$ ./bin/spark-shell

scala> System.getProperty("foo.bar.baz")
res0: String = null


$ ./bin/spark-shell --driver-java-options -Dfoo.bar.baz=23

scala> System.getProperty("foo.bar.baz")
res0: String = 23


Looking through the shell scripts for spark-submit and spark-class, I can
see why this is; parsing spark-defaults.conf from bash could be brittle.

But from an ergonomic point of view, it's a step back to go from a
set-it-and-forget-it configuration in spark-env.sh, to requiring command
line arguments.

I can solve this with an ad-hoc script to wrap spark-shell with the
appropriate arguments, but I wanted to bring the issue up to see if anyone
else had run into it,
or had any direction for a general solution (beyond parsing java properties
files from bash).
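
(For reference, a minimal check, not part of the original message, that can be pasted into spark-shell to see where the setting gets lost: spark-shell -v shows SparkSubmit reading spark-defaults.conf, but nothing can add -D flags to a JVM that is already running, so the Spark-level config and the driver JVM's system properties can disagree. foo.bar.baz is the placeholder property used above.)

// Compare what Spark's config layer sees with what the driver JVM actually received.
val fromConf = sc.getConf.getOption("spark.driver.extraJavaOptions")
val fromJvm  = Option(System.getProperty("foo.bar.baz"))
println("SparkConf sees:  " + fromConf)
println("Driver JVM sees: " + fromJvm)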


Re: replacement for SPARK_JAVA_OPTS

2014-07-30 Thread Marcelo Vanzin
Hi Cody,

Could you file a bug for this if there isn't one already?

For system properties SparkSubmit should be able to read those
settings and do the right thing, but that obviously won't work for
other JVM options... the current code should work fine in cluster mode
though, since the driver is a different process. :-)




-- 
Marcelo
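
(A rough sketch, not Spark's actual code, of the special-casing described above: -D flags found in spark.driver.extraJavaOptions could be applied to the current JVM as system properties, while anything else, such as -Xmx or agent flags, genuinely requires a new JVM.)

def applySystemPropertyFlags(extraJavaOptions: String): Unit = {
  // Naive whitespace split; quoted options would need real tokenization.
  extraJavaOptions.split("\\s+").filter(_.startsWith("-D")).foreach { opt =>
    opt.stripPrefix("-D").split("=", 2) match {
      case Array(key, value) => System.setProperty(key, value)
      case Array(key)        => System.setProperty(key, "")
    }
  }
}

// applySystemPropertyFlags("-Dfoo.bar.baz=23 -Xmx2g") sets foo.bar.baz in the
// running JVM but can do nothing about -Xmx2g.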


Re: replacement for SPARK_JAVA_OPTS

2014-07-30 Thread Patrick Wendell
Cody - in your example you are using the '=' character, but in our
documentation and tests we use a whitespace to separate the key and
value in the defaults file.

docs: http://spark.apache.org/docs/latest/configuration.html

spark.driver.extraJavaOptions -Dfoo.bar.baz=23

I'm not sure if the java properties file parser will try to interpret
the equals sign. If so you might need to do this.

spark.driver.extraJavaOptions "-Dfoo.bar.baz=23"

Do those work for you?



Re: replacement for SPARK_JAVA_OPTS

2014-07-30 Thread Cody Koeninger
Either whitespace or an equals sign is a valid properties file format.
Here's an example:

$ cat conf/spark-defaults.conf
spark.driver.extraJavaOptions -Dfoo.bar.baz=23

$ ./bin/spark-shell -v
Using properties file: /opt/spark/conf/spark-defaults.conf
Adding default property: spark.driver.extraJavaOptions=-Dfoo.bar.baz=23

scala> System.getProperty("foo.bar.baz")
res0: String = null


If you add double quotes, the resulting string value will have double
quotes.


$ cat conf/spark-defaults.conf
spark.driver.extraJavaOptions "-Dfoo.bar.baz=23"

$ ./bin/spark-shell -v
Using properties file: /opt/spark/conf/spark-defaults.conf
Adding default property: spark.driver.extraJavaOptions="-Dfoo.bar.baz=23"

scala> System.getProperty("foo.bar.baz")
res0: String = null


Neither one of those affects the issue; the underlying problem in my case
seems to be that bin/spark-class uses the SPARK_SUBMIT_OPTS and
SPARK_JAVA_OPTS environment variables, but nothing parses
spark-defaults.conf before the java process is started.

Here's an example of the process running when only spark-defaults.conf is
being used:

$ ps -ef | grep spark

514   5182  2058  0 21:05 pts/2    00:00:00 bash ./bin/spark-shell -v

514   5189  5182  4 21:05 pts/2    00:00:22 /usr/local/java/bin/java
-cp
::/opt/spark/conf:/opt/spark/lib/spark-assembly-1.0.1-hadoop2.3.0-mr1-cdh5.0.2.jar:/etc/hadoop/conf-mx
-XX:MaxPermSize=128m -Djava.library.path= -Xms512m -Xmx512m
org.apache.spark.deploy.SparkSubmit spark-shell -v --class
org.apache.spark.repl.Main


Here's an example of it when the command line --driver-java-options is used
(and thus things work):


$ ps -ef | grep spark
514   5392  2058  0 21:15 pts/2    00:00:00 bash ./bin/spark-shell -v
--driver-java-options -Dfoo.bar.baz=23

514   5399  5392 80 21:15 pts/2    00:00:06 /usr/local/java/bin/java
-cp
::/opt/spark/conf:/opt/spark/lib/spark-assembly-1.0.1-hadoop2.3.0-mr1-cdh5.0.2.jar:/etc/hadoop/conf-mx
-XX:MaxPermSize=128m -Dfoo.bar.baz=23 -Djava.library.path= -Xms512m
-Xmx512m org.apache.spark.deploy.SparkSubmit spark-shell -v
--driver-java-options -Dfoo.bar.baz=23 --class org.apache.spark.repl.Main
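
(To make the parsing point above concrete, here is a small standalone check using java.util.Properties, the standard parser for this file format; it is not from the original message. Both the whitespace and the '=' forms produce the same key/value pair, and double quotes simply become part of the value.)

import java.io.StringReader
import java.util.Properties

def parse(line: String): (String, String) = {
  val props = new Properties()
  props.load(new StringReader(line))
  val key = props.stringPropertyNames().iterator().next()
  (key, props.getProperty(key))
}

println(parse("spark.driver.extraJavaOptions -Dfoo.bar.baz=23"))
println(parse("spark.driver.extraJavaOptions=-Dfoo.bar.baz=23"))
println(parse("spark.driver.extraJavaOptions \"-Dfoo.bar.baz=23\""))
// The first two print (spark.driver.extraJavaOptions,-Dfoo.bar.baz=23);
// the third keeps the quote characters in the value.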





Re: replacement for SPARK_JAVA_OPTS

2014-07-30 Thread Cody Koeninger
In addition, spark.executor.extraJavaOptions does not seem to behave as I
would expect; java arguments don't seem to be propagated to executors.


$ cat conf/spark-defaults.conf

spark.master
mesos://zk://etl-01.mxstg:2181,etl-02.mxstg:2181,etl-03.mxstg:2181/masters
spark.executor.extraJavaOptions -Dfoo.bar.baz=23
spark.driver.extraJavaOptions -Dfoo.bar.baz=23


$ ./bin/spark-shell

scala> sc.getConf.get("spark.executor.extraJavaOptions")
res0: String = -Dfoo.bar.baz=23

scala> sc.parallelize(1 to 100).map{ i => (
     |  java.net.InetAddress.getLocalHost.getHostName,
     |  System.getProperty("foo.bar.baz")
     | )}.collect

res1: Array[(String, String)] = Array((dn-01.mxstg,null),
(dn-01.mxstg,null), (dn-01.mxstg,null), (dn-01.mxstg,null),
(dn-01.mxstg,null), (dn-01.mxstg,null), (dn-01.mxstg,null),
(dn-01.mxstg,null), (dn-01.mxstg,null), (dn-01.mxstg,null),
(dn-01.mxstg,null), (dn-01.mxstg,null), (dn-02.mxstg,null),
(dn-02.mxstg,null), ...



Note that this is a mesos deployment, although I wouldn't expect that to
affect the availability of spark.driver.extraJavaOptions in a local spark
shell.
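
(A slightly more direct diagnostic along the same lines, not from the original message: instead of probing one property, ask each executor JVM for the full argument list it was launched with. If -Dfoo.bar.baz=23 is missing there, spark.executor.extraJavaOptions never made it into the executor launch command for this deployment mode.)

import java.lang.management.ManagementFactory
import scala.collection.JavaConverters._

// Run from the shell; each task reports its host and the executor JVM's input arguments.
val executorJvmArgs = sc.parallelize(1 to 100, 10).mapPartitions { _ =>
  val args = ManagementFactory.getRuntimeMXBean.getInputArguments.asScala.mkString(" ")
  Iterator((java.net.InetAddress.getLocalHost.getHostName, args))
}.distinct().collect()

executorJvmArgs.foreach(println)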

