Re: SparkSubmit and --driver-java-options

2014-04-30 Thread Patrick Wendell
I added a fix for this recently and it didn't require adding -J
notation - are you trying it with this patch?

https://issues.apache.org/jira/browse/SPARK-1654

 ./bin/spark-shell --driver-java-options "-Dfoo=a -Dbar=b"
scala> sys.props.get("foo")
res0: Option[String] = Some(a)
scala> sys.props.get("bar")
res1: Option[String] = Some(b)

- Patrick

On Wed, Apr 30, 2014 at 11:29 AM, Marcelo Vanzin van...@cloudera.com wrote:
 Hello all,

 Maybe my brain is not evolved enough to be able to trace through what
 happens with command-line arguments as they're parsed through all the
 shell scripts... but I really can't figure out how to pass more than a
 single JVM option on the command line.

 Unless someone has an obvious workaround that I'm missing, I'd like to
 propose something that is actually pretty standard in JVM tools: using
 -J. From javac:

   -Jflag   Pass flag directly to the runtime system

 So javac -J-Xmx1g would pass -Xmx1g to the underlying JVM. You can
 use several of those to pass multiple options (unlike
 --driver-java-options), so it helps that it's a short syntax.
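
 A minimal sketch of what -J handling could look like in a wrapper script
 (hypothetical; the function name and splitting logic are assumptions for
 illustration, not code from any patch):

```shell
#!/bin/bash
# Hypothetical sketch: split -J-prefixed arguments into JVM options,
# leaving everything else as regular application arguments.
parse_args() {
  JAVA_OPTS=()
  ARGS=()
  for arg in "$@"; do
    case "$arg" in
      -J*) JAVA_OPTS+=("${arg#-J}") ;;  # strip the -J prefix
      *)   ARGS+=("$arg") ;;
    esac
  done
}

parse_args -J-Xmx1g -J-Dfoo=a --class Foo app.jar
echo "jvm opts: ${JAVA_OPTS[*]}"
echo "args:     ${ARGS[*]}"
```

 Since each -J flag carries exactly one JVM option, no quoting is needed
 to pass several of them.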

 Unless someone has some issue with that I'll work on a patch for it...
 (well, I'm going to do it locally for me anyway because I really can't
 figure out how to do what I want to otherwise.)


 --
 Marcelo


Re: SparkSubmit and --driver-java-options

2014-04-30 Thread Marcelo Vanzin
Just pulled again just in case. Verified your fix is there.

$ ./bin/spark-submit --master yarn --deploy-mode client
--driver-java-options "-Dfoo -Dbar" blah blah blah
error: Unrecognized option '-Dbar'.
run with --help for more information or --verbose for debugging output





-- 
Marcelo


Re: SparkSubmit and --driver-java-options

2014-04-30 Thread Patrick Wendell
Yeah I think the problem is that the spark-submit script doesn't pass
the argument array to spark-class in the right way, so any quoted
strings get flattened.

We do:
ORIG_ARGS="$@"
$SPARK_HOME/bin/spark-class org.apache.spark.deploy.SparkSubmit $ORIG_ARGS

This works:
# (with all the code relating to `shift`ing the arguments removed)
$SPARK_HOME/bin/spark-class org.apache.spark.deploy.SparkSubmit "$@"

Not sure, but I think the issue is that when you make a copy of "$@" in
bash, it stops being an array and gets flattened into a single string.
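
The flattening is easy to reproduce outside Spark (a standalone
illustration of the behavior, not the actual Spark scripts):

```shell
#!/bin/bash
# Copying "$@" into a plain variable joins the arguments into one string;
# copying it into an array keeps each argument as a separate word.
scalar_copy() { COPY="$@";   printf 'scalar: %s\n' "$COPY"; }
array_copy()  { COPY=("$@"); printf 'array:  %s\n' "${COPY[@]}"; }

scalar_copy --driver-java-options "-Dfoo -Dbar"   # one flattened line
array_copy  --driver-java-options "-Dfoo -Dbar"   # one line per argument
```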

My patch fixes this for spark-shell but I didn't realize that
spark-submit does the same thing.
https://github.com/apache/spark/pull/576/files#diff-bc287993dfd11fd18794041e169ffd72L23

I think we'll need to figure out how to do this correctly in the bash
script so that quoted strings get passed in the right way.



Re: SparkSubmit and --driver-java-options

2014-04-30 Thread Marcelo Vanzin

I tried a few different approaches but finally ended up giving up; my
bash-fu is apparently not strong enough. If you can make it work,
great, but I have -J working locally in case you give up like me.
:-)

-- 
Marcelo


Re: SparkSubmit and --driver-java-options

2014-04-30 Thread Patrick Wendell
So I reproduced the problem here:

== test.sh ==
#!/bin/bash
for x in "$@"; do
  echo "arg: $x"
done
ARGS_COPY="$@"
for x in "$ARGS_COPY"; do
  echo "arg_copy: $x"
done
==

./test.sh a b "c d e" f
arg: a
arg: b
arg: c d e
arg: f
arg_copy: a b c d e f

I'll dig around a bit more and see if we can fix it. Pretty sure we
aren't passing these argument arrays around correctly in bash.



Re: SparkSubmit and --driver-java-options

2014-04-30 Thread Dean Wampler
Try this:

#!/bin/bash
for x in "$@"; do
  echo "arg: $x"
done
ARGS_COPY=("$@") # Make ARGS_COPY an array with the array elements in "$@"

for x in "${ARGS_COPY[@]}"; do    # preserve array arguments.
  echo "arg_copy: $x"
done







-- 
Dean Wampler, Ph.D.
Typesafe
@deanwampler
http://typesafe.com
http://polyglotprogramming.com


Re: SparkSubmit and --driver-java-options

2014-04-30 Thread Patrick Wendell
Marcelo - Mind trying the following diff locally? If it works I can
send a patch:

patrick@patrick-t430s:~/Documents/spark$ git diff bin/spark-submit
diff --git a/bin/spark-submit b/bin/spark-submit
index dd0d95d..49bc262 100755
--- a/bin/spark-submit
+++ b/bin/spark-submit
@@ -18,7 +18,7 @@
 #

 export SPARK_HOME=$(cd `dirname $0`/..; pwd)
-ORIG_ARGS="$@"
+ORIG_ARGS=("$@")

 while (($#)); do
   if [ $1 = "--deploy-mode" ]; then
@@ -39,5 +39,5 @@ if [ ! -z $DRIVER_MEMORY ] && [ ! -z $DEPLOY_MODE ] && [ $DEPLOY_MODE = "client"
   export SPARK_MEM=$DRIVER_MEMORY
 fi

-$SPARK_HOME/bin/spark-class org.apache.spark.deploy.SparkSubmit $ORIG_ARGS
+$SPARK_HOME/bin/spark-class org.apache.spark.deploy.SparkSubmit "${ORIG_ARGS[@]}"
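
The pattern behind the diff (snapshot the arguments into an array before
the parsing loop consumes "$@", then forward the snapshot verbatim) can
be exercised standalone; this is a sketch of the idea, not the actual
spark-submit:

```shell
#!/bin/bash
# Sketch of the diff's approach: an array snapshot survives the shift loop.
forward() {
  ORIG_ARGS=("$@")               # array copy: word boundaries preserved
  while (($#)); do shift; done   # a parsing loop may consume all of "$@"
  printf 'arg: [%s]\n' "${ORIG_ARGS[@]}"
}

forward --driver-java-options "-Dfoo -Dbar" app.jar
```

The quoted "-Dfoo -Dbar" comes out as a single argument even though the
positional parameters were fully consumed before forwarding.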



Re: SparkSubmit and --driver-java-options

2014-04-30 Thread Marcelo Vanzin
Cool, that seems to work. Thanks!




-- 
Marcelo


Re: SparkSubmit and --driver-java-options

2014-04-30 Thread Patrick Wendell
Dean - our e-mails crossed, but thanks for the tip. Was independently
arriving at your solution :)

Okay I'll submit something.

- Patrick



Re: SparkSubmit and --driver-java-options

2014-04-30 Thread Patrick Wendell
Patch here:
https://github.com/apache/spark/pull/609
