Re: SparkSubmit and --driver-java-options
I added a fix for this recently and it didn't require adding -J notation - are you trying it with this patch? https://issues.apache.org/jira/browse/SPARK-1654

./bin/spark-shell --driver-java-options "-Dfoo=a -Dbar=b"

scala> sys.props.get("foo")
res0: Option[String] = Some(a)

scala> sys.props.get("bar")
res1: Option[String] = Some(b)

- Patrick

On Wed, Apr 30, 2014 at 11:29 AM, Marcelo Vanzin van...@cloudera.com wrote:
> Hello all,
>
> Maybe my brain is not evolved enough to be able to trace through what
> happens with command-line arguments as they're parsed through all the
> shell scripts... but I really can't figure out how to pass more than a
> single JVM option on the command line.
>
> Unless someone has an obvious workaround that I'm missing, I'd like to
> propose something that is actually pretty standard in JVM tools: using
> -J. From javac:
>
>   -Jflag
>       Pass flag directly to the runtime system
>
> So "javac -J-Xmx1g" would pass -Xmx1g to the underlying JVM. You can
> use several of those to pass multiple options (unlike
> --driver-java-options), so it helps that it's a short syntax.
>
> Unless someone has some issue with that I'll work on a patch for
> it... (well, I'm going to do it locally for me anyway because I really
> can't figure out how to do what I want to otherwise.)
>
> --
> Marcelo
Re: SparkSubmit and --driver-java-options
Just pulled again just in case. Verified your fix is there.

$ ./bin/spark-submit --master yarn --deploy-mode client --driver-java-options "-Dfoo -Dbar" blah blah blah
error: Unrecognized option '-Dbar'.
run with --help for more information or --verbose for debugging output

On Wed, Apr 30, 2014 at 12:49 PM, Patrick Wendell pwend...@gmail.com wrote:
> I added a fix for this recently and it didn't require adding -J
> notation - are you trying it with this patch?
> https://issues.apache.org/jira/browse/SPARK-1654
> [...]

--
Marcelo
Re: SparkSubmit and --driver-java-options
Yeah I think the problem is that the spark-submit script doesn't pass the argument array to spark-class in the right way, so any quoted strings get flattened. We do:

ORIG_ARGS=$@
$SPARK_HOME/bin/spark-class org.apache.spark.deploy.SparkSubmit $ORIG_ARGS

This works:

# remove all the code relating to `shift`ing the arguments
$SPARK_HOME/bin/spark-class org.apache.spark.deploy.SparkSubmit "$@"

Not sure, but I think the issue is that when you make a copy of $@ in bash the type actually changes from an array to something else. My patch fixes this for spark-shell but I didn't realize that spark-submit does the same thing.

https://github.com/apache/spark/pull/576/files#diff-bc287993dfd11fd18794041e169ffd72L23

I think we'll need to figure out how to do this correctly in the bash script so that quoted strings get passed in the right way.

On Wed, Apr 30, 2014 at 1:06 PM, Marcelo Vanzin van...@cloudera.com wrote:
> Just pulled again just in case. Verified your fix is there.
>
> $ ./bin/spark-submit --master yarn --deploy-mode client --driver-java-options "-Dfoo -Dbar" blah blah blah
> error: Unrecognized option '-Dbar'.
> [...]
Re: SparkSubmit and --driver-java-options
On Wed, Apr 30, 2014 at 1:41 PM, Patrick Wendell pwend...@gmail.com wrote:
> Yeah I think the problem is that the spark-submit script doesn't pass
> the argument array to spark-class in the right way, so any quoted
> strings get flattened.
> [...]
> I think we'll need to figure out how to do this correctly in the bash
> script so that quoted strings get passed in the right way.

I tried a few different approaches but finally ended up giving up; my bash-fu is apparently not strong enough. If you can make it work great, but I have -J working locally in case you give up like me. :-)

--
Marcelo
Re: SparkSubmit and --driver-java-options
So I reproduced the problem here:

== test.sh ==
#!/bin/bash

for x in "$@"; do
  echo "arg: $x"
done

ARGS_COPY="$@"
for x in "$ARGS_COPY"; do
  echo "arg_copy: $x"
done
==

$ ./test.sh a b "c d e" f
arg: a
arg: b
arg: c d e
arg: f
arg_copy: a b c d e f

I'll dig around a bit more and see if we can fix it. Pretty sure we aren't passing these argument arrays around correctly in bash.

On Wed, Apr 30, 2014 at 1:48 PM, Marcelo Vanzin van...@cloudera.com wrote:
> I tried a few different approaches but finally ended up giving up; my
> bash-fu is apparently not strong enough. If you can make it work
> great, but I have -J working locally in case you give up like me. :-)
Re: SparkSubmit and --driver-java-options
Try this:

#!/bin/bash

for x in "$@"; do
  echo "arg: $x"
done

ARGS_COPY=("$@")                  # Make ARGS_COPY an array with the array elements in "$@"
for x in "${ARGS_COPY[@]}"; do    # preserve array arguments
  echo "arg_copy: $x"
done

On Wed, Apr 30, 2014 at 3:51 PM, Patrick Wendell pwend...@gmail.com wrote:
> So I reproduced the problem here:
> [...]
> I'll dig around a bit more and see if we can fix it. Pretty sure we
> aren't passing these argument arrays around correctly in bash.

--
Dean Wampler, Ph.D.
Typesafe
@deanwampler
http://typesafe.com
http://polyglotprogramming.com
Re: SparkSubmit and --driver-java-options
Marcelo - Mind trying the following diff locally? If it works I can send a patch:

patrick@patrick-t430s:~/Documents/spark$ git diff bin/spark-submit
diff --git a/bin/spark-submit b/bin/spark-submit
index dd0d95d..49bc262 100755
--- a/bin/spark-submit
+++ b/bin/spark-submit
@@ -18,7 +18,7 @@
 #
 export SPARK_HOME="$(cd `dirname $0`/..; pwd)"

-ORIG_ARGS=$@
+ORIG_ARGS=("$@")

 while (($#)); do
   if [ "$1" = "--deploy-mode" ]; then
@@ -39,5 +39,5 @@ if [ ! -z $DRIVER_MEMORY ] && [ ! -z $DEPLOY_MODE ] && [ $DEPLOY_MODE = "client" ]; then
   export SPARK_MEM=$DRIVER_MEMORY
 fi

-$SPARK_HOME/bin/spark-class org.apache.spark.deploy.SparkSubmit $ORIG_ARGS
+$SPARK_HOME/bin/spark-class org.apache.spark.deploy.SparkSubmit "${ORIG_ARGS[@]}"

On Wed, Apr 30, 2014 at 1:51 PM, Patrick Wendell pwend...@gmail.com wrote:
> So I reproduced the problem here:
> [...]
> I'll dig around a bit more and see if we can fix it. Pretty sure we
> aren't passing these argument arrays around correctly in bash.
Re: SparkSubmit and --driver-java-options
Cool, that seems to work. Thanks!

On Wed, Apr 30, 2014 at 2:09 PM, Patrick Wendell pwend...@gmail.com wrote:
> Marcelo - Mind trying the following diff locally? If it works I can
> send a patch:
> [...]

--
Marcelo
Re: SparkSubmit and --driver-java-options
Dean - our e-mails crossed, but thanks for the tip. Was independently arriving at your solution :)

Okay I'll submit something.

- Patrick

On Wed, Apr 30, 2014 at 2:14 PM, Marcelo Vanzin van...@cloudera.com wrote:
> Cool, that seems to work. Thanks!
Re: SparkSubmit and --driver-java-options
Patch here: https://github.com/apache/spark/pull/609

On Wed, Apr 30, 2014 at 2:26 PM, Patrick Wendell pwend...@gmail.com wrote:
> Dean - our e-mails crossed, but thanks for the tip. Was independently
> arriving at your solution :) Okay I'll submit something.