Marcelo - Mind trying the following diff locally? If it works I can send a patch:

patrick@patrick-t430s:~/Documents/spark$ git diff bin/spark-submit
diff --git a/bin/spark-submit b/bin/spark-submit
index dd0d95d..49bc262 100755
--- a/bin/spark-submit
+++ b/bin/spark-submit
@@ -18,7 +18,7 @@
 #
 export SPARK_HOME="$(cd `dirname $0`/..; pwd)"
-ORIG_ARGS=$@
+ORIG_ARGS=("$@")
 while (($#)); do
   if [ "$1" = "--deploy-mode" ]; then
@@ -39,5 +39,5 @@ if [ ! -z $DRIVER_MEMORY ] && [ ! -z $DEPLOY_MODE ] && [ $DEPLOY_MODE = "client"
   export SPARK_MEM=$DRIVER_MEMORY
 fi
-$SPARK_HOME/bin/spark-class org.apache.spark.deploy.SparkSubmit $ORIG_ARGS
+$SPARK_HOME/bin/spark-class org.apache.spark.deploy.SparkSubmit "${ORIG_ARGS[@]}"

On Wed, Apr 30, 2014 at 1:51 PM, Patrick Wendell <pwend...@gmail.com> wrote:
> So I reproduced the problem here:
>
> == test.sh ==
> #!/bin/bash
> for x in "$@"; do
>   echo "arg: $x"
> done
> ARGS_COPY=$@
> for x in "$ARGS_COPY"; do
>   echo "arg_copy: $x"
> done
> ==
>
> ./test.sh a b "c d e" f
> arg: a
> arg: b
> arg: c d e
> arg: f
> arg_copy: a b c d e f
>
> I'll dig around a bit more and see if we can fix it. Pretty sure we
> aren't passing these argument arrays around correctly in bash.
>
> On Wed, Apr 30, 2014 at 1:48 PM, Marcelo Vanzin <van...@cloudera.com> wrote:
>> On Wed, Apr 30, 2014 at 1:41 PM, Patrick Wendell <pwend...@gmail.com> wrote:
>>> Yeah I think the problem is that the spark-submit script doesn't pass
>>> the argument array to spark-class in the right way, so any quoted
>>> strings get flattened.
>>>
>>> I think we'll need to figure out how to do this correctly in the bash
>>> script so that quoted strings get passed in the right way.
>>
>> I tried a few different approaches but finally ended up giving up; my
>> bash-fu is apparently not strong enough. If you can make it work
>> great, but I have "-J" working locally in case you give up like me.
>> :-)
>>
>> --
>> Marcelo
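
For reference, here is a minimal standalone sketch of why the array form in the diff should preserve quoted arguments while the scalar copy flattens them. The function names (show_args, forward_flat, forward_array) are made up for illustration and are not part of spark-submit; this assumes bash, since arrays are not available in plain sh.

```shell
#!/bin/bash

# Hypothetical helper: prints each argument it receives on its own line.
show_args() {
  for x in "$@"; do
    echo "arg: $x"
  done
}

# Mirrors the old behavior: ORIG_ARGS=$@ followed by an unquoted $ORIG_ARGS.
forward_flat() {
  local copy
  copy=$@              # scalar assignment joins all arguments into one string
  show_args $copy      # unquoted expansion re-splits that string on whitespace
}

# Mirrors the proposed fix: ORIG_ARGS=("$@") followed by "${ORIG_ARGS[@]}".
forward_array() {
  local copy=("$@")        # one array element per original argument
  show_args "${copy[@]}"   # each element expands to exactly one word
}

forward_flat a "c d e"     # prints four lines: a, c, d, e
forward_array a "c d e"    # prints two lines: a, "c d e"
```

The quoting on both sides matters: ("$@") keeps the argument boundaries when copying, and "${copy[@]}" keeps them when expanding.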