[ https://issues.apache.org/jira/browse/SPARK-2678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Cheng Lian updated SPARK-2678:
------------------------------
    Priority: Major  (was: Minor)

`spark-submit` overrides user application options
-------------------------------------------------

                 Key: SPARK-2678
                 URL: https://issues.apache.org/jira/browse/SPARK-2678
             Project: Spark
          Issue Type: Bug
          Components: Deploy
    Affects Versions: 1.0.1, 1.0.2
            Reporter: Cheng Lian

Here is an example:

{code}
./bin/spark-submit --class Foo some.jar --help
{code}

Since {{--help}} appears after the primary resource (i.e. {{some.jar}}), it should be recognized as a user application option. But it is actually intercepted by {{spark-submit}}, which shows the {{spark-submit}} help message instead.

When invoking {{spark-submit}} directly, the intended rules are:

# Options before the primary resource are recognized as {{spark-submit}} options.
# Options after the primary resource are recognized as user application options.

The tricky part is how to handle scripts like {{spark-shell}} that delegate to {{spark-submit}}. These scripts allow users to specify both {{spark-submit}} options like {{--master}} and user-defined application options together. For example, say we'd like to write a new script {{start-thriftserver.sh}} to start the Hive Thrift server; basically we may do this:

{code}
$SPARK_HOME/bin/spark-submit --class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 spark-internal $@
{code}

Then users may call this script like:

{code}
./sbin/start-thriftserver.sh --master spark://some-host:7077 --hiveconf key=value
{code}

Notice that all options are captured by {{$@}}.
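For illustration, here is a minimal bash sketch (not Spark's actual implementation) of the separator approach suggested later in this issue: a delegating script could split {{$@}} at the first {{--}}, forwarding the first half to {{spark-submit}} and the second half to the application. The sample arguments are hard-coded via {{set --}} purely for demonstration.

```shell
# Hypothetical sketch: split arguments at the first "--" so that
# spark-submit options and application options stay separate.
set -- --master spark://some-host:7077 -- --hiveconf key=value  # sample input

submit_opts=()
app_opts=()
seen_sep=false
for arg in "$@"; do
  if [ "$seen_sep" = false ] && [ "$arg" = "--" ]; then
    seen_sep=true                 # everything after "--" belongs to the app
  elif [ "$seen_sep" = true ]; then
    app_opts+=("$arg")            # user application options
  else
    submit_opts+=("$arg")         # spark-submit options
  fi
done

echo "spark-submit options: ${submit_opts[*]}"
echo "application options: ${app_opts[*]}"
```

With the sample input above, this prints {{--master spark://some-host:7077}} as the {{spark-submit}} options and {{--hiveconf key=value}} as the application options.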
If we put {{$@}} before {{spark-internal}}, all of its options are recognized as {{spark-submit}} options, so {{--hiveconf}} won't be passed to {{HiveThriftServer2}}; if we put it after {{spark-internal}}, they *should* all be recognized as {{HiveThriftServer2}} options, but because of this bug {{--master}} is still recognized as a {{spark-submit}} option, which happens to produce the right behavior.

Although all scripts currently built on {{spark-submit}} work correctly, we should still fix this bug, because it causes option name collisions between {{spark-submit}} and user applications: every time we add a new option to {{spark-submit}}, some existing user applications may break. However, fixing this bug may require incompatible changes.

The suggested solution is to use {{--}} as the separator between {{spark-submit}} options and user application options. For the Hive Thrift server example above, the user would call the script in this way:

{code}
./sbin/start-thriftserver.sh --master spark://some-host:7077 -- --hiveconf key=value
{code}

{{SparkSubmitArguments}} should then be responsible for splitting the two sets of options and passing each set to the right place.

--
This message was sent by Atlassian JIRA
(v6.2#6252)