[ https://issues.apache.org/jira/browse/SPARK-28095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Emma Dickson updated SPARK-28095:
---------------------------------
    Description:
When arguments are passed to a bash script that runs spark-submit on a Python file that sets up a PySpark context, strings containing spaces are split into separate arguments. This happens even when the argument is enclosed in double quotes or written with backslash or Unicode escape characters.

Example
Command entered: This uses an IBM-specific driver, hence the cos:// URL.
{code:java}
./scripts/spark-k8s.sh v0.0.32 --job-args "cos://waas-logentries.mycos/Logentries/IBM-b634032e/Github/Load Balancer" --job pages{code}

Error Message
{code:java}
+ exec /sbin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=172.30.83.253 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner /opt/spark/work-dir/main.py --job-args cos://waas-logentries.mycos/Logentries/IBM-b634032e/Github/Load Balancer --job pages
19/06/18 19:28:35 WARN Utils: Kubernetes master URL uses HTTP instead of HTTPS.
19/06/18 19:28:36 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
usage: main.py [-h] --job JOB --job-args JOB_ARGS
main.py: error: unrecognized arguments: Balancer
{code}

  was:
When passing in arguments to a bash script that sets up spark submit using a python file that sets up a pyspark context, strings with spaces are processed as individual strings. This occurs even when the argument is encased in double quotes, using backslashes or unicode escape characters.
Example
Command entered
{code:java}
./scripts/spark-k8s.sh v0.0.32 --job-args "cos://waas-logentries.mycos/Logentries/IBM-b634032e/Github/Load Balancer" --job pages{code}

Error Message
{code:java}
+ exec /sbin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=172.30.83.253 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner /opt/spark/work-dir/main.py --job-args cos://waas-logentries.mycos/Logentries/IBM-b634032e/Github/Load Balancer --job pages
19/06/18 19:28:35 WARN Utils: Kubernetes master URL uses HTTP instead of HTTPS.
19/06/18 19:28:36 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
usage: main.py [-h] --job JOB --job-args JOB_ARGS
main.py: error: unrecognized arguments: Balancer
{code}

> Pyspark with kubernetes doesn't parse arguments with spaces as expected.
> ------------------------------------------------------------------------
>
>                 Key: SPARK-28095
>                 URL: https://issues.apache.org/jira/browse/SPARK-28095
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes, PySpark
>    Affects Versions: 2.4.3
>         Environment: Python 2.7.13
>                      Spark 2.4.3
>                      Kubernetes
>            Reporter: Emma Dickson
>            Priority: Minor
>              Labels: newbie, usability
>
> When arguments are passed to a bash script that runs spark-submit on a Python file that sets up a PySpark context, strings containing spaces are split into separate arguments. This happens even when the argument is enclosed in double quotes or written with backslash or Unicode escape characters.
>
> Example
> Command entered: This uses an IBM-specific driver, hence the cos:// URL.
> {code:java}
> ./scripts/spark-k8s.sh v0.0.32 --job-args "cos://waas-logentries.mycos/Logentries/IBM-b634032e/Github/Load Balancer" --job pages{code}
>
> Error Message
>
> {code:java}
> + exec /sbin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=172.30.83.253 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner /opt/spark/work-dir/main.py --job-args cos://waas-logentries.mycos/Logentries/IBM-b634032e/Github/Load Balancer --job pages
> 19/06/18 19:28:35 WARN Utils: Kubernetes master URL uses HTTP instead of HTTPS.
> 19/06/18 19:28:36 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> usage: main.py [-h] --job JOB --job-args JOB_ARGS
> main.py: error: unrecognized arguments: Balancer
> {code}
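
The symptom above can be reproduced outside Spark. The following is a minimal sketch, assuming Python 3.8+ (for `shlex.join`) rather than the Python 2.7.13 in the report, and assuming a parser shaped like the `usage:` line in the error message. The report does not establish where in the chain from `spark-k8s.sh` to `main.py` the quoting is lost; the "lossy hand-off" here only illustrates one common way it can happen, namely a wrapper that flattens its arguments into a single space-separated string and lets them be split again.

```python
import argparse
import shlex


def build_parser():
    # Hypothetical stand-in for main.py, modelled on the usage line in the report:
    #   usage: main.py [-h] --job JOB --job-args JOB_ARGS
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("--job", required=True)
    parser.add_argument("--job-args", required=True)
    return parser


# The arguments as typed on the command line in the report.
COMMAND = (
    '--job-args "cos://waas-logentries.mycos/Logentries/IBM-b634032e/Github/Load Balancer" '
    "--job pages"
)

# What the shell hands to the first script: the quotes group the path into one token.
original_argv = shlex.split(COMMAND)

# Lossy hand-off (assumption): joining with plain spaces discards the grouping,
# so the next round of splitting breaks the path at the space.
lossy_argv = shlex.split(" ".join(original_argv))

# Quoting-preserving hand-off: shlex.join re-quotes tokens that contain spaces.
safe_argv = shlex.split(shlex.join(original_argv))

parser = build_parser()

# With grouping preserved, the path arrives as a single value.
ok = parser.parse_args(safe_argv)
print(ok.job_args)

# With grouping lost, the second half of the path is left over.
# parse_known_args is used so the leftover can be inspected instead of exiting.
_, leftover = parser.parse_known_args(lossy_argv)
print(leftover)  # ['Balancer']
```

Calling `parser.parse_args(lossy_argv)` instead would exit with `main.py: error: unrecognized arguments: Balancer`, the same message as in the report. In a bash wrapper, the analogous distinction is forwarding arguments with a quoted `"$@"` rather than an unquoted `$@` or `$*`.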