GitHub user ifilonenko opened a pull request: https://github.com/apache/spark/pull/21092
[SPARK-23984][K8S][WIP] Initial Python Bindings for PySpark on K8s

## What changes were proposed in this pull request?

Introducing Python bindings for PySpark on Kubernetes:

- [ ] Running PySpark jobs
- [ ] Increased default memory-overhead value
- [ ] Dependency management for virtualenv/conda

## How was this patch tested?

This patch was tested with:

- [ ] Unit tests
- [ ] Integration tests with [this addition](https://github.com/apache-spark-on-k8s/spark-integration/pull/46)

```
KubernetesSuite:
- Run SparkPi with no resources
- Run SparkPi with a very long application name.
- Run SparkPi with a master URL without a scheme.
- Run SparkPi with an argument.
- Run SparkPi with custom labels, annotations, and environment variables.
- Run SparkPi with a test secret mounted into the driver and executor pods
- Run extraJVMOptions check on driver
- Run SparkRemoteFileTest using a remote data file
- Run PySpark on simple pi.py example
Run completed in 4 minutes, 3 seconds.
Total number of tests run: 9
Suites: completed 2, aborted 0
Tests: succeeded 9, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
```

## Problematic Comments from [ifilonenko]

- [ ] The Docker image is currently built with Python 2 --> needs to be generic for Python 2/3
- [ ] `--py-files` distributes files correctly, but example commands like

```
exec /sbin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=172.17.0.4 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner /opt/spark/examples/src/main/python/pi.py /opt/spark/examples/src/main/python/sort.py
```

fail because `/opt/spark/examples/src/main/python/pi.py` treats `/opt/spark/examples/src/main/python/sort.py` as an argument.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ifilonenko/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21092.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #21092

----

commit fb5b9ed83d4e5ed73bc44b9d719ac0e52702655e
Author: Ilan Filonenko <if56@...>
Date: 2018-04-16T03:23:43Z

    initial architecture for PySpark w/o dockerfile work

commit b7b3db0abfbf425120fa21cc61e603c5d766f8af
Author: Ilan Filonenko <if56@...>
Date: 2018-04-17T19:13:45Z

    included entrypoint logic

commit 98cef8ceb0f04cfcefbc482c2a0fe39c75f620c4
Author: Ilan Filonenko <if56@...>
Date: 2018-04-18T02:22:55Z

    satisfying integration tests

commit dc670dcd07944ae30b9b425c26250a21986b2699
Author: Ilan Filonenko <if56@...>
Date: 2018-04-18T05:20:12Z

    end-to-end working pyspark

commit eabe4b9b784f37cca3dd9bcff17110944b50f5c8
Author: Ilan Filonenko <ifilondz@...>
Date: 2018-04-18T05:20:42Z

    Merge pull request #1 from ifilonenko/py-spark

    Initial architecture for PySpark w/o dependency management

----

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional
commands, e-mail: reviews-h...@spark.apache.org
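
The `--py-files` problem described above comes down to positional-argument splitting in the launcher: the first Python file on the command line is taken as the primary resource, and everything after it is forwarded to that script as an application argument. A minimal shell sketch of that splitting (illustrative only, not Spark's actual implementation; the file paths are the ones from the example command):

```shell
# Illustrative sketch: how positional args split into primary resource vs.
# application args (not Spark's actual code).
set -- /opt/spark/examples/src/main/python/pi.py \
       /opt/spark/examples/src/main/python/sort.py

# The first positional .py file becomes the primary resource...
primary="$1"
shift

# ...and every remaining positional token is an application argument,
# which is why sort.py ends up being handed to pi.py as an argument.
echo "primary resource: $primary"
echo "application args: $*"

# To ship sort.py as a dependency instead of an argument, it would need to
# travel via --py-files (comma-separated) rather than positionally, e.g.:
#   spark-submit --py-files sort.py pi.py
```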