GitHub user ifilonenko opened a pull request:

    https://github.com/apache/spark/pull/21092

    [SPARK-23984][K8S][WIP] Initial Python Bindings for PySpark on K8s

    ## What changes were proposed in this pull request?
    
    Introducing Python Bindings for PySpark.
    
    - [ ] Running PySpark Jobs
    - [ ] Increased Default Memory Overhead value
    - [ ] Dependency Management for virtualenv/conda
    
    ## How was this patch tested?
    
    This patch was tested with 
    
    - [ ] Unit Tests
    - [ ] Integration tests with [this 
addition](https://github.com/apache-spark-on-k8s/spark-integration/pull/46)
    ```
    KubernetesSuite:
    - Run SparkPi with no resources
    - Run SparkPi with a very long application name.
    - Run SparkPi with a master URL without a scheme.
    - Run SparkPi with an argument.
    - Run SparkPi with custom labels, annotations, and environment variables.
    - Run SparkPi with a test secret mounted into the driver and executor pods
    - Run extraJVMOptions check on driver
    - Run SparkRemoteFileTest using a remote data file
    - Run PySpark on simple pi.py example
    Run completed in 4 minutes, 3 seconds.
    Total number of tests run: 9
    Suites: completed 2, aborted 0
    Tests: succeeded 9, failed 0, canceled 0, ignored 0, pending 0
    All tests passed.
    ```
    
    ## Problematic Comments from [ifilonenko]
    
    - [ ] Currently the Docker image is built with Python2 --> needs to be generic for Python2/3
    - [ ] `--py-files` is distributing files properly, but it seems that example commands like
    ```
    exec /sbin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=172.17.0.4 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner /opt/spark/examples/src/main/python/pi.py /opt/spark/examples/src/main/python/sort.py
    ```
    cause errors, with `/opt/spark/examples/src/main/python/pi.py` treating `/opt/spark/examples/src/main/python/sort.py` as an argument
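    For context on why this happens: `spark-submit` treats the first bare path on the command line as the primary resource and every path after it as an application argument, so extra Python files need to be passed as a comma-separated list to `--py-files` before the primary resource rather than appended after it. A minimal sketch of that positional rule (an illustrative model only, not Spark's actual parser; `split_submit_args` is a hypothetical helper):

    ```python
    def split_submit_args(args):
        """Toy model of spark-submit's positional rule: the first non-option
        path is the primary resource; everything after it becomes an app arg."""
        py_files, rest = [], []
        it = iter(args)
        for a in it:
            if a == "--py-files":
                # --py-files takes one comma-separated list of extra modules
                py_files = next(it).split(",")
            else:
                rest.append(a)
        primary, app_args = rest[0], rest[1:]
        return primary, py_files, app_args

    # Broken form from the command above: sort.py trails pi.py,
    # so it is forwarded to pi.py as an application argument.
    print(split_submit_args(["pi.py", "sort.py"]))
    # -> ('pi.py', [], ['sort.py'])

    # Intended form: sort.py goes through --py-files ahead of the primary resource.
    print(split_submit_args(["--py-files", "sort.py", "pi.py"]))
    # -> ('pi.py', ['sort.py'], [])
    ```

    Under this reading, the entrypoint would need to fold the distributed files into a `--py-files` list rather than concatenating them after the primary resource.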
    
    
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ifilonenko/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21092.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21092
    
----
commit fb5b9ed83d4e5ed73bc44b9d719ac0e52702655e
Author: Ilan Filonenko <if56@...>
Date:   2018-04-16T03:23:43Z

    initial architecture for PySpark w/o dockerfile work

commit b7b3db0abfbf425120fa21cc61e603c5d766f8af
Author: Ilan Filonenko <if56@...>
Date:   2018-04-17T19:13:45Z

    included entrypoint logic

commit 98cef8ceb0f04cfcefbc482c2a0fe39c75f620c4
Author: Ilan Filonenko <if56@...>
Date:   2018-04-18T02:22:55Z

    satisfying integration tests

commit dc670dcd07944ae30b9b425c26250a21986b2699
Author: Ilan Filonenko <if56@...>
Date:   2018-04-18T05:20:12Z

    end-to-end working pyspark

commit eabe4b9b784f37cca3dd9bcff17110944b50f5c8
Author: Ilan Filonenko <ifilondz@...>
Date:   2018-04-18T05:20:42Z

    Merge pull request #1 from ifilonenko/py-spark
    
    Initial architecture for PySpark w/o dependency management

----


---
