Maxime Nannan created SPARK-25978:
-------------------------------------

             Summary: PySpark can only be used via spark-submit in the spark-py
Docker image for Kubernetes
                 Key: SPARK-25978
                 URL: https://issues.apache.org/jira/browse/SPARK-25978
             Project: Spark
          Issue Type: Bug
          Components: Kubernetes
    Affects Versions: 2.4.0
            Reporter: Maxime Nannan


Currently, in the spark-py Docker image for Kubernetes, defined by the Dockerfile at
resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/python/Dockerfile,
PYTHONPATH is defined as follows:
{code:java}
ENV PYTHONPATH ${SPARK_HOME}/python/lib/pyspark.zip:${SPARK_HOME}/python/lib/py4j-*.zip
{code}
I think the problem is that PYTHONPATH does not support wildcards: each entry is
taken literally and the glob is never expanded, so py4j cannot be imported with the
default PYTHONPATH, and pyspark cannot be imported either because it depends on py4j.
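
A quick way to see this (a minimal sketch; the path below is the stock image
layout under /opt/spark, but any nonexistent glob shows the same failure):
{code:python}
# Minimal check: PYTHONPATH entries land on sys.path verbatim, so a glob
# like 'py4j-*.zip' is never expanded by the interpreter.
import os
import subprocess
import sys

env = dict(os.environ, PYTHONPATH="/opt/spark/python/lib/py4j-*.zip")
probe = "import sys; print([p for p in sys.path if 'py4j' in p]); import py4j"
subprocess.run([sys.executable, "-c", probe], env=env)
# Prints the literal glob entry ['/opt/spark/python/lib/py4j-*.zip'], then
# raises ImportError (ModuleNotFoundError on Python 3), because no file or
# directory is actually named 'py4j-*.zip'.
{code}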
This does not affect spark-submit of Python files, because py4j is dynamically
added to PYTHONPATH when the Python process is launched by
core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala.
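
For interactive use inside the image, a workaround sketch (assuming
SPARK_HOME=/opt/spark, the stock image layout) is to expand the glob manually
before importing pyspark, mirroring what PythonRunner otherwise arranges
through PYTHONPATH:
{code:python}
# Workaround sketch: resolve the py4j glob ourselves and put the concrete
# zip paths on sys.path before importing pyspark.
import glob
import os
import sys

spark_home = os.environ.get("SPARK_HOME", "/opt/spark")
lib = os.path.join(spark_home, "python", "lib")
sys.path.insert(0, os.path.join(lib, "pyspark.zip"))
sys.path.extend(glob.glob(os.path.join(lib, "py4j-*.zip")))

import pyspark  # now importable in a plain python shell inside the image
{code}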

 

It's not really an issue, as the main purpose of this Docker image is to be run
as a driver or executor on Kubernetes, but it's worth mentioning.