Maxime Nannan created SPARK-25978:
-------------------------------------

Summary: Pyspark can only be used in spark-submit in spark-py docker image for kubernetes
Key: SPARK-25978
URL: https://issues.apache.org/jira/browse/SPARK-25978
Project: Spark
Issue Type: Bug
Components: Kubernetes
Affects Versions: 2.4.0
Reporter: Maxime Nannan
Currently, in the spark-py docker image for Kubernetes defined by the Dockerfile at resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/python/Dockerfile, PYTHONPATH is defined as follows:

{code:java}
ENV PYTHONPATH ${SPARK_HOME}/python/lib/pyspark.zip:${SPARK_HOME}/python/lib/py4j-*.zip
{code}

I think the problem is that wildcards in PYTHONPATH entries are not expanded, so py4j cannot be imported with the default PYTHONPATH, and pyspark cannot be imported either because it depends on py4j.

This does not impact spark-submit of Python files, because py4j is dynamically added to PYTHONPATH when the Python process is launched in core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala.

It's not really an issue, as the main purpose of this docker image is to be run as driver or executors on k8s, but it's worth mentioning.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
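The wildcard behavior described above can be checked quickly from any Python installation. This is a minimal sketch, not tied to the Spark image itself; the /opt/spark prefix is an assumption standing in for the resolved ${SPARK_HOME}. It launches a fresh interpreter with a wildcard entry on PYTHONPATH and shows that the entry lands on sys.path verbatim, with no glob expansion, so a real py4j-<version>.zip would never actually be found:

```python
import os
import subprocess
import sys

# Hypothetical path mirroring the image's default; the real zip name varies by release.
wildcard_entry = "/opt/spark/python/lib/py4j-*.zip"

# Launch a fresh interpreter with the wildcard on PYTHONPATH and print its sys.path.
env = dict(os.environ, PYTHONPATH=wildcard_entry)
out = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.path)"],
    env=env, capture_output=True, text=True,
).stdout

# The entry appears literally: Python performs no wildcard expansion on
# PYTHONPATH, so no concrete py4j zip is ever placed on sys.path.
print(wildcard_entry in out)
```

This prints True, confirming the unexpanded wildcard is carried onto sys.path as an ordinary (nonexistent) path entry. A Dockerfile would have to resolve the glob at build time (for example via a shell RUN step) for the entry to point at the actual zip.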