gongwendong created SPARK-52669:
-----------------------------------
Summary: Improvement: PySpark run directly with python could not
find the correct Python executable
Key: SPARK-52669
URL: https://issues.apache.org/jira/browse/SPARK-52669
Project: Spark
Issue Type: Improvement
Components: PySpark
Affects Versions: 4.0.0
Environment: Spark version: the latest (3.3.2+)
OS: Centos
JDK: 8.0.422-kona
Python: 3.10.15
Reporter: gongwendong
Fix For: 4.1.0
Attachments: image-2025-07-03-15-03-16-079.png
* issue information
run.py is run directly with python against a YARN cluster (master: yarn, deploy mode: client).
{code:python}
# in run.py
import sys

from pyspark.sql import SparkSession

import emr  # reporter's own helper module (not shown in this report)

spark = (
    SparkSession.builder
    .appName('sample on conflict python exec error')
    .master('yarn')
    .config('spark.submit.pyFiles', emr.project_pack())
    .config('spark.ui.enabled', 'true')
    .config('spark.pyspark.driver.python', './environment/bin/python')
    .config('spark.pyspark.python', './environment/bin/python')
    .config('spark.archives',
            'hdfs:///spark/env/algo_reco_rank.dist.archives.tar.gz#environment')
    .enableHiveSupport()
    .getOrCreate()
)

# Collect the Python version and executable path seen by the executor.
spark.range(1).rdd.map(lambda x: (
    x,
    f"Executor Python version: {sys.version}",
    f"#Executor Python executable: {sys.executable}",
)).collect()
{code}
* exception & error:
RuntimeError: Python in worker has different version 3.6 than that in
driver 3.10, PySpark cannot run with different minor versions. Please check
environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.
!image-2025-07-03-15-02-23-940.png!
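For readers unfamiliar with this error, the following is a minimal sketch of the kind of major.minor comparison that produces it. It is an illustration only, not Spark's actual source: the helper names minor_version and check_worker_version are hypothetical, and the message text is copied from the error reported above.

```python
import sys


def minor_version(info=sys.version_info) -> str:
    """Return the 'major.minor' string for a version tuple, e.g. '3.10'."""
    return "%d.%d" % (info[0], info[1])


def check_worker_version(driver_version: str, worker_version: str) -> None:
    """Hypothetical helper: raise if driver and worker differ in major.minor."""
    if driver_version != worker_version:
        raise RuntimeError(
            f"Python in worker has different version {worker_version} than that in "
            f"driver {driver_version}, PySpark cannot run with different minor versions."
        )


# Reproduce the mismatch from this report: driver on 3.10, worker on 3.6.
try:
    check_worker_version("3.10", "3.6")
except RuntimeError as exc:
    print(exc)
```

In the report, the 3.6 on the worker side suggests the executors started a system Python rather than ./environment/bin/python from the unpacked archive.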
--
This message was sent by Atlassian Jira
(v8.20.10#820010)