[
https://issues.apache.org/jira/browse/SPARK-52669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17990766#comment-17990766
]
gongwendong commented on SPARK-52669:
-------------------------------------
PySpark requires the driver and executors to run compatible Python versions
(the same minor version). Without these changes, when running inside a Python
development environment or executing a Python script directly, the driver
fails to locate a suitable PYSPARK_PYTHON value. This forces users to
manually define the PYSPARK_PYTHON environment variable for every script,
which is inconvenient.
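The lookup the comment alludes to can be sketched roughly as follows. This is a minimal illustration, not Spark's actual implementation; the precedence shown (the `spark.pyspark.python` config, then the `PYSPARK_PYTHON` environment variable, then the interpreter running the driver) follows Spark's documented configuration behavior:

```python
import os
import sys

def resolve_python_exec(conf: dict) -> str:
    """Sketch of how the worker Python executable is chosen.

    Precedence (per Spark's configuration docs):
    1. spark.pyspark.python in the Spark config
    2. the PYSPARK_PYTHON environment variable
    3. the interpreter currently running the driver
    """
    return (conf.get("spark.pyspark.python")
            or os.environ.get("PYSPARK_PYTHON")
            or sys.executable)

# With nothing configured, the driver's own interpreter is used,
# which is the fallback this improvement makes reachable.
print(resolve_python_exec({}))
```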
> Improvement: PySpark run with Python directly cannot find the correct Python
> executable
> -------------------------------------------------------------------------------
>
> Key: SPARK-52669
> URL: https://issues.apache.org/jira/browse/SPARK-52669
> Project: Spark
> Issue Type: Improvement
> Components: PySpark
> Affects Versions: 4.0.0
> Environment: Spark version: the latest (3.3.2+)
> OS: Centos
> JDK: 8.0.422-kona
> Python: 3.10.15
> Reporter: gongwendong
> Priority: Minor
> Labels: pull-request-available
> Fix For: 4.1.0
>
> Attachments: image-2025-07-03-15-03-16-079.png
>
>
> * issue information
> run in cluster: yarn, deploy mode: client with run.py.
> {code:python}
> # in run.py
> import sys
> from pyspark.sql import SparkSession
>
> spark = (SparkSession.builder
>     .appName('sample on conflict python exec error')
>     .master('yarn')
>     .config('spark.submit.pyFiles', emr.project_pack())
>     .config('spark.ui.enabled', 'true')
>     .config('spark.pyspark.driver.python', './environment/bin/python')
>     .config('spark.pyspark.python', './environment/bin/python')
>     .config("spark.archives",
>             "hdfs:///spark/env/algo_reco_rank.dist.archives.tar.gz#environment")
>     .enableHiveSupport()
>     .getOrCreate())
>
> spark.range(1).rdd.map(lambda x: (x,
>     f"Executor Python version: {sys.version}",
>     f"Executor Python executable: {sys.executable}")).collect()
> {code}
>
> * exception & error: RuntimeError: Python in worker has different version 3.6
> than that in driver 3.10, PySpark cannot run with different minor versions.
> Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON
> are correctly set.
> !image-2025-07-03-15-03-16-079.png!
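The manual workaround the comment describes looks like the sketch below. The relative `./environment/bin/python` path is taken from the reproduction above and assumes the `spark.archives` environment is unpacked under that name; both variables must be set before the SparkSession is created, since they are read at session startup:

```python
import os

# Point both driver and workers at the packed environment's interpreter.
# This must run before SparkSession.builder...getOrCreate() is called.
os.environ["PYSPARK_PYTHON"] = "./environment/bin/python"
os.environ["PYSPARK_DRIVER_PYTHON"] = "./environment/bin/python"

print(os.environ["PYSPARK_PYTHON"])
```

Having to repeat this boilerplate in every script is exactly the inconvenience the proposed change removes.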
--
This message was sent by Atlassian Jira
(v8.20.10#820010)