gongwendong created SPARK-52669:
-----------------------------------

             Summary: Improvement PySpark run with python directly could not 
find correct python exec
                 Key: SPARK-52669
                 URL: https://issues.apache.org/jira/browse/SPARK-52669
             Project: Spark
          Issue Type: Improvement
          Components: PySpark
    Affects Versions: 4.0.0
         Environment: Spark version: the latest (3.3.2+)

OS: CentOS

JDK: 8.0.422-kona

Python:  3.10.15
            Reporter: gongwendong
             Fix For: 4.1.0
         Attachments: image-2025-07-03-15-03-16-079.png

* issue information

Run on a YARN cluster in client deploy mode, launching run.py directly with python.
{code:python}
# run.py, launched directly with python
import sys

from pyspark.sql import SparkSession

import emr  # reporter's own module (its contents are not shown in this issue)

spark = (
    SparkSession.builder
    .appName('sample on conflict python exec error')
    .master('yarn')
    .config('spark.submit.pyFiles', emr.project_pack())
    .config('spark.ui.enabled', 'true')
    .config('spark.pyspark.driver.python', './environment/bin/python')
    .config('spark.pyspark.python', './environment/bin/python')
    .config('spark.archives',
            'hdfs:///spark/env/algo_reco_rank.dist.archives.tar.gz#environment')
    .enableHiveSupport()
    .getOrCreate()
)

# Report the Python version and executable that each executor actually uses.
spark.range(1).rdd.map(lambda x: (
    x,
    f"Executor Python version: {sys.version}",
    f"#Executor Python executable: {sys.executable}",
)).collect()
{code}
 
* exception & error

RuntimeError: Python in worker has different version 3.6 than that in driver
3.10, PySpark cannot run with different minor versions. Please check
environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly
set.
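
The error is raised when the major.minor Python version seen on the worker does not match the one reported by the driver. The following is a simplified sketch of that kind of comparison, not Spark's actual source; the function name `check_python_version` is hypothetical and used only for illustration.

```python
def check_python_version(driver_version, worker_version_info):
    """Raise if the worker's major.minor version differs from the driver's.

    driver_version is a string such as "3.10"; worker_version_info is a
    tuple such as sys.version_info or (3, 6).
    """
    worker_version = "%d.%d" % tuple(worker_version_info[:2])
    if worker_version != driver_version:
        raise RuntimeError(
            f"Python in worker has different version {worker_version} "
            f"than that in driver {driver_version}"
        )


# Matching minor versions pass silently.
check_python_version("3.10", (3, 10))

# The reported case: a 3.6 worker against a 3.10 driver.
try:
    check_python_version("3.10", (3, 6))
except RuntimeError as exc:
    print(exc)
```

Only the first two version components are compared, so differing patch releases would not trigger the error.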

!image-2025-07-03-15-02-23-940.png!
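
A possible workaround when launching with plain python is to export the interpreter path through the PYSPARK_PYTHON environment variable named in the error message before the session is created. The sketch below assumes the settings are held in a plain dict; `export_python_exec` is a hypothetical helper, not a PySpark API, and it has not been verified against the cluster described in this issue.

```python
import os


def export_python_exec(conf, env=None):
    """Copy spark.pyspark.python from conf into PYSPARK_PYTHON.

    conf is a plain dict of Spark settings; env defaults to os.environ.
    Nothing is changed if the setting is absent.
    """
    if env is None:
        env = os.environ
    python_exec = conf.get("spark.pyspark.python")
    if python_exec is not None:
        env["PYSPARK_PYTHON"] = python_exec


# Call this before SparkSession.builder ... getOrCreate().
settings = {"spark.pyspark.python": "./environment/bin/python"}
export_python_exec(settings)
print(os.environ["PYSPARK_PYTHON"])
```

The relative path matches the `#environment` alias of the archive in run.py, so it is meant to resolve inside each executor's working directory.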



--
This message was sent by Atlassian Jira
(v8.20.10#820010)