Github user yaooqinn commented on the issue:

https://github.com/apache/spark/pull/19840

#### using spark-2.2.0-bin-hadoop2.7 with numpy, running examples/src/main/python/mllib/correlations_example.py

### case 1

|key|value|
|---|---|
|**PYSPARK_DRIVER_PYTHON**|~/anaconda3/envs/py3/bin/python|
|deploy-mode|**client**|
|--archives|~/anaconda3/envs/py3.zip|
|spark.yarn.appMasterEnv.**PYSPARK_PYTHON**|py3.zip/py3/bin/python|
|spark.executorEnv.PYSPARK_PYTHON|py3.zip/py3/bin/python|
|failure|Exception: Python in worker has different version 2.7 than that in driver 3.6, PySpark cannot run with different minor versions. Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.|

### case 2

|key|value|
|---|---|
|**PYSPARK_DRIVER_PYTHON**|~/anaconda3/envs/py3/bin/python|
|deploy-mode|**cluster**|
|--archives|~/anaconda3/envs/py3.zip|
|spark.yarn.appMasterEnv.**PYSPARK_PYTHON**|py3.zip/py3/bin/python|
|spark.executorEnv.PYSPARK_PYTHON|py3.zip/py3/bin/python|
|failure|java.io.IOException: Cannot run program "/home/hadoop/anaconda3/envs/py3/bin/python": error=2, No such file or directory, at org.apache.spark.**deploy.PythonRunner**$.main(PythonRunner.scala:91)|

### case 3 & 4

|key|value|
|---|---|
|**PYSPARK_DRIVER_PYTHON**|~/anaconda3/envs/py3/bin/python|
|deploy-mode|**cluster (3), client (4)**|
|--archives|~/anaconda3/envs/py3.zip|
|spark.yarn.appMasterEnv.**PYSPARK_DRIVER_PYTHON**|py3.zip/py3/bin/python|
|spark.executorEnv.PYSPARK_DRIVER_PYTHON|py3.zip/py3/bin/python|
|failure|Exception: Python in worker has different version 2.7 than that in driver 3.6, PySpark cannot run with different minor versions. Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.|

### case 5 & 6

|key|value|
|---|---|
|**PYSPARK_PYTHON**|~/anaconda3/envs/py3/bin/python|
|deploy-mode|**cluster (6)**|
|--archives|~/anaconda3/envs/py3.zip|
|spark.yarn.appMasterEnv.**PYSPARK_DRIVER_PYTHON**|py3.zip/py3/bin/python|
|spark.executorEnv.PYSPARK_DRIVER_PYTHON|py3.zip/py3/bin/python|
|failure|java.io.IOException: Cannot run program "/home/hadoop/anaconda3/envs/py3/bin/python": error=2, No such file or directory [**executor-side PythonRunner**]|

### case 7

|key|value|
|---|---|
|**PYSPARK_PYTHON**|~/anaconda3/envs/py3/bin/python|
|deploy-mode|**cluster**|
|--archives|~/anaconda3/envs/py3.zip|
|spark.yarn.appMasterEnv.**PYSPARK_PYTHON**|py3.zip/py3/bin/python|
|spark.executorEnv.PYSPARK_DRIVER_PYTHON|py3.zip/py3/bin/python|
|**success**|--|

### case 8

|key|value|
|---|---|
|**PYSPARK_PYTHON**|~/anaconda3/envs/py3/bin/python|
|deploy-mode|**cluster**|
|--archives|~/anaconda3/envs/py3.zip|
|spark.yarn.appMasterEnv.**PYSPARK_PYTHON**|py3.zip/py3/bin/python|
|spark.executorEnv.PYSPARK_DRIVER_PYTHON|py3.zip/py3/bin/python|
|failure|java.io.IOException: Cannot run program "/home/hadoop/anaconda3/envs/py3/bin/python": error=2, No such file or directory [**executor-side PythonRunner**]|

### case 9

|key|value|
|---|---|
|not setting ~~PYSPARK_[DRIVER]_PYTHON~~|&lt;empty&gt;|
|deploy-mode|**client**|
|--archives|~/anaconda3/envs/py3.zip|
|spark.yarn.appMasterEnv.**PYSPARK_PYTHON**|py3.zip/py3/bin/python|
|spark.executorEnv.PYSPARK_DRIVER_PYTHON|py3.zip/py3/bin/python|
|failure|ImportError: No module named numpy|

### case 10

|key|value|
|---|---|
|not setting ~~PYSPARK_[DRIVER]_PYTHON~~|&lt;empty&gt;|
|deploy-mode|**cluster**|
|--archives|~/anaconda3/envs/py3.zip|
|spark.yarn.appMasterEnv.**PYSPARK_PYTHON**|py3.zip/py3/bin/python|
|spark.executorEnv.PYSPARK_DRIVER_PYTHON|py3.zip/py3/bin/python|
|**success**|--|

### my humble opinions

1. `spark.executorEnv.PYSPARK_*` has no effect on the executor-side pythonExec, which is determined by the driver.
2. If PYSPARK_PYTHON is specified, then **spark.yarn.appMasterEnv.** should be suffixed with **PYSPARK_PYTHON**, not ~~PYSPARK_DRIVER_PYTHON~~.
3. Specifying PYSPARK_DRIVER_PYTHON fails in all of the cases above. This may be because https://github.com/yaooqinn/spark/blob/8ff5663fe9a32eae79c8ee6bc310409170a8da64/python/pyspark/context.py#L191 only deals with PYSPARK_PYTHON.
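To illustrate point 3, here is a minimal sketch (not Spark's actual source, just the resolution logic as I read it from `context.py`): the worker interpreter falls back to the default `python` whenever only PYSPARK_DRIVER_PYTHON is set, which matches the 2.7-vs-3.6 mismatch seen in cases 1, 3 and 4. The helper name `resolve_python_exec` is mine, for illustration only.

```python
def resolve_python_exec(env):
    """Hypothetical helper mirroring pyspark/context.py#L191:
    the worker pythonExec is taken from PYSPARK_PYTHON only;
    PYSPARK_DRIVER_PYTHON is never consulted at this point."""
    return env.get("PYSPARK_PYTHON", "python")

# Cases 3/4: only PYSPARK_DRIVER_PYTHON is set, so workers fall back
# to the system "python" (2.7 on the cluster) -> version mismatch.
driver_only = {"PYSPARK_DRIVER_PYTHON": "~/anaconda3/envs/py3/bin/python"}
print(resolve_python_exec(driver_only))  # -> python

# Cases 7/10: PYSPARK_PYTHON reaches the driver (via appMasterEnv in
# cluster mode), so the archive-relative interpreter is used.
with_pyspark_python = {"PYSPARK_PYTHON": "py3.zip/py3/bin/python"}
print(resolve_python_exec(with_pyspark_python))  # -> py3.zip/py3/bin/python
```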
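For reference, the successful case-10 configuration translates to roughly the following submit command (a sketch from my environment; paths and the `--master yarn` flag are assumptions, and the executorEnv line is kept for fidelity even though, per point 1, it appears to have no effect):

```shell
# case 10: cluster mode, no PYSPARK_* variables exported in the submitting shell.
# spark.yarn.appMasterEnv.PYSPARK_PYTHON points the driver (in the AM) at the
# interpreter inside the shipped conda archive.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --archives ~/anaconda3/envs/py3.zip \
  --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=py3.zip/py3/bin/python \
  --conf spark.executorEnv.PYSPARK_DRIVER_PYTHON=py3.zip/py3/bin/python \
  examples/src/main/python/mllib/correlations_example.py
```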