Github user yaooqinn commented on the issue:

https://github.com/apache/spark/pull/19840

#### using spark-2.2.0-bin-hadoop2.7 with numpy, running examples/src/main/python/mllib/correlations_example.py

### case 1

|key|value|
|---|---|
|**PYSPARK_DRIVER_PYTHON**|~/anaconda3/envs/py3/bin/python|
|deploy-mode|**client**|
|--archives|~/anaconda3/envs/py3.zip|
|spark.yarn.appMasterEnv.**PYSPARK_PYTHON**|py3.zip/py3/bin/python|
|spark.executorEnv.PYSPARK_PYTHON|py3.zip/py3/bin/python|
|failure|Exception: Python in worker has different version 2.7 than that in driver 3.6, PySpark cannot run with different minor versions. Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.|

### case 2

|key|value|
|---|---|
|**PYSPARK_DRIVER_PYTHON**|~/anaconda3/envs/py3/bin/python|
|deploy-mode|**cluster**|
|--archives|~/anaconda3/envs/py3.zip|
|spark.yarn.appMasterEnv.**PYSPARK_PYTHON**|py3.zip/py3/bin/python|
|spark.executorEnv.PYSPARK_PYTHON|py3.zip/py3/bin/python|
|failure|java.io.IOException: Cannot run program "/home/hadoop/anaconda3/envs/py3/bin/python": error=2, No such file or directory, at org.apache.spark.**deploy.PythonRunner**$.main(PythonRunner.scala:91)|

### case 3 & 4

|key|value|
|---|---|
|**PYSPARK_DRIVER_PYTHON**|~/anaconda3/envs/py3/bin/python|
|deploy-mode|**cluster (3), client (4)**|
|--archives|~/anaconda3/envs/py3.zip|
|spark.yarn.appMasterEnv.**PYSPARK_DRIVER_PYTHON**|py3.zip/py3/bin/python|
|spark.executorEnv.PYSPARK_DRIVER_PYTHON|py3.zip/py3/bin/python|
|failure|Exception: Python in worker has different version 2.7 than that in driver 3.6, PySpark cannot run with different minor versions. Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.|

### case 5 & 6

|key|value|
|---|---|
|**PYSPARK_PYTHON**|~/anaconda3/envs/py3/bin/python|
|deploy-mode|**cluster (6)**|
|--archives|~/anaconda3/envs/py3.zip|
|spark.yarn.appMasterEnv.**PYSPARK_DRIVER_PYTHON**|py3.zip/py3/bin/python|
|spark.executorEnv.PYSPARK_DRIVER_PYTHON|py3.zip/py3/bin/python|
|failure|java.io.IOException: Cannot run program "/home/hadoop/anaconda3/envs/py3/bin/python": error=2, No such file or directory [**executor-side PythonRunner**]|

### case 7

|key|value|
|---|---|
|**PYSPARK_PYTHON**|~/anaconda3/envs/py3/bin/python|
|deploy-mode|**cluster**|
|--archives|~/anaconda3/envs/py3.zip|
|spark.yarn.appMasterEnv.**PYSPARK_PYTHON**|py3.zip/py3/bin/python|
|spark.executorEnv.PYSPARK_DRIVER_PYTHON|py3.zip/py3/bin/python|
|**success**|--|

### case 8

|key|value|
|---|---|
|**PYSPARK_PYTHON**|~/anaconda3/envs/py3/bin/python|
|deploy-mode|**cluster**|
|--archives|~/anaconda3/envs/py3.zip|
|spark.yarn.appMasterEnv.**PYSPARK_PYTHON**|py3.zip/py3/bin/python|
|spark.executorEnv.PYSPARK_DRIVER_PYTHON|py3.zip/py3/bin/python|
|failure|java.io.IOException: Cannot run program "/home/hadoop/anaconda3/envs/py3/bin/python": error=2, No such file or directory [**executor-side PythonRunner**]|

### case 9

|key|value|
|---|---|
|not setting ~~PYSPARK_[DRIVER]_PYTHON~~|&lt;empty&gt;|
|deploy-mode|**client**|
|--archives|~/anaconda3/envs/py3.zip|
|spark.yarn.appMasterEnv.**PYSPARK_PYTHON**|py3.zip/py3/bin/python|
|spark.executorEnv.PYSPARK_DRIVER_PYTHON|py3.zip/py3/bin/python|
|failure|ImportError: No module named numpy|

### case 10

|key|value|
|---|---|
|not setting ~~PYSPARK_[DRIVER]_PYTHON~~|&lt;empty&gt;|
|deploy-mode|**cluster**|
|--archives|~/anaconda3/envs/py3.zip|
|spark.yarn.appMasterEnv.**PYSPARK_PYTHON**|py3.zip/py3/bin/python|
|spark.executorEnv.PYSPARK_DRIVER_PYTHON|py3.zip/py3/bin/python|
|**success**|--|

### my humble opinions

1. `spark.executorEnv.PYSPARK_*` has no effect on the executor-side pythonExec, which is determined by the driver.
2. If PYSPARK_PYTHON is specified, then **spark.yarn.appMasterEnv.** should be suffixed with **PYSPARK_PYTHON**, not ~~PYSPARK_DRIVER_PYTHON~~.
3. Specifying PYSPARK_DRIVER_PYTHON fails in all of the cases above. This may be because https://github.com/yaooqinn/spark/blob/8ff5663fe9a32eae79c8ee6bc310409170a8da64/python/pyspark/context.py#L191 only deals with PYSPARK_PYTHON.
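To illustrate point 3, here is a minimal sketch (not Spark's actual source, just the resolution logic as I read it from `context.py`): the worker interpreter falls back to the default `python` whenever only PYSPARK_DRIVER_PYTHON is set, which matches the 2.7-vs-3.6 mismatch seen in cases 1, 3 and 4. The helper name `resolve_python_exec` is mine, for illustration only.

```python
def resolve_python_exec(env):
    """Hypothetical helper mirroring pyspark/context.py#L191:
    the worker pythonExec is taken from PYSPARK_PYTHON only;
    PYSPARK_DRIVER_PYTHON is never consulted at this point."""
    return env.get("PYSPARK_PYTHON", "python")

# Cases 3/4: only PYSPARK_DRIVER_PYTHON is set, so workers fall back
# to the system "python" (2.7 on the cluster) -> version mismatch.
driver_only = {"PYSPARK_DRIVER_PYTHON": "~/anaconda3/envs/py3/bin/python"}
print(resolve_python_exec(driver_only))  # -> python

# Cases 7/10: PYSPARK_PYTHON reaches the driver (via appMasterEnv in
# cluster mode), so the archive-relative interpreter is used.
with_pyspark_python = {"PYSPARK_PYTHON": "py3.zip/py3/bin/python"}
print(resolve_python_exec(with_pyspark_python))  # -> py3.zip/py3/bin/python
```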
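For reference, the successful case-10 configuration translates to roughly the following submit command (a sketch from my environment; paths and the `--master yarn` flag are assumptions, and the executorEnv line is kept for fidelity even though, per point 1, it appears to have no effect):

```shell
# case 10: cluster mode, no PYSPARK_* variables exported in the submitting shell.
# spark.yarn.appMasterEnv.PYSPARK_PYTHON points the driver (in the AM) at the
# interpreter inside the shipped conda archive.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --archives ~/anaconda3/envs/py3.zip \
  --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=py3.zip/py3/bin/python \
  --conf spark.executorEnv.PYSPARK_DRIVER_PYTHON=py3.zip/py3/bin/python \
  examples/src/main/python/mllib/correlations_example.py
```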