Github user ueshin commented on the issue:

    https://github.com/apache/spark/pull/19840
  
    @yaooqinn OK, I see the situation.
    
    In client mode, I think we can't use `spark.yarn.appMasterEnv.XXX`, which is only for cluster mode. So we should use the environment variables `PYSPARK_PYTHON` or `PYSPARK_DRIVER_PYTHON`, or the corresponding Spark confs, `spark.pyspark.python` and `spark.pyspark.driver.python`.
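    Just to illustrate, here is a minimal sketch of how the driver python could be resolved in client mode from those confs and environment variables. The conf keys and env var names are the real ones, but the helper itself and the exact ordering are only an illustration:

    ```python
    import os

    def resolve_driver_python(conf):
        """Hypothetical helper: pick the driver-side python in client mode.

        `conf` is a plain dict of Spark conf entries standing in for SparkConf.
        """
        return (conf.get("spark.pyspark.driver.python")
                or conf.get("spark.pyspark.python")
                or os.environ.get("PYSPARK_DRIVER_PYTHON")
                or os.environ.get("PYSPARK_PYTHON")
                or "python")

    # e.g. resolve_driver_python({"spark.pyspark.python": "/opt/conda/bin/python"})
    # -> "/opt/conda/bin/python"
    ```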
    
    In cluster mode, we can use `spark.yarn.appMasterEnv.XXX`, and if `spark.yarn.appMasterEnv.PYSPARK_PYTHON` or `spark.yarn.appMasterEnv.PYSPARK_DRIVER_PYTHON` is set, it overwrites the original environment variable.
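    Roughly, the cluster-mode behaviour amounts to something like the following sketch (the function is made up, but the `spark.yarn.appMasterEnv.` prefix is the real one):

    ```python
    def apply_app_master_env(conf, env):
        """Copy spark.yarn.appMasterEnv.XXX entries into the AM (driver) env.

        Existing values of XXX in `env` are overwritten, which is why
        spark.yarn.appMasterEnv.PYSPARK_PYTHON wins over a PYSPARK_PYTHON
        that was already set in the environment.
        """
        prefix = "spark.yarn.appMasterEnv."
        for key, value in conf.items():
            if key.startswith(prefix):
                env[key[len(prefix):]] = value
        return env

    # e.g.
    # apply_app_master_env(
    #     {"spark.yarn.appMasterEnv.PYSPARK_PYTHON": "/usr/bin/python3"},
    #     {"PYSPARK_PYTHON": "/usr/bin/python2"})
    # -> {"PYSPARK_PYTHON": "/usr/bin/python3"}
    ```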
    
    Btw, `PYSPARK_DRIVER_PYTHON` is only for the driver, not the executors, so we should handle only `PYSPARK_PYTHON` on the executor side, and in the driver `PYSPARK_DRIVER_PYTHON` takes priority over `PYSPARK_PYTHON`.
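    In other words (a sketch, not the actual code):

    ```python
    import os

    def driver_python(env=os.environ):
        # In the driver, PYSPARK_DRIVER_PYTHON takes priority over PYSPARK_PYTHON.
        return env.get("PYSPARK_DRIVER_PYTHON") or env.get("PYSPARK_PYTHON") or "python"

    def executor_python(env=os.environ):
        # Executors only look at PYSPARK_PYTHON; PYSPARK_DRIVER_PYTHON is ignored.
        return env.get("PYSPARK_PYTHON") or "python"
    ```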
    
    Currently we handle only the environment variable but not `spark.executorEnv.PYSPARK_PYTHON` for executors, so we should handle it in `api/python/PythonRunner` as you do now, or in [context.py#L191](https://github.com/yaooqinn/spark/blob/8ff5663fe9a32eae79c8ee6bc310409170a8da64/python/pyspark/context.py#L191).
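    On the `context.py` side, that could look roughly like this (a sketch only, with a plain dict standing in for SparkConf; the precedence shown is just an assumption):

    ```python
    import os

    def executor_python_exec(conf):
        """Hypothetical sketch: let spark.executorEnv.PYSPARK_PYTHON take effect
        for executors, falling back to the PYSPARK_PYTHON environment variable.
        """
        return (conf.get("spark.executorEnv.PYSPARK_PYTHON")
                or os.environ.get("PYSPARK_PYTHON")
                or "python")
    ```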

