Github user ueshin commented on the issue: https://github.com/apache/spark/pull/19840

@yaooqinn OK, I see the situation.

In client mode, I think we can't use `spark.yarn.appMasterEnv.XXX`, which is only for cluster mode. Instead we should use the environment variables `PYSPARK_PYTHON` or `PYSPARK_DRIVER_PYTHON`, or the corresponding Spark confs, `spark.pyspark.python` and `spark.pyspark.driver.python`.

In cluster mode, we can use `spark.yarn.appMasterEnv.XXX`, and if `spark.yarn.appMasterEnv.PYSPARK_PYTHON` or `spark.yarn.appMasterEnv.PYSPARK_DRIVER_PYTHON` is set, it overwrites the original environment variable.

Btw, `PYSPARK_DRIVER_PYTHON` applies only to the driver, not to the executors, so we should handle only `PYSPARK_PYTHON` on the executor side, and on the driver side `PYSPARK_DRIVER_PYTHON` should take priority over `PYSPARK_PYTHON`.

Currently we handle only the environment variable, but not `spark.executorEnv.PYSPARK_PYTHON` for executors, so we should handle it in `api/python/PythonRunner` as you do now, or at [context.py#L191](https://github.com/yaooqinn/spark/blob/8ff5663fe9a32eae79c8ee6bc310409170a8da64/python/pyspark/context.py#L191).
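To make the intended precedence concrete, here is a minimal Scala sketch. This is not Spark's actual resolution code: the `PythonExecResolution` name and the choice to let the confs win over the raw environment variables are my assumptions, included only to illustrate the ordering described above.

```scala
import org.apache.spark.SparkConf

object PythonExecResolution {
  // Driver side: PYSPARK_DRIVER_PYTHON takes priority over PYSPARK_PYTHON,
  // and the spark.pyspark.* confs are assumed to win over raw env vars.
  def driverPython(conf: SparkConf): String =
    conf.getOption("spark.pyspark.driver.python")
      .orElse(conf.getOption("spark.pyspark.python"))
      .orElse(sys.env.get("PYSPARK_DRIVER_PYTHON"))
      .orElse(sys.env.get("PYSPARK_PYTHON"))
      .getOrElse("python")

  // Executor side: PYSPARK_DRIVER_PYTHON is ignored entirely;
  // spark.executorEnv.PYSPARK_PYTHON is assumed to override the plain env var.
  def executorPython(conf: SparkConf): String =
    conf.getOption("spark.executorEnv.PYSPARK_PYTHON")
      .orElse(conf.getOption("spark.pyspark.python"))
      .orElse(sys.env.get("PYSPARK_PYTHON"))
      .getOrElse("python")
}
```

With an ordering like this, passing `--conf spark.executorEnv.PYSPARK_PYTHON=...` to spark-submit would pin the executor interpreter even when `PYSPARK_PYTHON` in the environment points elsewhere.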