[jira] [Commented] (SPARK-26237) [K8s] Unable to switch python version in executor when running pyspark client

Qi Shao (JIRA) Fri, 30 Nov 2018 13:02:43 -0800


    [ https://issues.apache.org/jira/browse/SPARK-26237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16705275#comment-16705275 ]


Qi Shao commented on SPARK-26237:
---------------------------------

Figured out that the pyspark configs need to be set before launching pyspark. 
Creating a new SparkSession after logging into the console won't work.
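
A minimal sketch of what "set the configs first" could look like from a 
notebook, assuming the relevant settings are the PYSPARK_PYTHON and 
PYSPARK_DRIVER_PYTHON environment variables named in the error message. The 
helper names pin_python_for_pyspark and build_session and the value "python3" 
are illustrative and not taken from this issue. The pyspark shell creates its 
`spark` session at startup, and getOrCreate() returns an already running 
session if one exists, so the environment has to be prepared before the first 
session is built.

```python
import os
import sys


def pin_python_for_pyspark(python_exe="python3"):
    """Set the interpreter variables before any SparkSession exists.

    Assumption: "python3" resolves to the intended interpreter inside the
    executor image; adjust it for your own image.
    """
    os.environ["PYSPARK_PYTHON"] = python_exe
    os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable
    return {name: os.environ[name]
            for name in ("PYSPARK_PYTHON", "PYSPARK_DRIVER_PYTHON")}


def build_session():
    """Create the session only after the environment is prepared."""
    # Imported lazily so that this sketch can be loaded without starting Spark.
    from pyspark.sql import SparkSession

    return (SparkSession.builder
            .appName("PySpark Testout")
            .config("spark.kubernetes.pyspark.pythonVersion", "3")
            .getOrCreate())


env = pin_python_for_pyspark("python3")
# spark = build_session()  # call once, after the environment is set
```

Calling build_session() before pin_python_for_pyspark(), or in a shell where a 
session already exists, would hand back the existing session rather than one 
built with the new settings.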

> [K8s] Unable to switch python version in executor when running pyspark client
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-26237
>                 URL: https://issues.apache.org/jira/browse/SPARK-26237
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.4.0
>         Environment: Spark 2.4.0
> Google Kubernetes Engines
>            Reporter: Qi Shao
>            Priority: Major
>
> Error message:
> {code:java}
> Exception: Python in worker has different version 2.7 than that in driver 
> 3.6, PySpark cannot run with different minor versions.Please check 
> environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly 
> set.{code}
> Neither
> {code:java}
> spark.kubernetes.pyspark.pythonVersion{code}
> nor 
> {code:java}
> spark.executorEnv.PYSPARK_MAJOR_PYTHON_VERSION {code}
> works.
> This happens when I'm running a Notebook with pyspark+python3 and also in a 
> pod which has pyspark+python3.
> For notebook, the code is:
> {code:java}
> from __future__ import print_function
> import sys
> from random import random
> from operator import add
> from pyspark.sql import SparkSession
> spark = SparkSession.builder\
>  .master("k8s://https://kubernetes.default.svc")\
>  .appName("PySpark Testout")\
>  .config("spark.submit.deployMode","client")\
>  .config("spark.executor.instances", "2")\
>  .config("spark.kubernetes.container.image","azureq/pantheon:pyspark-2.4")\
>  .config("spark.driver.host","jupyter-notebook-headless")\
>  .config("spark.driver.pod.name","jupyter-notebook-headless")\
>  .config("spark.kubernetes.authenticate.driver.serviceAccountName","spark")\
>  .config("spark.kubernetes.pyspark.pythonVersion","3")\
>  .config("spark.executorEnv.PYSPARK_MAJOR_PYTHON_VERSION","3")\
>  .getOrCreate()
> n = 100000
> def f(_):
>     x = random() * 2 - 1
>     y = random() * 2 - 1
>     return 1 if x ** 2 + y ** 2 <= 1 else 0
> partitions = 2  # assumed value; `partitions` was not defined in the original snippet
> count = spark.sparkContext.parallelize(range(1, n + 1), partitions).map(f).reduce(add)
> print("Pi is roughly %f" % (4.0 * count / n))
> {code}
>  For pyspark shell, the command is:
>  
> {code:java}
> $SPARK_HOME/bin/pyspark \
>  --master k8s://https://$KUBERNETES_SERVICE_HOST:$KUBERNETES_SERVICE_PORT_HTTPS \
>  --deploy-mode client \
>  --conf spark.executor.instances=5 \
>  --conf spark.kubernetes.container.image=azureq/pantheon:pyspark-2.4 \
>  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
>  --conf spark.driver.host=spark-client-mode-headless \
>  --conf spark.kubernetes.pyspark.pythonVersion=3 \
>  --conf spark.driver.pod.name=spark-client-mode-headless{code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)