[ https://issues.apache.org/jira/browse/SPARK-26237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16705275#comment-16705275 ]
Qi Shao commented on SPARK-26237:
---------------------------------

Figured out that the pyspark configs need to be set before running pyspark. Creating a new SparkSession after logging into the console won't work.

> [K8s] Unable to switch python version in executor when running pyspark client
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-26237
>                 URL: https://issues.apache.org/jira/browse/SPARK-26237
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.4.0
>         Environment: Spark 2.4.0
> Google Kubernetes Engines
>            Reporter: Qi Shao
>            Priority: Major
>
> Error message:
> {code:java}
> Exception: Python in worker has different version 2.7 than that in driver 3.6, PySpark cannot run with different minor versions.Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.{code}
> Neither
> {code:java}
> spark.kubernetes.pyspark.pythonVersion{code}
> nor
> {code:java}
> spark.executorEnv.PYSPARK_MAJOR_PYTHON_VERSION{code}
> works.
> This happens when I'm running a Notebook with pyspark+python3 and also in a pod which has pyspark+python3.
> For notebook, the code is:
> {code:java}
> from __future__ import print_function
> import sys
> from random import random
> from operator import add
> from pyspark.sql import SparkSession
> spark = SparkSession.builder\
>     .master("k8s://https://kubernetes.default.svc")\
>     .appName("PySpark Testout")\
>     .config("spark.submit.deployMode","client")\
>     .config("spark.executor.instances", "2")\
>     .config("spark.kubernetes.container.image","azureq/pantheon:pyspark-2.4")\
>     .config("spark.driver.host","jupyter-notebook-headless")\
>     .config("spark.driver.pod.name","jupyter-notebook-headless")\
>     .config("spark.kubernetes.authenticate.driver.serviceAccountName","spark")\
>     .config("spark.kubernetes.pyspark.pythonVersion","3")\
>     .config("spark.executorEnv.PYSPARK_MAJOR_PYTHON_VERSION","3")\
>     .getOrCreate()
> n = 100000
> partitions = 2  # not defined in the snippet as reported; assumed value
> def f(_):
>     x = random() * 2 - 1
>     y = random() * 2 - 1
>     return 1 if x ** 2 + y ** 2 <= 1 else 0
> count = spark.sparkContext.parallelize(range(1, n + 1), partitions).map(f).reduce(add)
> print("Pi is roughly %f" % (4.0 * count / n))
> {code}
> For pyspark shell, the command is:
>
> {code:java}
> $SPARK_HOME/bin/pyspark --master \
>   k8s://https://$KUBERNETES_SERVICE_HOST:$KUBERNETES_SERVICE_PORT_HTTPS \
>   --deploy-mode client \
>   --conf spark.executor.instances=5 \
>   --conf spark.kubernetes.container.image=azureq/pantheon:pyspark-2.4 \
>   --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
>   --conf spark.driver.host=spark-client-mode-headless \
>   --conf spark.kubernetes.pyspark.pythonVersion=3 \
>   --conf spark.driver.pod.name=spark-client-mode-headless{code}
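
A minimal sketch of the point made in the comment at the top of this message: collect every setting before the pyspark shell starts and pass it on the command line, rather than building a new SparkSession from inside an already running console. The helper name build_pyspark_argv is hypothetical, and the sketch assumes a pyspark launcher is on PATH (the report uses $SPARK_HOME/bin/pyspark). The config keys and values are taken from the report. Whether these particular keys fix the version mismatch on a given image is not confirmed here.

```python
import shlex


def build_pyspark_argv(master, conf):
    """Return the argv for a client-mode pyspark shell with all settings fixed up front.

    Hypothetical helper for illustration; it only assembles the command line
    and does not start Spark.
    """
    argv = ["pyspark", "--master", master, "--deploy-mode", "client"]
    # Sort keys so the resulting command line is deterministic.
    for key in sorted(conf):
        argv += ["--conf", "{}={}".format(key, conf[key])]
    return argv


# Settings copied from the report; adjust for your own cluster and image.
settings = {
    "spark.executor.instances": "5",
    "spark.kubernetes.container.image": "azureq/pantheon:pyspark-2.4",
    "spark.kubernetes.pyspark.pythonVersion": "3",
}

argv = build_pyspark_argv("k8s://https://kubernetes.default.svc", settings)

# Print a shell-safe command line that can be pasted into a terminal.
print(" ".join(shlex.quote(arg) for arg in argv))
```

The resulting list can also be handed to subprocess.call(argv) to launch the shell directly, so no setting depends on code run after the console is up.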