Qi Shao created SPARK-26237:
-------------------------------

             Summary: [K8s] Unable to switch python version in executor when running pyspark shell.
                 Key: SPARK-26237
                 URL: https://issues.apache.org/jira/browse/SPARK-26237
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 2.4.0
         Environment: Spark 2.4.0
                      Google Kubernetes Engine
            Reporter: Qi Shao


Error message:
{code:java}
Exception: Python in worker has different version 2.7 than that in driver 3.6, PySpark cannot run with different minor versions. Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.
{code}
Neither {code:java}spark.kubernetes.pyspark.pythonVersion{code} nor {code:java}spark.executorEnv.PYSPARK_MAJOR_PYTHON_VERSION{code} works.

This happens both when I run a notebook with PySpark + Python 3 and when I run the PySpark shell in a pod that has PySpark + Python 3.

For the notebook, the code is:
{code:java}
from __future__ import print_function

from random import random
from operator import add

from pyspark.sql import SparkSession

spark = SparkSession.builder\
    .master("k8s://https://kubernetes.default.svc")\
    .appName("PySpark Testout")\
    .config("spark.submit.deployMode", "client")\
    .config("spark.executor.instances", "2")\
    .config("spark.kubernetes.container.image", "azureq/pantheon:pyspark-2.4")\
    .config("spark.driver.host", "jupyter-notebook-headless")\
    .config("spark.driver.pod.name", "jupyter-notebook-headless")\
    .config("spark.kubernetes.authenticate.driver.serviceAccountName", "spark")\
    .config("spark.kubernetes.pyspark.pythonVersion", "3")\
    .config("spark.executorEnv.PYSPARK_MAJOR_PYTHON_VERSION", "3")\
    .getOrCreate()

partitions = 2  # number of partitions for the sample job
n = 100000

def f(_):
    x = random() * 2 - 1
    y = random() * 2 - 1
    return 1 if x ** 2 + y ** 2 <= 1 else 0

count = spark.sparkContext.parallelize(range(1, n + 1), partitions).map(f).reduce(add)
print("Pi is roughly %f" % (4.0 * count / n))
{code}
For the pyspark shell, the command is:
{code:java}
$SPARK_HOME/bin/pyspark --master \
  k8s://https://$KUBERNETES_SERVICE_HOST:$KUBERNETES_SERVICE_PORT_HTTPS \
  --deploy-mode client \
  --conf spark.executor.instances=5 \
  --conf spark.kubernetes.container.image=azureq/pantheon:pyspark-2.4 \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --conf spark.driver.host=spark-client-mode-headless \
  --conf spark.kubernetes.pyspark.pythonVersion=3 \
  --conf spark.driver.pod.name=spark-client-mode-headless
{code}
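For reference, this is the minimal check I would use to confirm whether the Python-version settings actually reach the executors (a sketch, assuming the {code:java}spark{code} session created in the notebook snippet above; if the executors still run Python 2.7, the job fails with the same exception instead of returning):
{code:java}
# Version check sketch: compares the driver's Python version with the versions
# reported back by executor-side workers. Assumes the `spark` session above.
import sys

driver_version = ".".join(map(str, sys.version_info[:2]))
executor_versions = (
    spark.sparkContext
    .parallelize(range(4), 2)  # a tiny job, spread over two partitions
    .map(lambda _: ".".join(map(str, __import__("sys").version_info[:2])))
    .distinct()
    .collect()
)
print("driver: %s, executors: %s" % (driver_version, executor_versions))
{code}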