[ https://issues.apache.org/jira/browse/SPARK-34349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17280382#comment-17280382 ]
Hyukjin Kwon commented on SPARK-34349:
--------------------------------------

I think it's fixed in the upstream. It would be great if you could have a chance to test the Spark 3.1.1 RC and see if it's fixed.

> No python3 in docker images
> ----------------------------
>
>                 Key: SPARK-34349
>                 URL: https://issues.apache.org/jira/browse/SPARK-34349
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes
>    Affects Versions: 3.0.1
>            Reporter: Tim Hughes
>            Priority: Critical
>
> The spark-py container image doesn't receive the instruction to use python3
> and defaults to Python 2.7.
>
> The worker container was built using the following commands (note: tar extracts
> into ./tmp/, so the cd must descend into it):
> {code:java}
> mkdir ./tmp
> wget -qO- https://www.mirrorservice.org/sites/ftp.apache.org/spark/spark-3.0.1/spark-3.0.1-bin-hadoop3.2.tgz | tar -C ./tmp/ -xzf -
> cd ./tmp/spark-3.0.1-bin-hadoop3.2/
> ./bin/docker-image-tool.sh -r docker.io/timhughes -t spark-3.0.1-bin-hadoop3.2 -p kubernetes/dockerfiles/spark/bindings/python/Dockerfile build
> docker push docker.io/timhughes/spark-py:spark-3.0.1-bin-hadoop3.2{code}
>
> This is the code I am using to initialize the workers:
>
> {code:java}
> import os
> from pyspark import SparkContext, SparkConf
> from pyspark.sql import SparkSession
>
> # Create Spark config for our Kubernetes-based cluster manager
> sparkConf = SparkConf()
> sparkConf.setMaster("k8s://https://kubernetes.default.svc.cluster.local:443")
> sparkConf.setAppName("spark")
> sparkConf.set("spark.kubernetes.container.image", "docker.io/timhughes/spark-py:spark-3.0.1-bin-hadoop3.2")
> sparkConf.set("spark.kubernetes.namespace", "spark")
> sparkConf.set("spark.executor.instances", "2")
> sparkConf.set("spark.executor.cores", "1")
> sparkConf.set("spark.driver.memory", "1024m")
> sparkConf.set("spark.executor.memory", "1024m")
> sparkConf.set("spark.kubernetes.pyspark.pythonVersion", "3")
> sparkConf.set("spark.kubernetes.authenticate.driver.serviceAccountName", "spark")
> sparkConf.set("spark.kubernetes.authenticate.serviceAccountName", "spark")
> sparkConf.set("spark.driver.port", "29413")
> sparkConf.set("spark.driver.host", "my-notebook-deployment.spark.svc.cluster.local")
>
> # Initialize our Spark cluster; this will actually
> # generate the worker nodes.
> spark = SparkSession.builder.config(conf=sparkConf).getOrCreate()
> sc = spark.sparkContext
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
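[Editor's note] Besides `spark.kubernetes.pyspark.pythonVersion`, Spark also honours the `PYSPARK_PYTHON` / `PYSPARK_DRIVER_PYTHON` environment variables when launching Python workers. A minimal sketch of the usual workaround, assuming it is run in the driver process before the SparkSession above is created (the interpreter paths here are assumptions, not taken from the thread):

```python
import os
import sys

# Force the executor-side workers onto python3 (the image's PATH is assumed
# to contain a python3 binary) and pin the driver to the interpreter that is
# currently running this script.
os.environ["PYSPARK_PYTHON"] = "python3"
os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable

# After the session exists, the executor interpreter can be confirmed with
# something like:
#   spark.sparkContext.parallelize([0], 1) \
#       .map(lambda _: __import__("sys").version).collect()
print(os.environ["PYSPARK_PYTHON"])  # prints: python3
```

If the executors still report Python 2.7 after this, the Python binding layer of the image itself (the `bindings/python/Dockerfile` used above) is the next thing to inspect.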