Hi Mich,

I think you need to check your code. If the code does not use the PySpark API effectively, you may see this, i.e. if you use the pure Python/pandas API rather than PySpark's transform -> transform -> action pattern, e.g. df.select(...).withColumn(...)...count().
Hope this helps to put you in the right direction.

Cheers,
Khalid

On Mon, 9 Aug 2021, 20:20 Mich Talebzadeh, <mich.talebza...@gmail.com> wrote:

> Hi,
>
> I have a basic question to ask.
>
> I am running a Google k8s cluster (AKA GKE) with three nodes, each with the
> configuration below:
>
>     e2-standard-2 (2 vCPUs, 8 GB memory)
>
> spark-submit is launched from another node (actually a Dataproc single node
> that I have just upgraded to e2-custom (4 vCPUs, 8 GB memory)). We call
> this the launch node.
>
> OK, I know that the cluster is not much, but Google was complaining about
> the launch node hitting 100% CPU, so I added two more CPUs to it.
>
> It appears that despite using k8s as the computational cluster, the burden
> falls upon the launch node!
>
> The CPU utilisation for the launch node is shown below:
>
> [image: image.png]
>
> The dip is when 2 more CPUs were added to it, so it had to reboot. It sits
> at around 70% usage.
>
> The combined CPU usage for the GKE nodes is shown below:
>
> [image: image.png]
>
> It never goes above 20%!
>
> I can see the driver and executors as below:
>
>     k get pods -n spark
>     NAME                                         READY   STATUS    RESTARTS   AGE
>     pytest-c958c97b2c52b6ed-driver               1/1     Running   0          69s
>     randomdatabigquery-e68a8a7b2c52f468-exec-1   1/1     Running   0          51s
>     randomdatabigquery-e68a8a7b2c52f468-exec-2   1/1     Running   0          51s
>     randomdatabigquery-e68a8a7b2c52f468-exec-3   0/1     Pending   0          51s
>
> It is a PySpark 3.1.1 image using Java 8, pushing randomly generated data
> into the Google BigQuery data warehouse. The last executor (exec-3) seems
> to be just pending.
> The spark-submit is as below:
>
>     spark-submit --verbose \
>       --properties-file ${property_file} \
>       --master k8s://https://$KUBERNETES_MASTER_IP:443 \
>       --deploy-mode cluster \
>       --name pytest \
>       --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./pyspark_venv/bin/python \
>       --py-files $CODE_DIRECTORY/DSBQ.zip \
>       --conf spark.kubernetes.namespace=$NAMESPACE \
>       --conf spark.executor.memory=5000m \
>       --conf spark.network.timeout=300 \
>       --conf spark.executor.instances=3 \
>       --conf spark.kubernetes.driver.limit.cores=1 \
>       --conf spark.driver.cores=1 \
>       --conf spark.executor.cores=1 \
>       --conf spark.executor.memory=2000m \
>       --conf spark.kubernetes.driver.docker.image=${IMAGEGCP} \
>       --conf spark.kubernetes.executor.docker.image=${IMAGEGCP} \
>       --conf spark.kubernetes.container.image=${IMAGEGCP} \
>       --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-bq \
>       --conf spark.driver.extraJavaOptions="-Dio.netty.tryReflectionSetAccessible=true" \
>       --conf spark.executor.extraJavaOptions="-Dio.netty.tryReflectionSetAccessible=true" \
>       --conf spark.sql.execution.arrow.pyspark.enabled="true" \
>       $CODE_DIRECTORY/${APPLICATION}
>
> Aren't the driver and executors running on the K8s cluster? So why is the
> launch node heavily used while the k8s cluster is underutilised?
>
> Thanks
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.