[ https://issues.apache.org/jira/browse/SPARK-27927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16884250#comment-16884250 ]
Stavros Kontopoulos edited comment on SPARK-27927 at 7/13/19 12:37 AM:
-----------------------------------------------------------------------

It is on 2.4.0: [https://github.com/apache/spark/blob/v2.4.0/core/src/main/scala/org/apache/spark/util/EventLoop.scala#L47]

I'm not sure it is the k8s client in this case: if you check my thread dump [https://gist.github.com/skonto/74181e434a727901d4f3323461c1050b] in [https://github.com/apache/spark/pull/24796] (that report is recent only because I didn't file it earlier; this failing pi job has been around for at least a year, but I didn't have time...), those k8s threads still exist, yet they were not the root cause in the case with the exception. In any case, we need to pin down the root cause, because we don't know how we ended up with different results. So my question is why that thread is blocked there; we should debug the execution sequence in both cases, e.g. by adding logging. If it were the k8s threads, I would expect only those threads to be blocked, but the event loop is blocked as well. My $0.02.
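To make the "debug the execution sequence in both cases" suggestion concrete, one cheap comparison (a sketch only: `_jvm` is py4j's internal gateway to the driver JVM, not a supported Spark API, and the app name is just a placeholder) is to print the driver JVM's thread names, states, and daemon flags at the end of the reproduction script, then diff that output between a 2.4.0 run that exits and a 2.4.3 run that hangs:

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('hello_world').getOrCreate()

# Debugging sketch only: `_jvm` is py4j's internal gateway into the driver JVM,
# not a supported Spark API, so treat this as a throwaway diagnostic.
jvm = spark.sparkContext._jvm
threads = jvm.java.lang.Thread.getAllStackTraces().keySet().toArray()
for t in threads:
    # Non-daemon threads are the ones that can keep the driver JVM alive
    # after the Python script has finished.
    print('{} state={} daemon={}'.format(
        t.getName(), t.getState().toString(), t.isDaemon()))
{code}

Non-daemon threads that show up only in the hanging run would be the first candidates to correlate with the attached driver_threads.log.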
> driver pod hangs with pyspark 2.4.3 and master on kubernetes
> -------------------------------------------------------------
>
>                  Key: SPARK-27927
>                  URL: https://issues.apache.org/jira/browse/SPARK-27927
>              Project: Spark
>           Issue Type: Bug
>           Components: Kubernetes, PySpark
>     Affects Versions: 3.0.0, 2.4.3
>          Environment: k8s 1.11.9
> spark 2.4.3 and master branch.
>             Reporter: Edwin Biemond
>             Priority: Major
>          Attachments: driver_threads.log, executor_threads.log
>
> When we run a simple pyspark job on Spark 2.4.3 or 3.0.0, the driver pod hangs and never calls the shutdown hook.
> {code:python}
> #!/usr/bin/env python
> from __future__ import print_function
> import os
> import os.path
> import sys
> # Are we really in Spark?
> from pyspark.sql import SparkSession
> spark = SparkSession.builder.appName('hello_world').getOrCreate()
> print('Our Spark version is {}'.format(spark.version))
> print('Spark context information: {} parallelism={} python version={}'.format(
>     str(spark.sparkContext),
>     spark.sparkContext.defaultParallelism,
>     spark.sparkContext.pythonVer
> ))
> {code}
> When we run this on Kubernetes, the driver and executor just hang. We do see the output of this python script:
> {noformat}
> bash-4.2# cat stdout.log
> Our Spark version is 2.4.3
> Spark context information: <SparkContext master=k8s://https://kubernetes.default.svc:443 appName=hello_world> parallelism=2 python version=3.6
> {noformat}
> What works:
> * a simple python script with a print works fine on 2.4.3 and 3.0.0
> * the same setup on 2.4.0
> * 2.4.3 spark-submit with the above pyspark script
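Since the report says the shutdown hook is never called, one variable worth ruling out (an assumption for debugging, not something established in this ticket) is whether the hang depends on the shutdown-hook path at all, by stopping the session explicitly before the script exits. A minimal variant of the reproduction under that assumption:

{code:python}
#!/usr/bin/env python
from __future__ import print_function

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('hello_world').getOrCreate()
print('Our Spark version is {}'.format(spark.version))

# Stop the SparkContext explicitly instead of relying on the JVM shutdown hook.
# If the driver pod still hangs after this line, the problem is not limited to
# the shutdown-hook path; if it exits cleanly, the hang is more likely in how
# or when the hook is delivered.
spark.stop()
print('SparkSession stopped, exiting')
{code}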