[ https://issues.apache.org/jira/browse/SPARK-27927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16884791#comment-16884791 ]
Stavros Kontopoulos edited comment on SPARK-27927 at 7/15/19 12:38 AM:
-----------------------------------------------------------------------

I was able to reproduce it easily with 2.4.3 and this similar code:

{code:python}
from __future__ import print_function

import sys
from random import random
from operator import add

from pyspark.sql import SparkSession

if __name__ == "__main__":
    """
        Usage: pi [partitions]
    """
    spark = SparkSession\
        .builder\
        .appName("PythonPi")\
        .getOrCreate()
{code}

I used this tool [https://github.com/jglick/jkillthread] to kill the event loop and then the other OkHttp thread:

{noformat}
19/07/15 00:12:06 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
Killing "OkHttp https://kubernetes.default.svc/..."
Did not find "OkHttp https://kubernetes.default.svc/..."
Killing "dag-scheduler-event-loop"
Killing "OkHttp WebSocket https://kubernetes.default.svc/..."
Exception in thread "OkHttp WebSocket https://kubernetes.default.svc/..." java.lang.IllegalMonitorStateException
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.signal(AbstractQueuedSynchronizer.java:1939)
	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1103)
	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Killing "OkHttp WebSocket https://kubernetes.default.svc/..."
Exception in thread "OkHttp WebSocket https://kubernetes.default.svc/..." java.lang.IllegalMonitorStateException
{noformat}

Unfortunately I can't kill the latter, as another one is created in its place. Anyway, that means this is just another case of https://issues.apache.org/jira/browse/SPARK-27812.
spark.stop() obviously stops the k8s client and everything finishes as expected.
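The underlying mechanics (why the driver hangs without spark.stop(), and why stopping the client releases it) can be sketched in plain Python with the threading module. This is a stand-in illustration only, not Spark or fabric8 code: a non-daemon background thread plays the role of the Kubernetes client's OkHttp threads, which keep the JVM alive after user code finishes unless they are explicitly shut down.

```python
import threading
import time

def worker(stop_event):
    # Simulates a client's background I/O thread: loops until told to stop.
    while not stop_event.is_set():
        time.sleep(0.01)

stop = threading.Event()
# daemon=False mirrors the JVM's non-daemon OkHttp threads: the process
# (like the driver JVM) cannot exit while this thread is still running.
t = threading.Thread(target=worker, args=(stop,), daemon=False)
t.start()

# Without an explicit shutdown (the analog of spark.stop() closing the
# k8s client), the main thread would finish but the process would hang
# here, waiting on the still-running background thread.
stop.set()          # explicit shutdown: tell the background thread to exit
t.join(timeout=1.0)
assert not t.is_alive()
```

Killing such a thread externally (as with jkillthread above) only works if the client does not respawn it, which matches what happens with the WebSocket thread in the log.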
> driver pod hangs with pyspark 2.4.3 and master on kubernetes
> -----------------------------------------------------------
>
>                 Key: SPARK-27927
>                 URL: https://issues.apache.org/jira/browse/SPARK-27927
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes, PySpark
>    Affects Versions: 3.0.0, 2.4.3
>         Environment: k8s 1.11.9
>                      spark 2.4.3 and master branch.
>            Reporter: Edwin Biemond
>            Priority: Major
>         Attachments: driver_threads.log, executor_threads.log
>
> When we run a simple pyspark script on Spark 2.4.3 or 3.0.0, the driver pod hangs
> and never calls the shutdown hook.
> {code:java}
> #!/usr/bin/env python
> from __future__ import print_function
> import os
> import os.path
> import sys
> # Are we really in Spark?
> from pyspark.sql import SparkSession
> spark = SparkSession.builder.appName('hello_world').getOrCreate()
> print('Our Spark version is {}'.format(spark.version))
> print('Spark context information: {} parallelism={} python version={}'.format(
>     str(spark.sparkContext),
>     spark.sparkContext.defaultParallelism,
>     spark.sparkContext.pythonVer
> ))
> {code}
> When we run this on Kubernetes, the driver and executor are just hanging, even though we
> see the output of the python script:
> {noformat}
> bash-4.2# cat stdout.log
> Our Spark version is 2.4.3
> Spark context information: <SparkContext
> master=k8s://https://kubernetes.default.svc:443 appName=hello_world>
> parallelism=2 python version=3.6{noformat}
> What works:
> * a simple python script with a print works fine on 2.4.3 and 3.0.0
> * the same setup on 2.4.0
> * 2.4.3 spark-submit with the above pyspark script

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org