[ https://issues.apache.org/jira/browse/SPARK-35009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Attila Zsolt Piros resolved SPARK-35009.
----------------------------------------
    Fix Version/s: 3.2.0
       Resolution: Fixed

Issue resolved by pull request 32169
[https://github.com/apache/spark/pull/32169]

> Avoid creating multiple Monitor threads for reused python workers for the
> same TaskContext
> ------------------------------------------------------------------------------------------
>
>                 Key: SPARK-35009
>                 URL: https://issues.apache.org/jira/browse/SPARK-35009
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.0.3, 3.1.0, 3.1.1, 3.1.2, 3.2.0
>            Reporter: Attila Zsolt Piros
>            Assignee: Attila Zsolt Piros
>             Priority: Major
>             Fix For: 3.2.0
>
>
> Currently this code will stop because of the high number of created threads:
> {noformat}
> import pyspark
> conf = pyspark.SparkConf().setMaster("local[64]").setAppName("Test1")
> sc = pyspark.SparkContext.getOrCreate(conf)
> rows = 70000
> data = list(range(rows))
> rdd = sc.parallelize(data, rows)
> assert rdd.getNumPartitions() == rows
> rdd0 = rdd.filter(lambda x: False)
> assert rdd0.getNumPartitions() == rows
> rdd00 = rdd0.coalesce(1)
> data = rdd00.collect()
> assert data == []
> {noformat}
> The error is:
> {noformat}
> 21/04/08 12:12:26 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 21/04/08 12:12:29 WARN TaskSetManager: Stage 0 contains a task of very large size (4732 KiB). The maximum recommended task size is 1000 KiB.
> [Stage 0:> (0 + 1) / 1][423.190s][warning][os,thread] Attempt to protect stack guard pages failed (0x00007f43d23ff000-0x00007f43d2403000).
> [423.190s][warning][os,thread] Attempt to deallocate stack guard pages failed.
> OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x00007f43d300b000, 16384, 0) failed; error='Not enough space' (errno=12)
> [423.231s][warning][os,thread] Failed to start thread - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 0k, detached.
> #
> # There is insufficient memory for the Java Runtime Environment to continue.
> # Native memory allocation (mmap) failed to map 16384 bytes for committing reserved memory.
> # An error report file with more information is saved as:
> # /home/ubuntu/PycharmProjects/<projekt-dir>/tests/hs_err_pid17755.log
> [thread 17966 also had an error]
> OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x00007f4b7bd81000, 262144, 0) failed; error='Not enough space' (errno=12)
> ERROR:root:Exception while sending command.
> Traceback (most recent call last):
>   File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1207, in send_command
>     raise Py4JNetworkError("Answer from Java side is empty")
> py4j.protocol.Py4JNetworkError: Answer from Java side is empty
>
> During handling of the above exception, another exception occurred:
>
> Traceback (most recent call last):
>   File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1033, in send_command
>     response = connection.send_command(command)
>   File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1211, in send_command
>     raise Py4JNetworkError(
> py4j.protocol.Py4JNetworkError: Error while receiving
> ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:42439)
> Traceback (most recent call last):
>   File "/opt/spark/python/pyspark/rdd.py", line 889, in collect
>     sock_info = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
>   File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1304, in __call__
>     return_value = get_return_value(
>   File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 334, in get_return_value
>     raise Py4JError(
> py4j.protocol.Py4JError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe
>
> During handling of the above exception, another exception occurred:
>
> Traceback (most recent call last):
>   File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 977, in _get_connection
>     connection = self.deque.pop()
> IndexError: pop from an empty deque
>
> During handling of the above exception, another exception occurred:
>
> Traceback (most recent call last):
>   File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1115, in start
>     self.socket.connect((self.address, self.port))
> ConnectionRefusedError: [Errno 111] Connection refused
> Traceback (most recent call last):
>   File "/opt/spark/python/pyspark/rdd.py", line 889, in collect
>     sock_info = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
>   File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1304, in __call__
>     return_value = get_return_value(
>   File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 334, in get_return_value
>     raise Py4JError(
> py4j.protocol.Py4JError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe
>
> During handling of the above exception, another exception occurred:
>
> Traceback (most recent call last):
>   File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 977, in _get_connection
>     connection = self.deque.pop()
> IndexError: pop from an empty deque
>
> During handling of the above exception, another exception occurred:
>
> Traceback (most recent call last):
>   File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1115, in start
>     self.socket.connect((self.address, self.port))
> ConnectionRefusedError: [Errno 111] Connection refused
>
> During handling of the above exception, another exception occurred:
>
> Traceback (most recent call last):
>   File "<input>", line 3, in <module>
>   File "/opt/pycharm-2020.2.3/plugins/python/helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
>     pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
>   File "/opt/pycharm-2020.2.3/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
>     exec(compile(contents+"\n", file, 'exec'), glob, loc)
>   File "/home/ubuntu/PycharmProjects/SPO_as_a_Service/tests/test_modeling_paf.py", line 992, in <module>
>     test_70000()
>   File "/home/ubuntu/PycharmProjects/SPO_as_a_Service/tests/test_modeling_paf.py", line 974, in test_70000
>     data=rdd00.collect()
>   File "/opt/spark/python/pyspark/rdd.py", line 889, in collect
>     sock_info = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
>   File "/opt/spark/python/pyspark/traceback_utils.py", line 78, in __exit__
>     self._context._jsc.setCallSite(None)
>   File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1303, in __call__
>     answer = self.gateway_client.send_command(command)
>   File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1031, in send_command
>     connection = self._get_connection()
>   File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 979, in _get_connection
>     connection = self._create_connection()
>   File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 985, in _create_connection
>     connection.start()
>   File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1127, in start
>     raise Py4JNetworkError(msg, e)
> py4j.protocol.Py4JNetworkError: An error occurred while trying to connect to the Java server (127.0.0.1:42439)
> {noformat}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org