[ https://issues.apache.org/jira/browse/SPARK-26776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16755602#comment-16755602 ]
Apache Spark commented on SPARK-26776: -------------------------------------- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/23690 > Reduce Py4J communication cost in PySpark's execution barrier check > ------------------------------------------------------------------- > > Key: SPARK-26776 > URL: https://issues.apache.org/jira/browse/SPARK-26776 > Project: Spark > Issue Type: Bug > Components: PySpark > Affects Versions: 2.4.0, 3.0.0 > Reporter: Hyukjin Kwon > Priority: Minor > > I am investigating flaky tests. I realised that: > {code} > File > "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/rdd.py", line > 2512, in __init__ > self.is_barrier = prev._is_barrier() or isFromBarrier > File > "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/rdd.py", line > 2412, in _is_barrier > return self._jrdd.rdd().isBarrier() > File > "/home/jenkins/workspace/SparkPullRequestBuilder/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", > line 1286, in __call__ > answer, self.gateway_client, self.target_id, self.name) > File > "/home/jenkins/workspace/SparkPullRequestBuilder/python/lib/py4j-0.10.8.1-src.zip/py4j/protocol.py", > line 342, in get_return_value > return OUTPUT_CONVERTER[type](answer[2:], gateway_client) > File > "/home/jenkins/workspace/SparkPullRequestBuilder/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", > line 2492, in <lambda> > lambda target_id, gateway_client: JavaObject(target_id, > gateway_client)) > File > "/home/jenkins/workspace/SparkPullRequestBuilder/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", > line 1324, in __init__ > ThreadSafeFinalizer.add_finalizer(key, value) > File > "/home/jenkins/workspace/SparkPullRequestBuilder/python/lib/py4j-0.10.8.1-src.zip/py4j/finalizer.py", > line 43, in add_finalizer > cls.finalizers[id] = weak_ref > File "/usr/lib64/pypy-2.5.1/lib-python/2.7/threading.py", line 216, in > __exit__ > self.release() > File "/usr/lib64/pypy-2.5.1/lib-python/2.7/threading.py", line 208, in > release > self.__block.release() > error: release unlocked lock > {code} > I assume it might not be directly related with the test itself but I noticed > that it prev._is_barrier() attempts to access via Py4J. > Accessing via Py4J is expensive and IMHO it makes it flaky. -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org