[ 
https://issues.apache.org/jira/browse/SPARK-26776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16755602#comment-16755602
 ] 

Apache Spark commented on SPARK-26776:
--------------------------------------

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/23690

> Reduce Py4J communication cost in PySpark's execution barrier check
> -------------------------------------------------------------------
>
>                 Key: SPARK-26776
>                 URL: https://issues.apache.org/jira/browse/SPARK-26776
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.4.0, 3.0.0
>            Reporter: Hyukjin Kwon
>            Priority: Minor
>
> While investigating flaky tests, I came across the following failure:
> {code}
>       File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/rdd.py", line 2512, in __init__
>         self.is_barrier = prev._is_barrier() or isFromBarrier
>       File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/rdd.py", line 2412, in _is_barrier
>         return self._jrdd.rdd().isBarrier()
>       File "/home/jenkins/workspace/SparkPullRequestBuilder/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", line 1286, in __call__
>         answer, self.gateway_client, self.target_id, self.name)
>       File "/home/jenkins/workspace/SparkPullRequestBuilder/python/lib/py4j-0.10.8.1-src.zip/py4j/protocol.py", line 342, in get_return_value
>         return OUTPUT_CONVERTER[type](answer[2:], gateway_client)
>       File "/home/jenkins/workspace/SparkPullRequestBuilder/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", line 2492, in <lambda>
>         lambda target_id, gateway_client: JavaObject(target_id, gateway_client))
>       File "/home/jenkins/workspace/SparkPullRequestBuilder/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", line 1324, in __init__
>         ThreadSafeFinalizer.add_finalizer(key, value)
>       File "/home/jenkins/workspace/SparkPullRequestBuilder/python/lib/py4j-0.10.8.1-src.zip/py4j/finalizer.py", line 43, in add_finalizer
>         cls.finalizers[id] = weak_ref
>       File "/usr/lib64/pypy-2.5.1/lib-python/2.7/threading.py", line 216, in __exit__
>         self.release()
>       File "/usr/lib64/pypy-2.5.1/lib-python/2.7/threading.py", line 208, in release
>         self.__block.release()
>     error: release unlocked lock
> {code}
> I assume the failure might not be directly related to the test itself, but I
> noticed that prev._is_barrier() goes through Py4J to reach the JVM.
> Accessing the JVM via Py4J is expensive, and IMHO it makes the tests flaky.
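
To illustrate the idea in the description, here is a minimal sketch of how the barrier check could avoid repeated Py4J round trips. It is not the code from the pull request: FakeJavaRDD, SketchRDD, SketchPipelinedRDD and build_chain are hypothetical stand-ins, and the counter only models the cost of a JVM call. The sketch assumes two changes: checking the cheap Python-side flag first so that `or` can short-circuit, and letting a pipelined RDD answer from its cached flag.

```python
class FakeJavaRDD:
    """Hypothetical stand-in for the JVM-side RDD; counts simulated Py4J calls."""

    def __init__(self, barrier):
        self._barrier = barrier
        self.calls = 0

    def isBarrier(self):
        self.calls += 1  # each call models one expensive Py4J round trip
        return self._barrier


class SketchRDD:
    """Hypothetical base RDD: the only place that asks the JVM."""

    def __init__(self, jrdd):
        self._jrdd = jrdd

    def _is_barrier(self):
        return self._jrdd.isBarrier()


class SketchPipelinedRDD(SketchRDD):
    """Hypothetical pipelined RDD that caches the barrier flag in Python."""

    def __init__(self, prev, isFromBarrier=False):
        super().__init__(prev._jrdd)
        # Cheap Python flag first: `or` short-circuits and skips the JVM call
        # whenever isFromBarrier is already True.
        self.is_barrier = isFromBarrier or prev._is_barrier()

    def _is_barrier(self):
        # Answer from the cached flag instead of calling into the JVM again.
        return self.is_barrier


def build_chain(jrdd, depth, isFromBarrier=False):
    """Stack `depth` pipelined RDDs on top of one base RDD."""
    rdd = SketchPipelinedRDD(SketchRDD(jrdd), isFromBarrier)
    for _ in range(depth - 1):
        rdd = SketchPipelinedRDD(rdd)
    return rdd
```

In this sketch, a chain of any depth triggers at most one simulated JVM call, and none at all when the first stage is created with isFromBarrier=True.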


