[jira] [Resolved] (SPARK-27666) Do not release lock while TaskContext already completed

Wenchen Fan (JIRA) Mon, 17 Jun 2019 19:20:46 -0700


     [ 
https://issues.apache.org/jira/browse/SPARK-27666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Wenchen Fan resolved SPARK-27666.
---------------------------------
       Resolution: Fixed
    Fix Version/s: 3.0.0

Issue resolved by pull request 24699
[https://github.com/apache/spark/pull/24699]

> Do not release lock while TaskContext already completed
> -------------------------------------------------------
>
>                 Key: SPARK-27666
>                 URL: https://issues.apache.org/jira/browse/SPARK-27666
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Wenchen Fan
>            Priority: Major
>             Fix For: 3.0.0
>
>
> {code:java}
> Exception in thread "Thread-14" java.lang.AssertionError: assertion failed: 
> Block rdd_0_0 is not locked for reading
>       at scala.Predef$.assert(Predef.scala:223)
>       at 
> org.apache.spark.storage.BlockInfoManager.unlock(BlockInfoManager.scala:299)
>       at 
> org.apache.spark.storage.BlockManager.releaseLock(BlockManager.scala:1000)
>       at 
> org.apache.spark.storage.BlockManager.$anonfun$getLocalValues$5(BlockManager.scala:746)
>       at 
> org.apache.spark.util.CompletionIterator$$anon$1.completion(CompletionIterator.scala:47)
>       at 
> org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:36)
>       at 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
>       at org.apache.spark.rdd.RDDSuite.$anonfun$new$265(RDDSuite.scala:1185)
>       at java.lang.Thread.run(Thread.java:748)
> {code}
> We're facing an issue reported by SPARK-18406 and SPARK-25139. And 
> [https://github.com/apache/spark/pull/24542] bypassed the issue by capturing 
> the assertion error to avoid failing the executor. However, when not using 
> pyspark, issue still exists when user implements a custom 
> RDD(https://issues.apache.org/jira/browse/SPARK-18406?focusedCommentId=15969384&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15969384)
>  or task(see demo below), which spawn a separate thread to consume iterator 
> from a cached parent RDD.
> {code:java}
> val rdd0 = sc.parallelize(Range(0, 10), 1).cache()
>     rdd0.collect()
>     rdd0.mapPartitions { iter =>
>       val t = new Thread(new Runnable {
>         override def run(): Unit = {
>           while(iter.hasNext) {
>             println(iter.next())
>             Thread.sleep(1000)
>           }
>         }
>       })
>       t.setDaemon(false)
>       t.start()
>       Iterator(0)
>     }.collect()
> {code}
> we could easily to reproduce the issue using the demo above.
> If we could prevent the separate thread from releasing lock on block when 
> TaskContext has already completed, 
> then, we won't hit this issue again.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Resolved] (SPARK-27666) Do not release lock while TaskContext already completed

Reply via email to