[ https://issues.apache.org/jira/browse/SPARK-27666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
wuyi updated SPARK-27666:
-------------------------
Description:

We're facing an issue reported by SPARK-18406 and SPARK-25139. [https://github.com/apache/spark/pull/24542] bypassed the issue by catching the assertion error so it would not fail the executor. However, even without PySpark, the issue still exists when a user implements a custom RDD (https://issues.apache.org/jira/browse/SPARK-18406?focusedCommentId=15969384&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15969384) or task (see the demo below) that spawns a separate thread to consume the iterator of a cached parent RDD.

{code:java}
val rdd0 = sc.parallelize(Range(0, 10), 1).cache()
rdd0.collect()
rdd0.mapPartitions { iter =>
  val t = new Thread(new Runnable {
    override def run(): Unit = {
      while (iter.hasNext) {
        println(iter.next())
        Thread.sleep(1000)
      }
    }
  })
  t.setDaemon(false)
  t.start()
  Iterator(0)
}.collect()
{code}

We can easily reproduce the issue with the demo above. If we prevent the separate thread from releasing the lock on the block after the TaskContext has already completed, we won't hit this issue again.

was:

When a task finishes, we should immediately stop the threads spawned by the Python runner. https://github.com/apache/spark/pull/24542 fixed the problem that, if the Python runner thread releases the block lock at the end, and the block lock has already been released when the task finishes, we hit an assertion error. If we can stop the Python runner thread when the task finishes, we no longer need https://github.com/apache/spark/pull/24542.

> Do not release lock while TaskContext already completed
> -------------------------------------------------------
>
>                 Key: SPARK-27666
>                 URL: https://issues.apache.org/jira/browse/SPARK-27666
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Wenchen Fan
>            Priority: Major
>
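The proposed fix direction can be sketched in isolation. The snippet below is a minimal, hypothetical model of the guarded release, not Spark's actual BlockManager or TaskContext code: `TaskScopedBlockLock` and its methods are illustrative names. It shows a lock whose `release()` becomes a no-op once the owning task is marked completed, instead of tripping an assertion on an already-released lock.

```scala
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical sketch (not Spark's real API): a block lock scoped to a task.
class TaskScopedBlockLock {
  private val taskCompleted = new AtomicBoolean(false)
  private val held = new AtomicBoolean(true) // lock acquired by the task thread

  // Called when the task finishes; in Spark, task completion already
  // releases all block locks held by the task.
  def markTaskCompleted(): Unit = {
    taskCompleted.set(true)
    held.set(false)
  }

  /** Returns true only if this call actually released the lock. */
  def release(): Boolean = {
    if (taskCompleted.get()) {
      // A spawned thread arriving late: skip the release rather than
      // asserting on a lock that the completed task already gave up.
      false
    } else {
      held.compareAndSet(true, false)
    }
  }

  def isHeld: Boolean = held.get()
}
```

In the demo above, the `mapPartitions` task thread returns while the spawned thread is still iterating; with a guard like this, the spawned thread's late release after `markTaskCompleted()` simply returns `false` instead of failing.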
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org