[ https://issues.apache.org/jira/browse/SPARK-22083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Imran Rashid updated SPARK-22083:
---------------------------------
    Summary: When dropping multiple blocks to disk, Spark should release all locks on a failure  (was: When dropping multiple locks to disk, Spark should release all locks on a failure)

> When dropping multiple blocks to disk, Spark should release all locks on a
> failure
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-22083
>                 URL: https://issues.apache.org/jira/browse/SPARK-22083
>             Project: Spark
>          Issue Type: Bug
>          Components: Block Manager, Spark Core
>    Affects Versions: 2.1.1, 2.2.0
>            Reporter: Imran Rashid
>
> {{MemoryStore.evictBlocksToFreeSpace}} first [acquires writer locks on all
> the blocks it intends to evict|https://github.com/apache/spark/blob/55d5fa79db883e4d93a9c102a94713c9d2d1fb55/core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala#L520].
> However, if an exception is thrown while dropping blocks, there is no
> {{finally}} block to release all of the locks.
>
> If only one block is being dropped, this (probably) isn't a problem.
> The call stack usually goes {{MemoryStore.evictBlocksToFreeSpace -->
> dropBlocks --> BlockManager.dropFromMemory --> DiskStore.put}}, and
> {{DiskStore.put}} does a [{{removeBlock()}} in a {{finally}}
> block|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/DiskStore.scala#L83],
> which cleans up the locks.
>
> I ran into this via the serialization issue in SPARK-21928. There, a netty
> thread ends up trying to evict some blocks from memory to disk, and fails.
> When only one block needs to be evicted and the error occurs, there isn't
> any real problem: I assume that netty thread is dead, but the executor
> threads seem fine. However, in cases where two blocks get dropped, one task
> gets completely stuck.
> Unfortunately I don't have a stack trace from the stuck executor, but I
> assume it just waits forever on this lock that never gets released.
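The difference between "acquire all locks, no cleanup" and "acquire all locks, release the leftovers in a {{finally}}" can be sketched with a toy model. Everything here is hypothetical: `ToyLockManager`, `evictWithoutCleanup`, `evictAll`, the `dropBlock` callback and the block ids are illustrations, not Spark's `MemoryStore` or `BlockInfoManager` code. The sketch assumes the caller releases a block's lock after a successful drop, and it does not model the per-block cleanup that `DiskStore.put` performs, so without the {{finally}} even the failing block stays locked here.

```scala
import scala.collection.mutable

// Hypothetical toy lock table (NOT Spark's BlockInfoManager): each block id
// can hold at most one write lock at a time.
final class ToyLockManager {
  private val writeLocked = mutable.Set.empty[String]

  // Returns true if the lock was acquired, false if the block was already locked.
  def tryLockForWriting(blockId: String): Boolean = synchronized {
    writeLocked.add(blockId)
  }

  def unlock(blockId: String): Unit = synchronized {
    writeLocked -= blockId
  }

  def isLocked(blockId: String): Boolean = synchronized {
    writeLocked.contains(blockId)
  }
}

object ToyEviction {
  // The shape described in the report: locks are taken up front, but a
  // failing drop leaves that block and every later block write-locked.
  def evictWithoutCleanup(
      locks: ToyLockManager,
      blockIds: Seq[String],
      dropBlock: String => Unit): Unit = {
    val selected = blockIds.filter(id => locks.tryLockForWriting(id))
    selected.foreach { id =>
      dropBlock(id)    // may throw
      locks.unlock(id) // assumption: the caller releases the lock after a successful drop
    }
  }

  // Same loop, plus a finally that releases the lock on every selected block
  // that was not successfully dropped (the failing one and all after it).
  // The original exception still propagates to the caller.
  def evictAll(
      locks: ToyLockManager,
      blockIds: Seq[String],
      dropBlock: String => Unit): Unit = {
    val selected = blockIds.filter(id => locks.tryLockForWriting(id))
    var released = 0 // how many leading entries of `selected` are already unlocked
    try {
      selected.foreach { id =>
        dropBlock(id)
        locks.unlock(id)
        released += 1
      }
    } finally {
      selected.drop(released).foreach(id => locks.unlock(id))
    }
  }
}
```

For example, with a `dropBlock` that throws on the second of three blocks, `evictWithoutCleanup` leaves the second and third blocks locked (the analogue of the stuck task), while `evictAll` leaves none locked and still rethrows the original exception.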