[ 
https://issues.apache.org/jira/browse/SPARK-22083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

holdenk updated SPARK-22083:
----------------------------
    Fix Version/s: 2.1.2

> When dropping multiple blocks to disk, Spark should release all locks on a 
> failure
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-22083
>                 URL: https://issues.apache.org/jira/browse/SPARK-22083
>             Project: Spark
>          Issue Type: Bug
>          Components: Block Manager, Spark Core
>    Affects Versions: 2.1.1, 2.2.0
>            Reporter: Imran Rashid
>            Assignee: Imran Rashid
>             Fix For: 2.1.2, 2.2.1, 2.3.0, 2.1.3
>
>
> {{MemoryStore.evictBlocksToFreeSpace}} first [acquires writer locks on all 
> the blocks it intends to evict | 
> https://github.com/apache/spark/blob/55d5fa79db883e4d93a9c102a94713c9d2d1fb55/core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala#L520].
>   However, if there is an exception while dropping blocks, there is no 
> {{finally}} block to release all the locks.
> If there is only one block being dropped, this isn't a problem (probably).  
> Usually the call stack goes from {{MemoryStore.evictBlocksToFreeSpace --> 
> dropBlocks --> BlockManager.dropFromMemory --> DiskStore.put}}.  And 
> {{DiskStore.put}} does do a [{{removeBlock()}} in a {{finally}} 
> block|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/DiskStore.scala#L83],
>  which cleans up the locks.
> I ran into this from the serialization issue in SPARK-21928.  In that, a 
> netty thread ends up trying to evict some blocks from memory to disk, and 
> fails.  When there is only one block that needs to be evicted, and the error 
> occurs, there isn't any real problem; I assume that netty thread is dead, but 
> the executor threads seem fine.  However, in the cases where two blocks get 
> dropped, one task gets completely stuck.  Unfortunately I don't have a stack 
> trace from the stuck executor, but I assume it just waits forever on this 
> lock that never gets released.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to