[ https://issues.apache.org/jira/browse/SPARK-6839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14490576#comment-14490576 ]
Ilya Ganelin commented on SPARK-6839:
-------------------------------------

The obvious solution won't work. A ```TaskContext``` can't simply be added to ```dataSerialize()```, because that method is called from within both ```MemoryStore``` and ```TachyonStore```, which are instantiated within the ```BlockManager``` constructor. The ```TaskContext``` also can't be created within the ```BlockManager``` constructor, since the ```BlockManager``` is itself created within the ```SparkEnv``` constructor, which has no tasks associated with it.

The only workable solution that I can see is to assign a ```TaskContext``` to the ```BlockManager``` at runtime, but that sounds very sketchy to me: the block manager is a singleton, and we may have multiple tasks running at once.

Any thoughts on this conundrum?

> BlockManager.dataDeserialize leaks resources on user exceptions
> ---------------------------------------------------------------
>
>                 Key: SPARK-6839
>                 URL: https://issues.apache.org/jira/browse/SPARK-6839
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>            Reporter: Imran Rashid
>
> From a discussion with [~vanzin] on {{ByteBufferInputStream}}, we realized
> that
> [{{BlockManager.dataDeserialize}}|https://github.com/apache/spark/blob/b5c51c8df480f1a82a82e4d597d8eea631bffb4e/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L1202]
> doesn't guarantee the underlying InputStream is properly closed. In
> particular, {{BlockManager.dispose(byteBuffer)}} will not get called any time
> there is an exception in user code.
> The problem is that right now, we convert the input streams to iterators, and
> only close the input stream if the end of the iterator is reached. But, we
> might never reach the end of the iterator -- the obvious case is if there is
> a bug in the user code, so tasks fail part of the way through the iterator.
> I think the solution is to give {{BlockManager.dataDeserialize}} a
> {{TaskContext}} so it can call {{context.addTaskCompletionListener}} to do
> the cleanup (as is done in {{ShuffleBlockFetcherIterator}}).
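
To make the leak and the proposed listener-based cleanup concrete, here is a minimal, self-contained sketch. It does not use Spark: ```SketchTaskContext```, ```TrackedStream```, ```DeserializeSketch``` and ```runTask``` are hypothetical names invented for illustration, and the listener signature is simplified to ```() => Unit```. The context is passed per call as an ```Option``` rather than stored on a shared object, which is an assumption made here to sidestep the singleton concern, not something the real ```BlockManager``` does.

```scala
import java.io.{ByteArrayInputStream, InputStream}
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a task context; only the completion hook is modeled.
final class SketchTaskContext {
  private val listeners = ArrayBuffer.empty[() => Unit]

  def addTaskCompletionListener(f: () => Unit): Unit = listeners += f

  // Assumed to be called by the task runner whether the task succeeds or fails.
  def markTaskCompleted(): Unit = listeners.foreach(l => l())
}

// Input stream that records whether close() was ever called.
final class TrackedStream(bytes: Array[Byte]) extends ByteArrayInputStream(bytes) {
  @volatile var closed = false
  override def close(): Unit = { closed = true; super.close() }
}

object DeserializeSketch {
  // Wraps the stream in an iterator that closes it only once the end is reached.
  // If a context is supplied, a completion listener closes the stream as well,
  // so an abandoned iterator no longer leaks it.
  def dataDeserialize(in: InputStream, context: Option[SketchTaskContext]): Iterator[Int] = {
    context.foreach(_.addTaskCompletionListener(() => in.close()))
    new Iterator[Int] {
      private var nextByte = in.read()
      def hasNext: Boolean = {
        if (nextByte == -1) in.close()
        nextByte != -1
      }
      def next(): Int = { val b = nextByte; nextByte = in.read(); b }
    }
  }

  // Simulates a task whose user code fails partway through the iterator.
  // Returns true if the stream ended up closed.
  def runTask(stream: TrackedStream, context: Option[SketchTaskContext]): Boolean = {
    try {
      dataDeserialize(stream, context).foreach { b =>
        if (b == 2) throw new RuntimeException("bug in user code")
      }
    } catch {
      case _: RuntimeException => () // the task fails; the iterator is abandoned
    } finally {
      context.foreach(_.markTaskCompleted())
    }
    stream.closed
  }
}
```

Calling ```DeserializeSketch.runTask(new TrackedStream(Array[Byte](1, 2, 3)), None)``` leaves the stream open, while passing ```Some(new SketchTaskContext)``` closes it even though the user code throws. Because each call registers its listener on its own task's context, concurrent tasks would not share cleanup state in this sketch.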