[ https://issues.apache.org/jira/browse/SPARK-2677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076033#comment-14076033 ]
Guoqiang Li commented on SPARK-2677: ------------------------------------ [~pwendell] , [~sarutak] How about the following solution? https://github.com/witgo/spark/compare/SPARK-2677 > BasicBlockFetchIterator#next can wait forever > --------------------------------------------- > > Key: SPARK-2677 > URL: https://issues.apache.org/jira/browse/SPARK-2677 > Project: Spark > Issue Type: Bug > Components: Spark Core > Affects Versions: 0.9.2, 1.0.0, 1.0.1 > Reporter: Kousuke Saruta > Priority: Blocker > > In BasicBlockFetchIterator#next, it waits fetch result on result.take. > {code} > override def next(): (BlockId, Option[Iterator[Any]]) = { > resultsGotten += 1 > val startFetchWait = System.currentTimeMillis() > val result = results.take() > val stopFetchWait = System.currentTimeMillis() > _fetchWaitTime += (stopFetchWait - startFetchWait) > if (! result.failed) bytesInFlight -= result.size > while (!fetchRequests.isEmpty && > (bytesInFlight == 0 || bytesInFlight + fetchRequests.front.size <= > maxBytesInFlight)) { > sendRequest(fetchRequests.dequeue()) > } > (result.blockId, if (result.failed) None else > Some(result.deserialize())) > } > {code} > But, results is implemented as LinkedBlockingQueue so if remote executor hang > up, fetching Executor waits forever. -- This message was sent by Atlassian JIRA (v6.2#6252)