[ https://issues.apache.org/jira/browse/SPARK-2677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Guoqiang Li updated SPARK-2677:
-------------------------------
    Fix Version/s: 1.1.0

> BasicBlockFetchIterator#next can wait forever
> ---------------------------------------------
>
>                 Key: SPARK-2677
>                 URL: https://issues.apache.org/jira/browse/SPARK-2677
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.0.0
>            Reporter: Kousuke Saruta
>            Priority: Critical
>             Fix For: 1.1.0
>
>
> BasicBlockFetchIterator#next waits for a fetch result by calling results.take().
> {code}
>   override def next(): (BlockId, Option[Iterator[Any]]) = {
>     resultsGotten += 1
>     val startFetchWait = System.currentTimeMillis()
>     val result = results.take()
>     val stopFetchWait = System.currentTimeMillis()
>     _fetchWaitTime += (stopFetchWait - startFetchWait)
>     if (! result.failed) bytesInFlight -= result.size
>     while (!fetchRequests.isEmpty &&
>       (bytesInFlight == 0 || bytesInFlight + fetchRequests.front.size <= maxBytesInFlight)) {
>       sendRequest(fetchRequests.dequeue())
>     }
>     (result.blockId, if (result.failed) None else Some(result.deserialize()))
>   }
> {code}
> However, results is a LinkedBlockingQueue, and take() blocks with no timeout. If the remote executor hangs, the fetching executor waits forever.
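
To illustrate the blocking behaviour described above, here is a minimal, self-contained sketch that contrasts it with a bounded wait using LinkedBlockingQueue#poll(timeout, unit). This is not the fix applied in Spark and not Spark code: FetchResult, FetchTimeoutSketch, nextWithTimeout and the timeout values are hypothetical names chosen for illustration only.

```scala
import java.util.concurrent.{LinkedBlockingQueue, TimeUnit}

// Hypothetical stand-in for the iterator's fetch result type (not Spark's class).
final case class FetchResult(blockId: String, size: Long)

object FetchTimeoutSketch {
  // Waits at most timeoutMs for a result instead of blocking indefinitely.
  // poll returns null on timeout, which Option(...) turns into None.
  def nextWithTimeout(
      results: LinkedBlockingQueue[FetchResult],
      timeoutMs: Long): Option[FetchResult] =
    Option(results.poll(timeoutMs, TimeUnit.MILLISECONDS))

  def main(args: Array[String]): Unit = {
    val results = new LinkedBlockingQueue[FetchResult]()

    // Nothing is ever enqueued here, simulating a remote executor that hangs.
    // results.take() would block forever at this point; poll gives up instead.
    println(nextWithTimeout(results, 100L)) // None

    // Once a result arrives, the same call returns it immediately.
    results.put(FetchResult("shuffle_0_0_0", 42L))
    println(nextWithTimeout(results, 100L)) // Some(FetchResult(shuffle_0_0_0,42))
  }
}
```

With a bounded wait, the caller can decide what a timeout means (retry, report a fetch failure, and so on). Which of those is appropriate for BasicBlockFetchIterator is outside the scope of this sketch.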