Kousuke Saruta created SPARK-2677: ------------------------------------- Summary: BasicBlockFetchIterator#next can be wait forever Key: SPARK-2677 URL: https://issues.apache.org/jira/browse/SPARK-2677 Project: Spark Issue Type: Bug Affects Versions: 1.0.0 Reporter: Kousuke Saruta Priority: Critical
In BasicBlockFetchIterator#next, it waits fetch result on result.take. {code} override def next(): (BlockId, Option[Iterator[Any]]) = { resultsGotten += 1 val startFetchWait = System.currentTimeMillis() val result = results.take() val stopFetchWait = System.currentTimeMillis() _fetchWaitTime += (stopFetchWait - startFetchWait) if (! result.failed) bytesInFlight -= result.size while (!fetchRequests.isEmpty && (bytesInFlight == 0 || bytesInFlight + fetchRequests.front.size <= maxBytesInFlight)) { sendRequest(fetchRequests.dequeue()) } (result.blockId, if (result.failed) None else Some(result.deserialize())) } {code} But, results is implemented as LinkedBlockingQueue so if remote executor hang up, fetching Executor waits forever. -- This message was sent by Atlassian JIRA (v6.2#6252)