Hi, the problem you hit is probably related to this JIRA ticket: https://issues.apache.org/jira/browse/SPARK-3948. It is potentially a kernel 2.6.32 bug that makes sort-based shuffle fail. I'm not sure your problem is the same as this one; would you mind checking your kernel version?
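If it helps, here is a minimal sketch for checking the kernel on each node. The `spark.shuffle.manager hash` fallback mentioned in the comment is only a possible workaround, assuming your Spark version still supports hash-based shuffle; it is not confirmed as a fix for your case.

```shell
# Check whether a node runs the kernel release suspected in SPARK-3948.
kernel="$(uname -r)"
echo "Running kernel: $kernel"

case "$kernel" in
  2.6.32*)
    echo "Kernel 2.6.32 detected - possibly affected by SPARK-3948."
    # One possible workaround (an assumption, not verified here) is to
    # fall back to hash-based shuffle in spark-defaults.conf:
    #   spark.shuffle.manager  hash
    ;;
  *)
    echo "Kernel is not 2.6.32; the issue may lie elsewhere."
    ;;
esac
```

Running this on each worker (e.g. via your cluster's parallel-ssh tooling) should quickly tell you whether the affected kernel is in play.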
Thanks
Jerry

From: su...@certusnet.com.cn [mailto:su...@certusnet.com.cn]
Sent: Monday, October 27, 2014 5:41 PM
To: user
Subject: Sort-based shuffle did not work as expected

Hi, all

We expected to use sort-based shuffle in our Spark application but encountered unhandled problems. It seems that the data file and index file are in an inconsistent state, and we got wrong result sets when trying to use Spark to bulk-load data into HBase. There are many fetch failures like the following:

FetchFailed(BlockManagerId(0, work5.msa.certusnet, 44544, 0), shuffleId=0, mapId=42, reduceId=3)

Referring to the executor log, we caught the following exceptions:

14/10/27 11:20:36 ERROR BlockFetcherIterator$BasicBlockFetcherIterator: Could not get block(s) from ConnectionManagerId(work4.msa.certusnet,53616)
java.io.IOException: sendMessageReliably failed with ACK that signalled a remote error
    at org.apache.spark.network.ConnectionManager$$anonfun$14.apply(ConnectionManager.scala:869)
    at org.apache.spark.network.ConnectionManager$$anonfun$14.apply(ConnectionManager.scala:861)
    at org.apache.spark.network.ConnectionManager$MessageStatus.markDone(ConnectionManager.scala:66)
    at org.apache.spark.network.ConnectionManager.org$apache$spark$network$ConnectionManager$$handleMessage(ConnectionManager.scala:660)
    at org.apache.spark.network.ConnectionManager$$anon$10.run(ConnectionManager.scala:520)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)

14/10/27 11:00:50 ERROR BlockManagerWorker: Exception handling buffer message
java.io.IOException: Channel not open for writing - cannot extend file to required size
    at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:851)
    at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:104)
    at org.apache.spark.storage.BlockManager.getLocalBytes(BlockManager.scala:379)
    at org.apache.spark.storage.BlockManagerWorker.getBlock(BlockManagerWorker.scala:100)
    at org.apache.spark.storage.BlockManagerWorker.processBlockMessage(BlockManagerWorker.scala:79)
    at org.apache.spark.storage.BlockManagerWorker$$anonfun$2.apply(BlockManagerWorker.scala:48)
    at org.apache.spark.storage.BlockManagerWorker$$anonfun$2.apply(BlockManagerWorker.scala:48)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
    at org.apache.spark.storage.BlockMessageArray.foreach(BlockMessageArray.scala:28)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at org.apache.spark.storage.BlockMessageArray.map(BlockMessageArray.scala:28)
    at org.apache.spark.storage.BlockManagerWorker.onBlockMessageReceive(BlockManagerWorker.scala:48)

Any suggestion?

Thanks
Sun

CertusNet