[ https://issues.apache.org/jira/browse/SPARK-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13957446#comment-13957446 ]
Min Zhou commented on SPARK-1391:
---------------------------------

[~sowen] Yes, that is a possibility. It's not a line number; the exception should instead carry an index. Otherwise it must be a user-defined exception or a native exception. I grepped the fastutil source code, and it won't throw AIOOBEs with an empty message. Also, the line number does not correspond to v6.4.4; see line 96 of FastByteArrayOutputStream:
http://grepcode.com/file/repo1.maven.org/maven2/it.unimi.dsi/fastutil/6.4.4/it/unimi/dsi/fastutil/io/FastByteArrayOutputStream.java

Here is one way it could throw AIOOBEs. The position can become negative because of this line:

{noformat}
public void write( final byte[] b, final int off, final int len ) throws IOException {
    ...
    if ( position + len > length ) length = position += len;
    ...
}
{noformat}

Here is a simulation with fastutil 6.4.4:

{noformat}
import it.unimi.dsi.fastutil.io.FastByteArrayOutputStream;

public class ArrayOutofIndex {
    public static void main(String[] args) throws Exception {
        FastByteArrayOutputStream outputStream = new FastByteArrayOutputStream(4096);
        // Force a negative position, then write a single byte.
        outputStream.position(-1);
        outputStream.write('a');
        outputStream.close();
    }
}

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: -1
        at it.unimi.dsi.fastutil.io.FastByteArrayOutputStream.write(FastByteArrayOutputStream.java:92)
        at mzhou.shuffle.perf.ArrayOutofIndex.main(ArrayOutofIndex.java:29)
{noformat}

{noformat}
import it.unimi.dsi.fastutil.io.FastByteArrayOutputStream;

public class ArrayOutofIndex {
    public static void main(String[] args) throws Exception {
        FastByteArrayOutputStream outputStream = new FastByteArrayOutputStream(4096);
        // Force a negative position, then write a whole byte array.
        outputStream.position(-1);
        outputStream.write(new byte[1024], 0, 1024);
        outputStream.close();
    }
}

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException
        at java.lang.System.arraycopy(Native Method)
        at it.unimi.dsi.fastutil.io.FastByteArrayOutputStream.write(FastByteArrayOutputStream.java:98)
        at mzhou.shuffle.perf.ArrayOutofIndex.main(ArrayOutofIndex.java:29)
{noformat}

Neither the line numbers nor the stack traces match what was reported.

> BlockManager cannot transfer blocks larger than 2G in size
> ----------------------------------------------------------
>
>                 Key: SPARK-1391
>                 URL: https://issues.apache.org/jira/browse/SPARK-1391
>             Project: Spark
>          Issue Type: Bug
>          Components: Block Manager, Shuffle
>    Affects Versions: 1.0.0
>            Reporter: Shivaram Venkataraman
>
> If a task tries to remotely access a cached RDD block, I get an exception
> when the block size is > 2G. The exception is pasted below.
> Memory capacities are huge these days (> 60G), and many workflows depend on
> having large blocks in memory, so it would be good to fix this bug.
> I don't know if the same thing happens on shuffles if one transfer (from
> mapper to reducer) is > 2G.
> 14/04/02 02:33:10 ERROR storage.BlockManagerWorker: Exception handling buffer message
> java.lang.ArrayIndexOutOfBoundsException
>         at it.unimi.dsi.fastutil.io.FastByteArrayOutputStream.write(FastByteArrayOutputStream.java:96)
>         at it.unimi.dsi.fastutil.io.FastBufferedOutputStream.dumpBuffer(FastBufferedOutputStream.java:134)
>         at it.unimi.dsi.fastutil.io.FastBufferedOutputStream.write(FastBufferedOutputStream.java:164)
>         at java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1876)
>         at java.io.ObjectOutputStream$BlockDataOutputStream.setBlockDataMode(ObjectOutputStream.java:1785)
>         at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1188)
>         at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
>         at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:38)
>         at org.apache.spark.serializer.SerializationStream$class.writeAll(Serializer.scala:93)
>         at org.apache.spark.serializer.JavaSerializationStream.writeAll(JavaSerializer.scala:26)
>         at org.apache.spark.storage.BlockManager.dataSerializeStream(BlockManager.scala:913)
>         at org.apache.spark.storage.BlockManager.dataSerialize(BlockManager.scala:922)
>         at org.apache.spark.storage.MemoryStore.getBytes(MemoryStore.scala:102)
>         at org.apache.spark.storage.BlockManager.doGetLocal(BlockManager.scala:348)
>         at org.apache.spark.storage.BlockManager.getLocalBytes(BlockManager.scala:323)
>         at org.apache.spark.storage.BlockManagerWorker.getBlock(BlockManagerWorker.scala:90)
>         at org.apache.spark.storage.BlockManagerWorker.processBlockMessage(BlockManagerWorker.scala:69)
>         at org.apache.spark.storage.BlockManagerWorker$$anonfun$2.apply(BlockManagerWorker.scala:44)
>         at org.apache.spark.storage.BlockManagerWorker$$anonfun$2.apply(BlockManagerWorker.scala:44)
>         at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>         at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>         at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>         at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>         at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>         at org.apache.spark.storage.BlockMessageArray.foreach(BlockMessageArray.scala:28)
>         at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>         at org.apache.spark.storage.BlockMessageArray.map(BlockMessageArray.scala:28)
>         at org.apache.spark.storage.BlockManagerWorker.onBlockMessageReceive(BlockManagerWorker.scala:44)
>         at org.apache.spark.storage.BlockManagerWorker$$anonfun$1.apply(BlockManagerWorker.scala:34)
>         at org.apache.spark.storage.BlockManagerWorker$$anonfun$1.apply(BlockManagerWorker.scala:34)
>         at org.apache.spark.network.ConnectionManager.org$apache$spark$network$ConnectionManager$$handleMessage(ConnectionManager.scala:661)
>         at org.apache.spark.network.ConnectionManager$$anon$9.run(ConnectionManager.scala:503)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:744)

--
This message was sent by Atlassian JIRA
(v6.2#6252)
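
A side note on the comment above: one way an int position could become negative for data larger than 2G is plain integer overflow in `position += len`. The sketch below only illustrates that mechanism. It is an assumption, not a confirmed root cause of SPARK-1391, and it does not use fastutil; the names `PositionOverflowSketch`, `advance`, and `copyFails` are hypothetical.

```java
public class PositionOverflowSketch {

    // Mimics `position += len` on an int field: the sum wraps silently
    // once it passes Integer.MAX_VALUE (about 2G).
    static int advance(int position, int len) {
        return position + len;
    }

    // Returns true if System.arraycopy rejects a one-byte copy to destPos
    // in a small 16-byte destination buffer.
    static boolean copyFails(int destPos) {
        byte[] src = new byte[1];
        byte[] dest = new byte[16];
        try {
            System.arraycopy(src, 0, dest, destPos, 1);
            return false;
        } catch (IndexOutOfBoundsException e) {
            // Common JVMs raise ArrayIndexOutOfBoundsException here;
            // the message text varies by JDK version.
            return true;
        }
    }

    public static void main(String[] args) {
        // Hypothetical sizes chosen so that the sum exceeds Integer.MAX_VALUE.
        int position = advance(2_000_000_000, 500_000_000);
        System.out.println("position after overflow: " + position);
        System.out.println("arraycopy rejected: " + copyFails(position));
    }
}
```

A negative destination index handed to `System.arraycopy` is rejected from native code, which would be consistent with the second simulation in the comment, where the exception is raised inside `System.arraycopy`.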