Hi Mike, are you sure the size isn't off by 2x somehow? I just tried to reproduce it with a simple test in BlockManagerSuite:
  test("large block") {
    store = makeBlockManager(4e9.toLong)
    val arr = new Array[Double](1 << 28)
    println(arr.length)
    val blockId = BlockId("rdd_3_10")
    val result = store.putIterator(blockId, Iterator(arr), StorageLevel.MEMORY_AND_DISK)
    result.foreach(println)
  }

It fails at 1 << 28 with nearly the same message, but it's fine for (1 << 28) - 1, with a reported block size of 2147483680. Not exactly the same as what you did, but I expect it's close enough to exhibit the same error.

On Tue, Jul 28, 2015 at 12:37 PM, Mike Hynes <91m...@gmail.com> wrote:

> Hello Devs,
>
> I am investigating how matrix-vector multiplication can scale for an
> IndexedRowMatrix in mllib.linalg.distributed.
>
> Currently, I am broadcasting the vector to be multiplied on the right.
> The IndexedRowMatrix is stored across a cluster with up to 16 nodes,
> each with >200 GB of memory. The Spark driver is on an identical node,
> also with more than 200 GB of memory.
>
> In scaling n, the size of the vector to be broadcast, I find that the
> maximum n I can use is 2^26. For 2^27, the broadcast will fail. The
> array being broadcast is of type Array[Double], so the contents have
> size 2^30 bytes, which is approximately 1 (metric) GB.
>
> I have read in PR [SPARK-3721] [PySpark] "broadcast objects larger
> than 2G" that this should be supported (I assume this works for Scala
> as well?). However, when I increase n to 2^27 or above, the program
> invariably crashes at the broadcast.
>
> The problem stems from the size of the result block to be sent in
> BlockInfo.scala; the size is reportedly negative. An example error log
> is shown below.
>
> If anyone has more experience or knowledge of why this broadcast is
> failing, I'd appreciate the input.
> --
> Thanks,
> Mike
>
> 55584:INFO:MemoryStore:ensureFreeSpace(-2147480008) called with
> curMem=0, maxMem=92610625536
> 55584:INFO:MemoryStore:Block broadcast-2 stored as values in memory
> (estimated size -2147480008.0 B, free 88.3 GB)
> Exception in thread "main" java.lang.IllegalArgumentException:
> requirement failed: sizeInBytes was negative: -2147480008
>   at scala.Predef$.require(Predef.scala:233)
>   at org.apache.spark.storage.BlockInfo.markReady(BlockInfo.scala:55)
>   at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:815)
>   at org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:638)
>   at org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:996)
>   at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:99)
>   at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:85)
>   at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
>   at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:63)
>   at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1297)
>   at org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix.multiply(IndexedRowMatrix.scala:184)
>   at himrod.linalg.KrylovTests$.main(KrylovTests.scala:172)
>   at himrod.linalg.KrylovTests.main(KrylovTests.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$runMain(SparkSubmit.scala:666)
>   at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:178)
>   at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:203)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:118)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
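One more data point: both the failure at 1 << 28 in my test and the negative size in your log look consistent with a 32-bit wrap. A quick sanity check of the arithmetic in plain Scala (just a sketch, independent of any Spark code; the names `payload`, `reported`, and `recovered` are mine):

```scala
// 2^28 doubles is exactly 2 GiB of raw payload -- one past Int.MaxValue --
// so any size tracked in a 32-bit Int wraps negative.
val payload: Long = (1L << 28) * 8L
println(payload)                 // 2147483648
println(payload > Int.MaxValue)  // true

// Reading the logged negative size as a wrapped unsigned 32-bit value
// recovers a byte count just over the 2 GB limit.
val reported: Int = -2147480008
val recovered: Long = reported.toLong & 0xFFFFFFFFL
println(recovered)               // 2147487288
```

So the "true" block size in your run would be about 2147487288 bytes, i.e. just over Int.MaxValue, which is why 2^27 doubles fails while 2^26 fits comfortably.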