Hello Devs,

I am investigating how matrix-vector multiplication scales for an IndexedRowMatrix in mllib.linalg.distributed.

Currently, I am broadcasting the vector to be multiplied on the right. The IndexedRowMatrix is stored across a cluster with up to 16 nodes, each with more than 200 GB of memory. The Spark driver runs on an identical node, also with more than 200 GB of memory.

In scaling n, the size of the vector to be broadcast, I find that the largest n I can use is 2^26. At n = 2^27 or above, the program invariably crashes at the broadcast. The array being broadcast is of type Array[Double], so at n = 2^27 the contents occupy 2^27 * 8 = 2^30 bytes, which is 1 GiB (about 1.07 GB). I have read in PR [SPARK-3721] [PySpark] "broadcast objects larger than 2G" that broadcasts of this size should be supported (I assume this works for Scala as well?).

The failure surfaces in BlockInfo.scala, where the size of the block being stored is reported as negative. An example error log is shown below. If anyone has more experience with, or knowledge of, why this broadcast is failing, I'd appreciate the input.
--
Thanks,
Mike

55584:INFO:MemoryStore:ensureFreeSpace(-2147480008) called with curMem=0, maxMem=92610625536:
55584:INFO:MemoryStore:Block broadcast-2 stored as values in memory (estimated size -2147480008.0 B, free 88.3 GB):
Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: sizeInBytes was negative: -2147480008
    at scala.Predef$.require(Predef.scala:233)
    at org.apache.spark.storage.BlockInfo.markReady(BlockInfo.scala:55)
    at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:815)
    at org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:638)
    at org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:996)
    at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:99)
    at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:85)
    at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
    at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:63)
    at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1297)
    at org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix.multiply(IndexedRowMatrix.scala:184)
    at himrod.linalg.KrylovTests$.main(KrylovTests.scala:172)
    at himrod.linalg.KrylovTests.main(KrylovTests.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:666)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:178)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:203)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:118)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
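
For reference, the size arithmetic can be checked without Spark. The sketch below is plain Scala; the object name BroadcastSizeCheck and its helpers are hypothetical, introduced only for illustration. It computes the raw payload of an Array[Double] of length n (ignoring the array header and any serialization overhead) and shows what a size slightly above 2^31 looks like once narrowed to a 32-bit signed Int. That the logged value began as such a Long is an assumption; where, or whether, Spark performs this narrowing has not been confirmed here.

```scala
object BroadcastSizeCheck {
  // A JVM double occupies 8 bytes.
  val BytesPerDouble: Long = 8L

  // Raw payload in bytes for n doubles, ignoring the array header
  // and any serialization overhead.
  def payloadBytes(n: Long): Long = n * BytesPerDouble

  // The value a size takes after being narrowed to a 32-bit signed Int.
  def asInt(size: Long): Int = size.toInt

  def main(args: Array[String]): Unit = {
    for (exp <- Seq(26, 27, 28)) {
      val bytes = payloadBytes(1L << exp)
      println(s"n = 2^$exp: $bytes bytes, fits in Int: ${bytes <= Int.MaxValue}")
    }

    // Assumption: the size in the log started as a Long a little above 2^31.
    // This particular candidate wraps to exactly the logged value.
    val candidate = (1L << 31) + 3640L
    println(s"$candidate as Int = ${asInt(candidate)}")
  }
}
```

The logged -2147480008 equals 2^31 + 3640 wrapped into a signed 32-bit range, so the estimated block size appears to have been roughly 2 GiB, about twice the 2^30-byte raw payload at n = 2^27. Why the estimate would be that large is not explained by this arithmetic alone.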