I think you've hit the nail on the head. Since serialization ultimately
produces a single byte array, and JVM arrays are indexed by int (at most
~2 billion elements), a broadcast can be at most ~2GB.

At that scale, you might consider whether you really need to broadcast
these values at all, or whether you could keep them as an RDD and use a
join instead.
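The join-based alternative replaces the giant broadcast lookup with a key-based join between two pair RDDs. As a rough illustration of what `largeRdd.join(smallRdd)` computes, here is the same inner-join semantics in plain Scala collections (no Spark dependency; `large` and `small` are just toy placeholders):

```scala
// Inner join on key, as Spark's RDD.join would compute it.
// In a real job both sides would be RDD[(K, V)] pair RDDs.
val large = Seq(("a", 1), ("b", 2), ("a", 3))   // stand-in for the big dataset
val small = Seq(("a", "x"), ("c", "y"))         // stand-in for the would-be broadcast

// Index the small side by key, then emit one output row per matching pair.
val smallByKey = small.groupBy(_._1)
val joined: Seq[(String, (Int, String))] =
  for {
    (k, v) <- large
    (_, w) <- smallByKey.getOrElse(k, Nil)
  } yield (k, (v, w))
```

With actual RDDs this is simply `large.join(small)`: Spark shuffles both sides by key instead of shipping one of them whole to every executor, so neither side ever has to fit in a single serialized array.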

Alternatively, can you break it up into several smaller broadcasts?
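Splitting works well if the value is map-like: cut it into chunks whose serialized size stays well under the 2GB array ceiling, then broadcast each chunk separately. A minimal sketch of the splitting step (the `chunked` helper and the chunk size are hypothetical; each element of `parts` would then be passed to `sc.broadcast`):

```scala
// Split one huge Map into several smaller ones so that each broadcast's
// serialized byte array stays under the JVM's ~2GB array limit.
def chunked[K, V](m: Map[K, V], chunkSize: Int): Seq[Map[K, V]] =
  m.toSeq.grouped(chunkSize).map(_.toMap).toSeq

val big   = (1 to 10).map(i => i -> i.toString).toMap  // toy stand-in for the 5GB map
val parts = chunked(big, 4)                            // 3 chunks of sizes 4, 4, 2
// in a real job: val bcs = parts.map(sc.broadcast(_))
```

Tasks would then look up a key in each chunk in turn (or route by a hash of the key to the right chunk), at the cost of holding several broadcast handles instead of one.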


On Fri, Feb 13, 2015 at 6:24 PM, soila <skavu...@gmail.com> wrote:
> I am trying to broadcast a large 5GB variable using Spark 1.2.0. I get the
> following exception when the size of the broadcast variable exceeds 2GB. Any
> ideas on how I can resolve this issue?
>
> java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
>         at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:829)
>         at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:123)
>         at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:132)
>         at org.apache.spark.storage.DiskStore.putIterator(DiskStore.scala:99)
>         at org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:147)
>         at org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:114)
>         at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:787)
>         at org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:638)
>         at org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:992)
>         at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:98)
>         at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:84)
>         at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
>         at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:29)
>         at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
>         at org.apache.spark.SparkContext.broadcast(SparkContext.scala:945)
>
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/Size-exceeds-Integer-MAX-VALUE-exception-when-broadcasting-large-variable-tp21648.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
