[ https://issues.apache.org/jira/browse/SPARK-5739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14316412#comment-14316412 ]
Sean Owen commented on SPARK-5739:
----------------------------------

No, it should be able to operate on sparse vectors, but what you generated and loaded was fully dense.

> Size exceeds Integer.MAX_VALUE in File Map
> ------------------------------------------
>
>                 Key: SPARK-5739
>                 URL: https://issues.apache.org/jira/browse/SPARK-5739
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 1.1.1
>         Environment: Spark 1.1.1 on a cluster with 12 nodes. Each node has 128 GB RAM and 24 cores. The data is just 40 GB, and there are 48 parallel tasks per node.
>            Reporter: DjvuLee
>
> I ran the k-means algorithm on randomly generated data, but this problem occurred after some iterations. I tried several times, and the problem is reproducible.
> Because the data is randomly generated, I wonder whether this is a bug. Or, if random data can lead to a scenario where the size is bigger than Integer.MAX_VALUE, can we check the size before using the file map?
> 2015-02-11 00:39:36,057 [sparkDriver-akka.actor.default-dispatcher-15] WARN org.apache.spark.util.SizeEstimator - Failed to check whether UseCompressedOops is set; assuming yes
> [error] (run-main-0) java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
> java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
> 	at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:850)
> 	at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:105)
> 	at org.apache.spark.storage.DiskStore.putIterator(DiskStore.scala:86)
> 	at org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:140)
> 	at org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:105)
> 	at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:747)
> 	at org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:598)
> 	at org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:869)
> 	at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:79)
> 	at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:68)
> 	at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:36)
> 	at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:29)
> 	at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
> 	at org.apache.spark.SparkContext.broadcast(SparkContext.scala:809)
> 	at org.apache.spark.mllib.clustering.KMeans.initKMeansParallel(KMeans.scala:270)
> 	at org.apache.spark.mllib.clustering.KMeans.runBreeze(KMeans.scala:143)
> 	at org.apache.spark.mllib.clustering.KMeans.run(KMeans.scala:126)
> 	at org.apache.spark.mllib.clustering.KMeans$.train(KMeans.scala:338)
> 	at org.apache.spark.mllib.clustering.KMeans$.train(KMeans.scala:348)
> 	at KMeansDataGenerator$.main(kmeans.scala:105)
> 	at KMeansDataGenerator.main(kmeans.scala)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:94)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
> 	at java.lang.reflect.Method.invoke(Method.java:619)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
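For context on the failure at sun.nio.ch.FileChannelImpl.map: a memory-mapped region is exposed as a MappedByteBuffer, whose size is capped at Integer.MAX_VALUE (about 2 GiB), so any single block larger than that throws the IllegalArgumentException seen above. The sketch below is a hypothetical back-of-the-envelope check (the class, method names, and vector counts are illustrative, not from the ticket) showing how quickly a fully dense double-precision representation crosses that cap, which is why Sean's point about sparse vectors matters.

```java
// Hypothetical sketch: estimate whether a dense block of vectors would fit in
// a single memory-mapped buffer, which FileChannel.map caps at
// Integer.MAX_VALUE bytes. Names and numbers here are illustrative.
public class DenseBlockSize {
    // Bytes needed for numVectors dense vectors of the given dimension,
    // at 8 bytes per double coordinate.
    static long denseBytes(long numVectors, long dim) {
        return numVectors * dim * 8L;
    }

    // True if a block of this many bytes can be memory-mapped in one go.
    static boolean fitsInMappedBuffer(long bytes) {
        return bytes <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        // Illustrative sizes only, not the reporter's actual dimensions:
        // 5 million vectors of dimension 100 is 4,000,000,000 bytes dense,
        // already past the ~2.1 GB mapped-buffer limit.
        long bytes = denseBytes(5_000_000L, 100L);
        System.out.println(bytes + " bytes, fits: " + fitsInMappedBuffer(bytes));
    }
}
```

A sparse representation stores only the non-zero entries (index plus value), so for data that is mostly zeros the same vectors can stay well under the limit; dense randomly generated data, as in this report, gets no such savings.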