[GitHub] spark pull request #19266: [SPARK-22033][CORE] BufferHolder, other size chec...
Github user buryat commented on a diff in the pull request:
https://github.com/apache/spark/pull/19266#discussion_r139616346

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
@@ -344,7 +344,7 @@ class Word2Vec extends Serializable with Logging {
     val newSentences = sentences.repartition(numPartitions).cache()
     val initRandom = new XORShiftRandom(seed)

-    if (vocabSize.toLong * vectorSize >= Int.MaxValue) {
+    if (vocabSize.toLong * vectorSize >= Int.MaxValue - 8) {
--- End diff --

I think this should be just `>`
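For context on the `>` versus `>=` point: if `Int.MaxValue - 8` is taken as the largest array length the JVM will reliably allocate, then a product exactly equal to that limit is still a legal size, so the strict comparison is one element less restrictive. A minimal standalone Scala sketch of the boundary (illustrative names, not Spark code):

    // Illustrative only: arrayMax is the largest array length most JVMs
    // will allocate (some VMs reserve a few header words per array).
    val arrayMax: Long = Int.MaxValue - 8            // 2147483639

    // With '>=', a size exactly equal to arrayMax is rejected even though
    // an array of that length is still allocatable; with '>' it is allowed.
    def rejectWithGe(size: Long): Boolean = size >= arrayMax  // rejects 2147483639
    def rejectWithGt(size: Long): Boolean = size > arrayMax   // accepts 2147483639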
[GitHub] spark pull request #19266: [SPARK-22033][CORE] BufferHolder, other size chec...
Github user buryat commented on a diff in the pull request:
https://github.com/apache/spark/pull/19266#discussion_r139616411

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala ---
@@ -304,8 +304,8 @@ class BlockMatrix @Since("1.3.0") (
       s"Int.MaxValue. Currently numRows: ${numRows()}")
     require(numCols() < Int.MaxValue, "The number of columns of this matrix should be less than " +
       s"Int.MaxValue. Currently numCols: ${numCols()}")
-    require(numRows() * numCols() < Int.MaxValue, "The length of the values array must be " +
-      s"less than Int.MaxValue. Currently numRows * numCols: ${numRows() * numCols()}")
+    require(numRows() * numCols() < Int.MaxValue - 8, "The length of the values array must be " +
--- End diff --

`<=`
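Spelled out, the suggestion is to make the bound inclusive, since a values array of exactly `Int.MaxValue - 8` elements still fits within the limit. A hedged sketch of what the corrected `require` might look like (paraphrasing the diff, not the committed code):

    require(numRows() * numCols() <= Int.MaxValue - 8, "The length of the values array must be " +
      s"at most Int.MaxValue - 8. Currently numRows * numCols: ${numRows() * numCols()}")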
[GitHub] spark pull request #19266: [SPARK-22033][CORE] BufferHolder, other size chec...
Github user buryat commented on a diff in the pull request:
https://github.com/apache/spark/pull/19266#discussion_r139616165

--- Diff: core/src/main/scala/org/apache/spark/util/collection/CompactBuffer.scala ---
@@ -126,22 +126,20 @@ private[spark] class CompactBuffer[T: ClassTag] extends Seq[T] with Serializable

   /** Increase our size to newSize and grow the backing array if needed. */
   private def growToSize(newSize: Int): Unit = {
-    if (newSize < 0) {
-      throw new UnsupportedOperationException("Can't grow buffer past Int.MaxValue elements")
+    val arrayMax = Int.MaxValue - 8
+    if (newSize < 0 || newSize - 2 > arrayMax) {
+      throw new UnsupportedOperationException(s"Can't grow buffer past $arrayMax elements")
     }
     val capacity = if (otherElements != null) otherElements.length + 2 else 2
     if (newSize > capacity) {
-      var newArrayLen = 8
+      var newArrayLen = 8L
       while (newSize - 2 > newArrayLen) {
--- End diff --

Ah, I see that it's reserved. It wasn't clear to me why `- 2`; it seemed like a magic number, and I see it in other places too.
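The `- 2` comes from `CompactBuffer`'s storage layout: the first two elements live in dedicated fields, and only elements 2 and beyond go into the backing array, so a buffer of logical size `newSize` needs an array of only `newSize - 2` slots. A simplified sketch of that layout (abridged from the real class; field names match it, the wrapper class is illustrative):

    // Abridged sketch of CompactBuffer's storage scheme:
    class CompactBufferSketch[T] {
      var element0: T = _                      // element 0, stored inline
      var element1: T = _                      // element 1, stored inline
      var otherElements: Array[T] = null       // elements 2 .. size-1

      // A logical size of newSize therefore needs a backing array of only
      // newSize - 2 slots -- hence the recurring `- 2` in the size checks.
    }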
[GitHub] spark pull request #19266: [SPARK-22033][CORE] BufferHolder, other size chec...
Github user buryat commented on a diff in the pull request:
https://github.com/apache/spark/pull/19266#discussion_r139615624

--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/array/LongArray.java ---
@@ -39,7 +39,7 @@
   private final long length;

   public LongArray(MemoryBlock memory) {
-    assert memory.size() < (long) Integer.MAX_VALUE * 8: "Array size > 4 billion elements";
+    assert memory.size() < (long) Integer.MAX_VALUE * 8: "Array size > 2.1 billion elements";
--- End diff --

maybe also add the exact number in the error message instead of `2.1 billion`
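For reference, the exact figure under the `Integer.MAX_VALUE - 8` convention works out as follows (a quick arithmetic sketch, assuming that is the limit the message should quote):

    val maxElements = Int.MaxValue - 8           // 2147483647 - 8 = 2147483639
    val maxBytes    = maxElements.toLong * 8L    // 17179869112 bytes of longs (~16 GiB)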
[GitHub] spark pull request #19266: [SPARK-22033][CORE] BufferHolder, other size chec...
Github user buryat commented on a diff in the pull request:
https://github.com/apache/spark/pull/19266#discussion_r139615418

--- Diff: core/src/main/scala/org/apache/spark/util/collection/PartitionedPairBuffer.scala ---
@@ -96,5 +96,5 @@ private[spark] class PartitionedPairBuffer[K, V](initialCapacity: Int = 64)
 }

 private object PartitionedPairBuffer {
-  val MAXIMUM_CAPACITY = Int.MaxValue / 2 // 2 ^ 30 - 1
+  val MAXIMUM_CAPACITY = (Int.MaxValue - 8) / 2
--- End diff --

Maybe worth adding the explicit `Int` type, even though it's already an `Int`. Also, the comment at line 28 should be changed to `1073741819`, i.e. `(Int.MaxValue - 8) / 2`.
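The suggested value checks out: `(Int.MaxValue - 8) / 2` = `2147483639 / 2` = `1073741819` under integer division. With the explicit type annotation the comment asks for, the line might read (a sketch, not the committed code):

    val MAXIMUM_CAPACITY: Int = (Int.MaxValue - 8) / 2  // 1073741819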
[GitHub] spark pull request #19266: [SPARK-22033][CORE] BufferHolder, other size chec...
Github user buryat commented on a diff in the pull request:
https://github.com/apache/spark/pull/19266#discussion_r139492024

--- Diff: core/src/main/scala/org/apache/spark/util/collection/CompactBuffer.scala ---
@@ -126,22 +126,20 @@ private[spark] class CompactBuffer[T: ClassTag] extends Seq[T] with Serializable

   /** Increase our size to newSize and grow the backing array if needed. */
   private def growToSize(newSize: Int): Unit = {
-    if (newSize < 0) {
-      throw new UnsupportedOperationException("Can't grow buffer past Int.MaxValue elements")
+    val arrayMax = Int.MaxValue - 8
+    if (newSize < 0 || newSize - 2 > arrayMax) {
+      throw new UnsupportedOperationException(s"Can't grow buffer past $arrayMax elements")
     }
     val capacity = if (otherElements != null) otherElements.length + 2 else 2
     if (newSize > capacity) {
-      var newArrayLen = 8
+      var newArrayLen = 8L
      while (newSize - 2 > newArrayLen) {
--- End diff --

I think we can remove `- 2` now since `newArrayLen` is a Long
[GitHub] spark pull request #19266: [SPARK-22033][CORE] BufferHolder, other size chec...
Github user buryat commented on a diff in the pull request:
https://github.com/apache/spark/pull/19266#discussion_r139489658

--- Diff: core/src/main/java/org/apache/spark/unsafe/map/HashMapGrowthStrategy.java ---
@@ -30,11 +30,15 @@
   HashMapGrowthStrategy DOUBLING = new Doubling();

   class Doubling implements HashMapGrowthStrategy {

+    private static final int ARRAY_MAX = Integer.MAX_VALUE - 8;
--- End diff --

Maybe worth adding a comment on why this value is chosen as the max, like here:
http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/tip/src/share/classes/java/util/ArrayList.java#l223
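The `ArrayList` precedent linked above documents its `MAX_ARRAY_SIZE` with a note that some VMs reserve header words in arrays, so allocating anything larger may throw an `OutOfMemoryError`. A Scala paraphrase of the kind of comment being requested (wording mine, not the JDK's or Spark's):

    // Some VMs reserve a few header words in an array; attempting to
    // allocate a larger array may result in an OutOfMemoryError:
    // "Requested array size exceeds VM limit".
    val ARRAY_MAX: Int = Int.MaxValue - 8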
[GitHub] spark pull request #19266: [SPARK-22033][CORE] BufferHolder, other size chec...
Github user buryat commented on a diff in the pull request:
https://github.com/apache/spark/pull/19266#discussion_r139489491

--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/array/LongArray.java ---
@@ -39,7 +39,7 @@
   private final long length;

   public LongArray(MemoryBlock memory) {
-    assert memory.size() < (long) Integer.MAX_VALUE * 8: "Array size > 4 billion elements";
+    assert memory.size() < (long) Integer.MAX_VALUE * 8: "Array size > 2.1 billion elements";
--- End diff --

    assert memory.size() < (long) (Integer.MAX_VALUE - 8) * 8
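Putting the two LongArray suggestions together (the tightened bound and the exact count in the message), a hedged Scala rendering might look like this; the real file is Java, and this is not the committed code:

    def checkLongArraySize(memoryBytes: Long): Unit = {
      val arrayMax = Int.MaxValue - 8             // 2147483639
      assert(memoryBytes < arrayMax.toLong * 8L,  // bound also uses MAX_VALUE - 8
        s"Array size > $arrayMax elements")
    }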