[GitHub] spark pull request #19266: [SPARK-22033][CORE] BufferHolder, other size chec...

2017-09-19 Thread buryat
Github user buryat commented on a diff in the pull request:

https://github.com/apache/spark/pull/19266#discussion_r139616346
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ---
@@ -344,7 +344,7 @@ class Word2Vec extends Serializable with Logging {
 val newSentences = sentences.repartition(numPartitions).cache()
 val initRandom = new XORShiftRandom(seed)
 
-if (vocabSize.toLong * vectorSize >= Int.MaxValue) {
+if (vocabSize.toLong * vectorSize >= Int.MaxValue - 8) {
--- End diff --

I think this should be just `>`
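
To make the off-by-one concrete, here is a minimal sketch of the boundary (the object and method names are illustrative, not the actual Word2Vec code): with a cap of `Int.MaxValue - 8`, a flat array of exactly that many elements is still allocatable, so only strictly larger sizes need rejecting.

```scala
// Minimal sketch of the boundary check; not the actual Word2Vec code.
object FlatArrayCheck {
  // Conservative JVM array-length cap used throughout this PR.
  val ArrayMax: Long = Int.MaxValue - 8

  def check(vocabSize: Int, vectorSize: Int): Unit = {
    // Widen to Long before multiplying so the product itself cannot overflow.
    val flatSize = vocabSize.toLong * vectorSize
    // `>=` would reject flatSize == ArrayMax, which is still allocatable;
    // `>` accepts that exact boundary and rejects only what cannot fit.
    require(flatSize <= ArrayMax,
      s"vocabSize * vectorSize = $flatSize must not exceed $ArrayMax")
  }
}
```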


---




[GitHub] spark pull request #19266: [SPARK-22033][CORE] BufferHolder, other size chec...

2017-09-19 Thread buryat
Github user buryat commented on a diff in the pull request:

https://github.com/apache/spark/pull/19266#discussion_r139616411
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala ---
@@ -304,8 +304,8 @@ class BlockMatrix @Since("1.3.0") (
   s"Int.MaxValue. Currently numRows: ${numRows()}")
 require(numCols() < Int.MaxValue, "The number of columns of this matrix should be less than " +
   s"Int.MaxValue. Currently numCols: ${numCols()}")
-require(numRows() * numCols() < Int.MaxValue, "The length of the values array must be " +
-  s"less than Int.MaxValue. Currently numRows * numCols: ${numRows() * numCols()}")
+require(numRows() * numCols() < Int.MaxValue - 8, "The length of the values array must be " +
--- End diff --

`<=`
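
Same boundary reasoning, spelled out as an illustrative sketch (only the `arrayMax` value comes from the PR): `<` rejects a values array of exactly the maximum length, while `<=` is the tightest bound that still guards the allocation.

```scala
// Illustrative only: how `<` and `<=` differ at the exact cap.
val arrayMax: Long = Int.MaxValue - 8
for (n <- Seq(arrayMax - 1, arrayMax, arrayMax + 1)) {
  val strict = n < arrayMax  // rejects n == arrayMax, a still-legal array length
  val loose  = n <= arrayMax // the suggested bound: accepts exactly arrayMax
  println(s"numRows * numCols = $n  '<' passes: $strict  '<=' passes: $loose")
}
```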


---




[GitHub] spark pull request #19266: [SPARK-22033][CORE] BufferHolder, other size chec...

2017-09-19 Thread buryat
Github user buryat commented on a diff in the pull request:

https://github.com/apache/spark/pull/19266#discussion_r139616165
  
--- Diff: core/src/main/scala/org/apache/spark/util/collection/CompactBuffer.scala ---
@@ -126,22 +126,20 @@ private[spark] class CompactBuffer[T: ClassTag] extends Seq[T] with Serializable
 
   /** Increase our size to newSize and grow the backing array if needed. */
   private def growToSize(newSize: Int): Unit = {
-if (newSize < 0) {
-  throw new UnsupportedOperationException("Can't grow buffer past Int.MaxValue elements")
+val arrayMax = Int.MaxValue - 8
+if (newSize < 0 || newSize - 2 > arrayMax) {
+  throw new UnsupportedOperationException(s"Can't grow buffer past $arrayMax elements")
 }
 val capacity = if (otherElements != null) otherElements.length + 2 else 2
 if (newSize > capacity) {
-  var newArrayLen = 8
+  var newArrayLen = 8L
   while (newSize - 2 > newArrayLen) {
--- End diff --

Ah, I see that it's reserved. It wasn't clear to me why the `- 2` is there; it reads like a magic number, and I see it in other places too.
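
For context, the layout that makes `- 2` non-magic: as I read `CompactBuffer`, the first two elements are stored in plain fields and only the rest spill into the backing array. A stripped-down sketch (not the real implementation):

```scala
import scala.reflect.ClassTag

// Stripped-down sketch of CompactBuffer's layout; not the real class.
class TinyBufferSketch[T: ClassTag] {
  private var element0: T = _                 // logical index 0, inline field
  private var element1: T = _                 // logical index 1, inline field
  private var otherElements: Array[T] = _     // logical indices 2 .. size-1

  def apply(size: Int, position: Int): T = {
    require(position < size)
    position match {
      case 0 => element0
      case 1 => element1
      // The array index is shifted by the two inline fields, so a logical
      // size of `newSize` needs only `newSize - 2` array slots; hence `- 2`.
      case _ => otherElements(position - 2)
    }
  }
}
```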


---




[GitHub] spark pull request #19266: [SPARK-22033][CORE] BufferHolder, other size chec...

2017-09-19 Thread buryat
Github user buryat commented on a diff in the pull request:

https://github.com/apache/spark/pull/19266#discussion_r139615624
  
--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/array/LongArray.java ---
@@ -39,7 +39,7 @@
   private final long length;
 
   public LongArray(MemoryBlock memory) {
-assert memory.size() < (long) Integer.MAX_VALUE * 8: "Array size > 4 billion elements";
+assert memory.size() < (long) Integer.MAX_VALUE * 8: "Array size > 2.1 billion elements";
--- End diff --

Maybe also add the exact number in the error message instead of `2.1 billion`.
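
Something along these lines, as a sketch only (`LongArray` itself is Java and the names here are made up): compute the cap once and interpolate it into the message.

```scala
// Sketch: spell out the exact cap in the message instead of "2.1 billion".
val maxElements: Long = Int.MaxValue - 8 // 2147483639
def checkSize(sizeInBytes: Long): Unit = {
  assert(sizeInBytes <= maxElements * 8L,
    s"Array size > $maxElements elements (requested ${sizeInBytes / 8})")
}
```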


---




[GitHub] spark pull request #19266: [SPARK-22033][CORE] BufferHolder, other size chec...

2017-09-19 Thread buryat
Github user buryat commented on a diff in the pull request:

https://github.com/apache/spark/pull/19266#discussion_r139615418
  
--- Diff: core/src/main/scala/org/apache/spark/util/collection/PartitionedPairBuffer.scala ---
@@ -96,5 +96,5 @@ private[spark] class PartitionedPairBuffer[K, V](initialCapacity: Int = 64)
 }
 
 private object PartitionedPairBuffer {
-  val MAXIMUM_CAPACITY = Int.MaxValue / 2 // 2 ^ 30 - 1
+  val MAXIMUM_CAPACITY = (Int.MaxValue - 8) / 2
--- End diff --

Maybe worth adding an explicit `Int` type annotation, even though the value is already inferred as `Int`.

Also, the comment at line 28 should be updated to `1073741819`, i.e. `(Int.MaxValue - 8) / 2`.
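
Put together, the suggested shape would be roughly this (a sketch assuming the PR keeps the `Int.MaxValue - 8` cap):

```scala
object PartitionedPairBuffer {
  // 1073741819, i.e. (2^31 - 9) / 2, replacing the stale "2 ^ 30 - 1" comment.
  val MAXIMUM_CAPACITY: Int = (Int.MaxValue - 8) / 2
}
```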


---




[GitHub] spark pull request #19266: [SPARK-22033][CORE] BufferHolder, other size chec...

2017-09-18 Thread buryat
Github user buryat commented on a diff in the pull request:

https://github.com/apache/spark/pull/19266#discussion_r139492024
  
--- Diff: core/src/main/scala/org/apache/spark/util/collection/CompactBuffer.scala ---
@@ -126,22 +126,20 @@ private[spark] class CompactBuffer[T: ClassTag] extends Seq[T] with Serializable
 
   /** Increase our size to newSize and grow the backing array if needed. */
   private def growToSize(newSize: Int): Unit = {
-if (newSize < 0) {
-  throw new UnsupportedOperationException("Can't grow buffer past Int.MaxValue elements")
+val arrayMax = Int.MaxValue - 8
+if (newSize < 0 || newSize - 2 > arrayMax) {
+  throw new UnsupportedOperationException(s"Can't grow buffer past $arrayMax elements")
 }
 val capacity = if (otherElements != null) otherElements.length + 2 else 2
 if (newSize > capacity) {
-  var newArrayLen = 8
+  var newArrayLen = 8L
   while (newSize - 2 > newArrayLen) {
--- End diff --

I think we can remove `- 2` now since `newArrayLen` is a Long
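
(The follow-up on r139616165 above walks this back once the two reserved slots are spotted.) Independently of the `- 2`, the switch to `8L` matters because the doubling can now pass Int range without wrapping negative; a small sketch with an illustrative `newSize`:

```scala
// Illustrative values; shows why the accumulator is a Long.
val arrayMax    = Int.MaxValue - 8
val newSize     = 1500000000          // hypothetical requested logical size
var newArrayLen = 8L
while (newSize - 2 > newArrayLen) {
  newArrayLen *= 2                    // may pass Int.MaxValue; safe as a Long
}
// Clamp back into the legal array range before any .toInt conversion.
val finalLen = math.min(newArrayLen, arrayMax.toLong).toInt
```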


---




[GitHub] spark pull request #19266: [SPARK-22033][CORE] BufferHolder, other size chec...

2017-09-18 Thread buryat
Github user buryat commented on a diff in the pull request:

https://github.com/apache/spark/pull/19266#discussion_r139489658
  
--- Diff: core/src/main/java/org/apache/spark/unsafe/map/HashMapGrowthStrategy.java ---
@@ -30,11 +30,15 @@
   HashMapGrowthStrategy DOUBLING = new Doubling();
 
   class Doubling implements HashMapGrowthStrategy {
+
+private static final int ARRAY_MAX = Integer.MAX_VALUE - 8;
--- End diff --

Maybe worth adding a comment explaining why this value is chosen as the max, like here:

http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/tip/src/share/classes/java/util/ArrayList.java#l223
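
For reference, the ArrayList note reads along these lines; here is a rendering of the same idea (paraphrased, not text from the PR):

```scala
// Some VMs reserve some header words in an array. Attempting to allocate
// larger sizes may result in an OutOfMemoryError ("Requested array size
// exceeds VM limit"), hence the 8-word headroom below Integer.MAX_VALUE.
val ARRAY_MAX: Int = Int.MaxValue - 8
```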


---




[GitHub] spark pull request #19266: [SPARK-22033][CORE] BufferHolder, other size chec...

2017-09-18 Thread buryat
Github user buryat commented on a diff in the pull request:

https://github.com/apache/spark/pull/19266#discussion_r139489491
  
--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/array/LongArray.java ---
@@ -39,7 +39,7 @@
   private final long length;
 
   public LongArray(MemoryBlock memory) {
-assert memory.size() < (long) Integer.MAX_VALUE * 8: "Array size > 4 billion elements";
+assert memory.size() < (long) Integer.MAX_VALUE * 8: "Array size > 2.1 billion elements";
--- End diff --

`assert memory.size() < (long) (Integer.MAX_VALUE - 8) * 8`
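
The arithmetic behind that bound, as a sketch (whether the exact boundary should itself be allowed mirrors the `>=` vs `>` discussion above): each element of a `LongArray` occupies 8 bytes, so the byte cap is the element cap times 8.

```scala
// Sketch of the byte-size bound behind the suggested assertion.
val maxElements: Long = Int.MaxValue - 8   // 2147483639 longs
val maxBytes: Long    = maxElements * 8L   // 17179869112 bytes
def fits(memorySizeInBytes: Long): Boolean = memorySizeInBytes <= maxBytes
```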


---
