[GitHub] spark pull request #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHo...

2017-10-09 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/19460#discussion_r143640733
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/BufferHolder.java
 ---
@@ -36,9 +37,7 @@
  */
 public class BufferHolder {
 
-  // Some JVMs can't allocate arrays of length Integer.MAX_VALUE; actual max is somewhat
-  // smaller. Be conservative and lower the cap a little.
-  private static final int ARRAY_MAX = Integer.MAX_VALUE - 8;
+  private static final int ARRAY_MAX = ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH;
--- End diff --

Ok, these constants became redundant. No big deal. We don't need to rush this, though.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHo...

2017-10-09 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19460


---




[GitHub] spark pull request #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHo...

2017-10-09 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/19460#discussion_r143571046
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/array/ByteArrayMethods.java 
---
@@ -40,6 +40,15 @@ public static int roundNumberOfBytesToNearestWord(int numBytes) {
 }
   }
 
+  // Some JVMs can't allocate arrays of length Integer.MAX_VALUE; actual max is somewhat smaller.
+  // Be conservative and lower the cap a little.
+  public static int MAX_ARRAY_LENGTH = Integer.MAX_VALUE - 8;
--- End diff --

I get declaring a constant, though it doesn't just pertain to byte arrays. I couldn't find a good place for it. I don't know if it's reused so much that it needs this.


---




[GitHub] spark pull request #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHo...

2017-10-09 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/19460#discussion_r143572060
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/BufferHolderSuite.scala
 ---
@@ -37,3 +43,54 @@ class BufferHolderSuite extends SparkFunSuite {
 assert(e.getMessage.contains("exceeds size limitation"))
   }
 }
+
+// A test for growing the buffer holder to nearly 2GB. Due to the heap size limitation of the Spark
+// unit tests JVM, the actual test code is run as a submit job.
+class BufferHolderSparkSubmitSuite
--- End diff --

Separate file?


---




[GitHub] spark pull request #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHo...

2017-10-09 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/19460#discussion_r143570916
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/array/ByteArrayMethods.java 
---
@@ -40,6 +40,15 @@ public static int roundNumberOfBytesToNearestWord(int numBytes) {
 }
   }
 
+  // Some JVMs can't allocate arrays of length Integer.MAX_VALUE; actual max is somewhat smaller.
+  // Be conservative and lower the cap a little.
+  public static int MAX_ARRAY_LENGTH = Integer.MAX_VALUE - 8;
+
+  // Use this value if the allocated byte arrays are used to store other types rather than bytes.
+  public static int maxWordRoundedArrayLength() {
+    return roundNumberOfBytesToNearestWord(MAX_ARRAY_LENGTH - 8);
--- End diff --

Rounding won't do; it has to round _down_. I see that's why you wrote -8 here, but it's a little confusing, as -7 works more directly. And then, why bother? Just write -15 in the constant above and explain.
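
The arithmetic being debated here is easy to check. The sketch below is a standalone illustration, not the Spark source: `roundNumberOfBytesToNearestWord` is re-implemented per the diff above (it rounds *up* to the next multiple of 8), which shows why `MAX_ARRAY_LENGTH - 8`, `MAX_ARRAY_LENGTH - 7`, and a direct `Integer.MAX_VALUE - 15` all land on the same word-rounded cap.

```java
public class ArrayMaxDemo {
    // Rounds up to the next multiple of 8 bytes (one word), mirroring the
    // helper shown in the diff above; an illustration, not Spark source.
    static int roundNumberOfBytesToNearestWord(int numBytes) {
        int remainder = numBytes & 0x07;
        return remainder == 0 ? numBytes : numBytes + (8 - remainder);
    }

    public static void main(String[] args) {
        int maxArrayLength = Integer.MAX_VALUE - 8;  // 2147483639, not a multiple of 8

        // Rounding (max - 8) up to a word stays under the cap...
        int viaMinus8 = roundNumberOfBytesToNearestWord(maxArrayLength - 8);
        // ...but (max - 7) reaches the same value more directly...
        int viaMinus7 = roundNumberOfBytesToNearestWord(maxArrayLength - 7);
        // ...and Integer.MAX_VALUE - 15 is that same value written as one constant.
        int asConstant = Integer.MAX_VALUE - 15;

        System.out.println(viaMinus8);   // 2147483632
        System.out.println(viaMinus7);   // 2147483632
        System.out.println(asConstant);  // 2147483632
    }
}
```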


---




[GitHub] spark pull request #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHo...

2017-10-09 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19460#discussion_r143564830
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/array/ByteArrayMethods.java 
---
@@ -40,6 +40,15 @@ public static int roundNumberOfBytesToNearestWord(int numBytes) {
 }
   }
 
+  // Some JVMs can't allocate arrays of length Integer.MAX_VALUE; actual max is somewhat smaller.
+  // Be conservative and lower the cap a little.
+  public static int MAX_ARRAY_LENGTH = Integer.MAX_VALUE - 8;
--- End diff --

`- 15`


---




[GitHub] spark pull request #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHo...

2017-10-09 Thread liufengdb
GitHub user liufengdb opened a pull request:

https://github.com/apache/spark/pull/19460

[SPARK-22222][core] Fix the ARRAY_MAX in BufferHolder and add a test

## What changes were proposed in this pull request?

We should not break the assumption that the length of the allocated byte array is word rounded:
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java#L170
So we want to use `Integer.MAX_VALUE - 15` instead of `Integer.MAX_VALUE - 8` as the upper bound of an allocated byte array.
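
To see why the word-rounding assumption forces this change, a quick standalone check (illustrative names, not Spark source): the old cap `Integer.MAX_VALUE - 8` is not a multiple of 8, so a buffer grown all the way to it would violate the assumption above, while `Integer.MAX_VALUE - 15` is the largest `int` length that keeps it.

```java
public class WordRoundedCap {
    public static void main(String[] args) {
        int oldCap = Integer.MAX_VALUE - 8;   // 2147483639
        int newCap = Integer.MAX_VALUE - 15;  // 2147483632

        // A byte array grown to the old cap is not word rounded...
        System.out.println(oldCap % 8);  // 7
        // ...while the new cap is a multiple of 8, as UnsafeRow assumes.
        System.out.println(newCap % 8);  // 0
    }
}
```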

## How was this patch tested?

Since the Spark unit test JVM has less than 1GB of heap, we run the test code here as a submit job so that it can run on a JVM with 4GB of memory.

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/liufengdb/spark fix_array_max

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19460.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19460


commit 92a6d2d53aea02042d47888e99df5a4f2167cd1f
Author: Feng Liu 
Date:   2017-10-09T17:43:39Z

init




---
