[GitHub] spark pull request #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHo...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/19460#discussion_r143640733 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/BufferHolder.java --- @@ -36,9 +37,7 @@ */ public class BufferHolder { - // Some JVMs can't allocate arrays of length Integer.MAX_VALUE; actual max is somewhat - // smaller. Be conservative and lower the cap a little. - private static final int ARRAY_MAX = Integer.MAX_VALUE - 8; + private static final int ARRAY_MAX = ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH; --- End diff -- Ok, these constants became redundant. No big deal. We don't need to rush this though --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHo...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19460 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHo...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/19460#discussion_r143571046 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/array/ByteArrayMethods.java --- @@ -40,6 +40,15 @@ public static int roundNumberOfBytesToNearestWord(int numBytes) { } } + // Some JVMs can't allocate arrays of length Integer.MAX_VALUE; actual max is somewhat smaller. + // Be conservative and lower the cap a little. + public static int MAX_ARRAY_LENGTH = Integer.MAX_VALUE - 8; --- End diff -- I get declaring a constant though it doesn't just pertain to byte arrays. I couldn't find a good place for it. I don't know if its reused so much that it needs this --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHo...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/19460#discussion_r143572060 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/BufferHolderSuite.scala --- @@ -37,3 +43,54 @@ class BufferHolderSuite extends SparkFunSuite { assert(e.getMessage.contains("exceeds size limitation")) } } + +// A test for growing the buffer holder to nearly 2GB. Due to the heap size limitation of the Spark +// unit tests JVM, the actually test code is running as a submit job. +class BufferHolderSparkSubmitSuite --- End diff -- Separate file? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHo...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/19460#discussion_r143570916 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/array/ByteArrayMethods.java --- @@ -40,6 +40,15 @@ public static int roundNumberOfBytesToNearestWord(int numBytes) { } } + // Some JVMs can't allocate arrays of length Integer.MAX_VALUE; actual max is somewhat smaller. + // Be conservative and lower the cap a little. + public static int MAX_ARRAY_LENGTH = Integer.MAX_VALUE - 8; + + // Use this value if the allocated byte arrays are used to store other types rather than bytes. + public static int maxWordRoundedArrayLength() { +return roundNumberOfBytesToNearestWord(MAX_ARRAY_LENGTH - 8); --- End diff -- Rounding won't do; it has to round _down_. I see that's why you wrote -8 here but it's a little confusing as -7 works more directly. And then, why bother? just write -15 in the constant above and explain. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHo...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19460#discussion_r143564830 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/array/ByteArrayMethods.java --- @@ -40,6 +40,15 @@ public static int roundNumberOfBytesToNearestWord(int numBytes) { } } + // Some JVMs can't allocate arrays of length Integer.MAX_VALUE; actual max is somewhat smaller. + // Be conservative and lower the cap a little. + public static int MAX_ARRAY_LENGTH = Integer.MAX_VALUE - 8; --- End diff -- `- 15` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19460: [SPARK-22222][core] Fix the ARRAY_MAX in BufferHo...
GitHub user liufengdb opened a pull request: https://github.com/apache/spark/pull/19460 [SPARK-2][core] Fix the ARRAY_MAX in BufferHolder and add a test ## What changes were proposed in this pull request? We should not break the assumption that the length of the allocated byte array is word rounded: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java#L170 So we want to use `Integer.MAX_VALUE - 15` instead of `Integer.MAX_VALUE - 8` as the upper bound of an allocated byte array. ## How was this patch tested? Since the Spark unit test JVM has less than 1GB heap, here we run the test code as a submit job, so it can run on a JVM has 4GB memory. Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/liufengdb/spark fix_array_max Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19460.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19460 commit 92a6d2d53aea02042d47888e99df5a4f2167cd1f Author: Feng Liu Date: 2017-10-09T17:43:39Z init --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org