[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/13182#issuecomment-220672478 @mengxr Disable this test in master and 2.0. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/13182#issuecomment-220652565
    @davies @rxin It seems that this PR caused OOM in master builds.
    ~~~
    *** RUN ABORTED ***
      java.lang.OutOfMemoryError: Java heap space
      at org.apache.spark.sql.execution.joins.LongToUnsafeRowMap.init(HashedRelation.scala:417)
      at org.apache.spark.sql.execution.joins.LongToUnsafeRowMap.<init>(HashedRelation.scala:423)
      at org.apache.spark.sql.execution.joins.LongHashedRelation$.apply(HashedRelation.scala:792)
      at org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$6.apply$mcV$sp(HashedRelationSuite.scala:227)
      at org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$6.apply(HashedRelationSuite.scala:216)
      at org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$6.apply(HashedRelationSuite.scala:216)
      at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
      at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
      at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
      at org.scalatest.Transformer.apply(Transformer.scala:22)
    ~~~
    https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.2/1066/consoleFull
[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13182
[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/13182#issuecomment-220416710 Merging this into master and 2.0, thanks!
[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13182#issuecomment-220413819 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58877/
[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13182#issuecomment-220413814 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13182#issuecomment-220413524
    **[Test build #58877 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58877/consoleFull)** for PR 13182 at commit [`3ab5c13`](https://github.com/apache/spark/commit/3ab5c1348418fe849a35f41946243754ff715814).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13182#issuecomment-220392230 **[Test build #58877 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58877/consoleFull)** for PR 13182 at commit [`3ab5c13`](https://github.com/apache/spark/commit/3ab5c1348418fe849a35f41946243754ff715814).
[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/13182#discussion_r63822575
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala ---
    @@ -410,9 +410,10 @@ private[execution] final class LongToUnsafeRowMap(val mm: TaskMemoryManager, cap
       private def init(): Unit = {
         if (mm != null) {
    +      require(capacity < (512 << 20), "Cannot broadcast more than 512 millions rows")
    --- End diff --
    Looks like it is.
[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13182#discussion_r63822450
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala ---
    @@ -410,9 +410,10 @@ private[execution] final class LongToUnsafeRowMap(val mm: TaskMemoryManager, cap
       private def init(): Unit = {
         if (mm != null) {
    +      require(capacity < (512 << 20), "Cannot broadcast more than 512 millions rows")
    --- End diff --
    yes
[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/13182#discussion_r63822349
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala ---
    @@ -410,9 +410,10 @@ private[execution] final class LongToUnsafeRowMap(val mm: TaskMemoryManager, cap
       private def init(): Unit = {
         if (mm != null) {
    +      require(capacity < (512 << 20), "Cannot broadcast more than 512 millions rows")
    --- End diff --
    Is `capacity` the number of rows?
[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13182#discussion_r63815948
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala ---
    @@ -72,9 +72,18 @@ case class BroadcastExchangeExec(
             val beforeCollect = System.nanoTime()
             // Note that we use .executeCollect() because we don't want to convert data to Scala types
             val input: Array[InternalRow] = child.executeCollect()
    +        if (input.length >= (512 << 20)) {
    +          throw new SparkException(
    +            s"Cannot broadcast the table with more than 512 millions rows: ${input.length} rows")
    --- End diff --
    Yes, it's not; I will update them.
[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...
Github user sameeragarwal commented on the pull request: https://github.com/apache/spark/pull/13182#issuecomment-220204711 LGTM
[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...
Github user sameeragarwal commented on a diff in the pull request: https://github.com/apache/spark/pull/13182#discussion_r63810701
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala ---
    @@ -72,9 +72,18 @@ case class BroadcastExchangeExec(
             val beforeCollect = System.nanoTime()
             // Note that we use .executeCollect() because we don't want to convert data to Scala types
             val input: Array[InternalRow] = child.executeCollect()
    +        if (input.length >= (512 << 20)) {
    +          throw new SparkException(
    +            s"Cannot broadcast the table with more than 512 millions rows: ${input.length} rows")
    --- End diff --
    I think it'd be good to make these 2 consistent (either use 512 << 20 or 51200 at both places)
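One way to keep the two checks consistent, as the review comment above asks, is to route both through a single shared constant and build the error message from it. The sketch below is only an illustration: `BroadcastLimits`, `MaxBroadcastRows`, and `checkRowCount` are hypothetical names that do not appear in the pull request, and the decimal value 512000000 is an assumption about which of the two candidate limits would be chosen.

```scala
object BroadcastLimits {
  // Hypothetical shared limit; the value is an assumption for illustration.
  val MaxBroadcastRows: Long = 512000000L

  // Both call sites would use this one check, so the threshold and the
  // message cannot drift apart.
  def checkRowCount(numRows: Long): Unit = {
    require(
      numRows < MaxBroadcastRows,
      s"Cannot broadcast more than $MaxBroadcastRows rows: $numRows rows")
  }

  def main(args: Array[String]): Unit = {
    checkRowCount(1000L) // passes silently
    // A row count at the limit is rejected with IllegalArgumentException.
    println(scala.util.Try(checkRowCount(MaxBroadcastRows)).isFailure)
  }
}
```

Deriving the message from the constant means the number shown to the user always matches the number that is enforced.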
[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13182#issuecomment-220198917 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58824/
[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13182#issuecomment-220198916 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13182#issuecomment-220198799
    **[Test build #58824 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58824/consoleFull)** for PR 13182 at commit [`8714022`](https://github.com/apache/spark/commit/8714022c2654a6bcb428aee5a6b07169296d0664).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13182#issuecomment-220186165 **[Test build #58824 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58824/consoleFull)** for PR 13182 at commit [`8714022`](https://github.com/apache/spark/commit/8714022c2654a6bcb428aee5a6b07169296d0664).
[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13182#issuecomment-220183818
    **[Test build #58820 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58820/consoleFull)** for PR 13182 at commit [`1b5c8e1`](https://github.com/apache/spark/commit/1b5c8e1b976ed17c002c8138e47ccf1f249d5d90).
    * This patch **fails Scala style tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13182#issuecomment-220183824 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58820/
[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13182#issuecomment-220183821 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/13182#issuecomment-220183697 cc @sameeragarwal
[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13182#issuecomment-220183556 **[Test build #58820 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58820/consoleFull)** for PR 13182 at commit [`1b5c8e1`](https://github.com/apache/spark/commit/1b5c8e1b976ed17c002c8138e47ccf1f249d5d90).
[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/13182#discussion_r63798134
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala ---
    @@ -72,9 +72,18 @@ case class BroadcastExchangeExec(
             val beforeCollect = System.nanoTime()
             // Note that we use .executeCollect() because we don't want to convert data to Scala types
             val input: Array[InternalRow] = child.executeCollect()
    +        if (input.length >= (512 << 20)) {
    +          throw new SparkException(
    +            s"Cannot broadcast the table with more than 512 millions rows: ${input.length} rows")
    --- End diff --
    This is technically not 512 million, is it?
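The arithmetic behind the comment above can be checked directly: `512 << 20` is 512 × 2^20 = 2^29, which is about 537 million rather than 512 million. The snippet is a standalone illustration; the object and value names are hypothetical and are not taken from the pull request.

```scala
object RowLimitArithmetic {
  // The threshold as written in the diff: 512 * 2^20 = 2^29.
  val shiftedLimit: Int = 512 << 20

  // The threshold the error message describes: 512 million.
  val decimalLimit: Int = 512000000

  // Rows that pass the shifted check but exceed the stated limit.
  val gap: Int = shiftedLimit - decimalLimit

  def main(args: Array[String]): Unit = {
    println(s"512 << 20   = $shiftedLimit")
    println(s"512 million = $decimalLimit")
    println(s"difference  = $gap")
  }
}
```

With the shifted form, a table of up to 536,870,911 rows would pass the check even though the message names 512 million as the limit.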
[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13182#issuecomment-220181917 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13182#issuecomment-220181911
    **[Test build #58819 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58819/consoleFull)** for PR 13182 at commit [`07d64c1`](https://github.com/apache/spark/commit/07d64c1d8dd27be32886478b775e8ef2e309e5c2).
    * This patch **fails Scala style tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13182#issuecomment-220181919 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58819/
[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13182#issuecomment-220181647 **[Test build #58819 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58819/consoleFull)** for PR 13182 at commit [`07d64c1`](https://github.com/apache/spark/commit/07d64c1d8dd27be32886478b775e8ef2e309e5c2).
[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/13182
    [SPARK-15390] fix broadcast with 100 millions rows
    ## What changes were proposed in this pull request?
    When broadcasting a table with more than 100 million rows (which ideally should not happen), the computed size of the needed memory overflows. This PR fixes the overflow by converting the value to Long when calculating the memory size. It also adds more checks in broadcast so that failures show reasonable messages.
    ## How was this patch tested?
    Added a test.
You can merge this pull request into a Git repository by running:
    $ git pull https://github.com/davies/spark fix_broadcast
Alternatively you can review and apply these changes as the patch at:
    https://github.com/apache/spark/pull/13182.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
    This closes #13182
commit 07d64c1d8dd27be32886478b775e8ef2e309e5c2
Author: Davies Liu
Date: 2016-05-18T22:41:19Z
    fix broadcast with 100m rows
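The overflow described in the pull request can be reproduced in isolation. The sketch below assumes, purely for illustration, that the map rounds the row count up to a power of two and reserves two 8-byte longs per slot; the actual expression in `HashedRelation.scala` may differ, and `nextPowerOfTwo`, `neededBytesInt`, and `neededBytesLong` are hypothetical helpers, not Spark APIs.

```scala
object BroadcastSizeOverflow {
  // Hypothetical helper: round a row count up to the next power of two.
  def nextPowerOfTwo(rows: Int): Int = {
    require(rows > 0 && rows <= (1 << 30), "row count out of range for this sketch")
    var n = 1
    while (n < rows) n *= 2
    n
  }

  // Int arithmetic: the product silently wraps once it exceeds Int.MaxValue.
  def neededBytesInt(rows: Int): Int = nextPowerOfTwo(rows) * 2 * 8

  // Widening to Long before multiplying keeps the true value.
  def neededBytesLong(rows: Int): Long = nextPowerOfTwo(rows).toLong * 2 * 8

  def main(args: Array[String]): Unit = {
    val rows = 100000001 // just over 100 million
    println(s"Int:  ${neededBytesInt(rows)}")
    println(s"Long: ${neededBytesLong(rows)}")
  }
}
```

Under these assumptions, a little over 100 million rows rounds up to 2^27 slots, and 2^27 × 16 bytes is 2^31, one more than `Int.MaxValue`, so the Int result wraps to a negative number while the Long result stays correct.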