[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...

2016-05-20 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/13182#issuecomment-220672478
  
@mengxr Disable this test in master and 2.0.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...

2016-05-20 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/13182#issuecomment-220652565
  
@davies @rxin It seems that this PR caused an OOM in master builds.

~~~
*** RUN ABORTED ***
  java.lang.OutOfMemoryError: Java heap space
  at org.apache.spark.sql.execution.joins.LongToUnsafeRowMap.init(HashedRelation.scala:417)
  at org.apache.spark.sql.execution.joins.LongToUnsafeRowMap.<init>(HashedRelation.scala:423)
  at org.apache.spark.sql.execution.joins.LongHashedRelation$.apply(HashedRelation.scala:792)
  at org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$6.apply$mcV$sp(HashedRelationSuite.scala:227)
  at org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$6.apply(HashedRelationSuite.scala:216)
  at org.apache.spark.sql.execution.joins.HashedRelationSuite$$anonfun$6.apply(HashedRelationSuite.scala:216)
  at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
  at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
  at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
  at org.scalatest.Transformer.apply(Transformer.scala:22)
~~~


https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.2/1066/consoleFull





[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...

2016-05-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/13182





[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...

2016-05-19 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/13182#issuecomment-220416710
  
Merging this into master and 2.0, thanks!





[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...

2016-05-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13182#issuecomment-220413819
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58877/
Test PASSed.





[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...

2016-05-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13182#issuecomment-220413814
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...

2016-05-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13182#issuecomment-220413524
  
**[Test build #58877 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58877/consoleFull)** for PR 13182 at commit [`3ab5c13`](https://github.com/apache/spark/commit/3ab5c1348418fe849a35f41946243754ff715814).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...

2016-05-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13182#issuecomment-220392230
  
**[Test build #58877 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58877/consoleFull)** for PR 13182 at commit [`3ab5c13`](https://github.com/apache/spark/commit/3ab5c1348418fe849a35f41946243754ff715814).





[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...

2016-05-18 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/13182#discussion_r63822575
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala ---
@@ -410,9 +410,10 @@ private[execution] final class LongToUnsafeRowMap(val mm: TaskMemoryManager, cap
 
   private def init(): Unit = {
     if (mm != null) {
+      require(capacity < (512 << 20), "Cannot broadcast more than 512 millions rows")
--- End diff --

Looks like it is.
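
For readers following along, here is a minimal sketch of how a `require` guard on the row capacity behaves. It is not the code from the PR: `CapacityGuard`, `MaxRows`, and `checked` are hypothetical names, and only the `require` call and the `512 << 20` limit come from the diff above. When the predicate is false, Scala's `require` throws an `IllegalArgumentException` whose message starts with "requirement failed: ".

```scala
object CapacityGuard {
  // Hypothetical stand-in for the limit used in the diff above.
  val MaxRows: Int = 512 << 20

  // Returns the capacity unchanged when it is below the limit; otherwise
  // `require` throws IllegalArgumentException("requirement failed: ...").
  def checked(capacity: Int): Int = {
    require(capacity < MaxRows, s"Cannot broadcast more than $MaxRows rows: $capacity")
    capacity
  }

  def main(args: Array[String]): Unit = {
    println(checked(1024))
    try {
      checked(MaxRows)
    } catch {
      case e: IllegalArgumentException => println(e.getMessage)
    }
  }
}
```

Failing fast with a readable message is presumably preferable to the allocation error that would otherwise surface later, but the exact limit and message are up to the PR author.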





[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...

2016-05-18 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/13182#discussion_r63822450
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala ---
@@ -410,9 +410,10 @@ private[execution] final class LongToUnsafeRowMap(val mm: TaskMemoryManager, cap
 
   private def init(): Unit = {
     if (mm != null) {
+      require(capacity < (512 << 20), "Cannot broadcast more than 512 millions rows")
--- End diff --

yes





[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...

2016-05-18 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/13182#discussion_r63822349
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala ---
@@ -410,9 +410,10 @@ private[execution] final class LongToUnsafeRowMap(val mm: TaskMemoryManager, cap
 
   private def init(): Unit = {
     if (mm != null) {
+      require(capacity < (512 << 20), "Cannot broadcast more than 512 millions rows")
--- End diff --

Is `capacity` the number of rows?





[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...

2016-05-18 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/13182#discussion_r63815948
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala ---
@@ -72,9 +72,18 @@ case class BroadcastExchangeExec(
 val beforeCollect = System.nanoTime()
 // Note that we use .executeCollect() because we don't want to convert data to Scala types
 val input: Array[InternalRow] = child.executeCollect()
+if (input.length >= (512 << 20)) {
+  throw new SparkException(
+    s"Cannot broadcast the table with more than 512 millions rows: ${input.length} rows")
--- End diff --

Right, it's not. I will update them.





[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...

2016-05-18 Thread sameeragarwal
Github user sameeragarwal commented on the pull request:

https://github.com/apache/spark/pull/13182#issuecomment-220204711
  
LGTM





[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...

2016-05-18 Thread sameeragarwal
Github user sameeragarwal commented on a diff in the pull request:

https://github.com/apache/spark/pull/13182#discussion_r63810701
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala ---
@@ -72,9 +72,18 @@ case class BroadcastExchangeExec(
 val beforeCollect = System.nanoTime()
 // Note that we use .executeCollect() because we don't want to convert data to Scala types
 val input: Array[InternalRow] = child.executeCollect()
+if (input.length >= (512 << 20)) {
+  throw new SparkException(
+    s"Cannot broadcast the table with more than 512 millions rows: ${input.length} rows")
--- End diff --

I think it'd be good to make these two consistent (use either 512 << 20 or 51200 in both places).





[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...

2016-05-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13182#issuecomment-220198917
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58824/
Test PASSed.





[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...

2016-05-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13182#issuecomment-220198916
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...

2016-05-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13182#issuecomment-220198799
  
**[Test build #58824 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58824/consoleFull)** for PR 13182 at commit [`8714022`](https://github.com/apache/spark/commit/8714022c2654a6bcb428aee5a6b07169296d0664).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...

2016-05-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13182#issuecomment-220186165
  
**[Test build #58824 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58824/consoleFull)** for PR 13182 at commit [`8714022`](https://github.com/apache/spark/commit/8714022c2654a6bcb428aee5a6b07169296d0664).





[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...

2016-05-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13182#issuecomment-220183818
  
**[Test build #58820 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58820/consoleFull)** for PR 13182 at commit [`1b5c8e1`](https://github.com/apache/spark/commit/1b5c8e1b976ed17c002c8138e47ccf1f249d5d90).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...

2016-05-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13182#issuecomment-220183824
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58820/
Test FAILed.





[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...

2016-05-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13182#issuecomment-220183821
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...

2016-05-18 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/13182#issuecomment-220183697
  
cc @sameeragarwal 





[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...

2016-05-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13182#issuecomment-220183556
  
**[Test build #58820 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58820/consoleFull)** for PR 13182 at commit [`1b5c8e1`](https://github.com/apache/spark/commit/1b5c8e1b976ed17c002c8138e47ccf1f249d5d90).





[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...

2016-05-18 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/13182#discussion_r63798134
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala ---
@@ -72,9 +72,18 @@ case class BroadcastExchangeExec(
 val beforeCollect = System.nanoTime()
 // Note that we use .executeCollect() because we don't want to convert data to Scala types
 val input: Array[InternalRow] = child.executeCollect()
+if (input.length >= (512 << 20)) {
+  throw new SparkException(
+    s"Cannot broadcast the table with more than 512 millions rows: ${input.length} rows")
--- End diff --

This is technically not 512 million, is it?
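
The arithmetic behind this remark can be checked with a short sketch. The object and value names (`BroadcastLimit`, `shiftedLimit`, `decimalLimit`) are made up for illustration; only the expression `512 << 20` comes from the diff above.

```scala
object BroadcastLimit {
  // The threshold used in the check under review: 512 * 2^20.
  val shiftedLimit: Int = 512 << 20

  // What the error message literally claims: 512 million (decimal).
  val decimalLimit: Int = 512000000

  def main(args: Array[String]): Unit = {
    println(s"512 << 20   = $shiftedLimit")
    println(s"512 million = $decimalLimit")
    println(s"difference  = ${shiftedLimit - decimalLimit}")
  }
}
```

Since `512 << 20` is 536870912, the check admits roughly 24.9 million more rows than the "512 millions" wording in the message suggests.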






[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...

2016-05-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13182#issuecomment-220181917
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...

2016-05-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13182#issuecomment-220181911
  
**[Test build #58819 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58819/consoleFull)** for PR 13182 at commit [`07d64c1`](https://github.com/apache/spark/commit/07d64c1d8dd27be32886478b775e8ef2e309e5c2).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...

2016-05-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13182#issuecomment-220181919
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58819/
Test FAILed.





[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...

2016-05-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13182#issuecomment-220181647
  
**[Test build #58819 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58819/consoleFull)** for PR 13182 at commit [`07d64c1`](https://github.com/apache/spark/commit/07d64c1d8dd27be32886478b775e8ef2e309e5c2).





[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...

2016-05-18 Thread davies
GitHub user davies opened a pull request:

https://github.com/apache/spark/pull/13182

[SPARK-15390] fix broadcast with 100 millions rows

## What changes were proposed in this pull request?

When broadcasting a table with more than 100 million rows (which ideally should not happen), the computed size of the required memory overflows.

This PR fixes the overflow by converting the value to Long when calculating the memory size.

It also adds more checks in broadcast so that reasonable error messages are shown.
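
As an illustration of the kind of overflow described here, the following sketch compares an `Int` size computation with one widened to `Long`. It is an assumption-laden example, not the code changed by this PR: `SizeOverflow`, `BytesPerSlot` (16 bytes per slot), and the 150 million slot count are hypothetical, and the real layout in `LongToUnsafeRowMap` may differ.

```scala
object SizeOverflow {
  // Hypothetical slot size; the real layout in LongToUnsafeRowMap may differ.
  val BytesPerSlot: Int = 16

  // Int arithmetic: wraps around silently once the product exceeds Int.MaxValue.
  def sizeAsInt(slots: Int): Int = slots * BytesPerSlot

  // Widening one operand to Long first makes the whole product a Long.
  def sizeAsLong(slots: Int): Long = slots.toLong * BytesPerSlot

  def main(args: Array[String]): Unit = {
    val slots = 150000000 // more than 100 million rows
    println(s"Int:  ${sizeAsInt(slots)}")
    println(s"Long: ${sizeAsLong(slots)}")
  }
}
```

With these assumed numbers the `Int` product wraps to a negative value, while the `Long` version yields 2400000000 bytes. Note that the conversion has to happen before the multiplication; converting the already-overflowed `Int` result would not help.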

## How was this patch tested?

Add test.




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/davies/spark fix_broadcast

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13182.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13182


commit 07d64c1d8dd27be32886478b775e8ef2e309e5c2
Author: Davies Liu 
Date:   2016-05-18T22:41:19Z

fix broadcast with 100m rows



