[GitHub] spark pull request: [SPARK-4664][Core] Throw an exception when spa...

2014-12-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/3527


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4664][Core] Throw an exception when spa...

2014-12-01 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/3527#issuecomment-65033421
  
Merging in master.





[GitHub] spark pull request: [SPARK-4664][Core] Throw an exception when spa...

2014-12-01 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/3527#issuecomment-65033393
  
Yea as @aarondav pointed out, I don't think akka framesize is going to be a 
problem anymore in 1.2+, regardless of the number of partitions. Still good to 
have this check to be defensive. 





[GitHub] spark pull request: [SPARK-4664][Core] Throw an exception when spa...

2014-11-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3527#issuecomment-65028820
  
[Test build #23974 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23974/consoleFull) for PR 3527 at commit [`0089c7a`](https://github.com/apache/spark/commit/0089c7abaf58c7c8d014d0e0d86b00efcee4e100).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4664][Core] Throw an exception when spa...

2014-11-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3527#issuecomment-65028826
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23974/





[GitHub] spark pull request: [SPARK-4664][Core] Throw an exception when spa...

2014-11-30 Thread zsxwing
Github user zsxwing commented on the pull request:

https://github.com/apache/spark/pull/3527#issuecomment-65028666
  
> I believe it is only 1 bit, not byte, per block

Thank you for correcting me. Was not aware of `HighlyCompressedMapStatus`.





[GitHub] spark pull request: [SPARK-4664][Core] Throw an exception when spa...

2014-11-30 Thread aarondav
Github user aarondav commented on the pull request:

https://github.com/apache/spark/pull/3527#issuecomment-65028176
  
I believe it is only 1 bit, not byte, per block. Further I would estimate
compression on largely uniform data to be at least around 10x. So your
example would ideally only use around 1.2MB.

Anyway, you can arbitrarily multiply the number of partitions to
demonstrate the issue. 1mil by 1mil is still a tough cookie to crack, but
we don't really want users to have to meddle with frame sizes.

Having this check is fine, of course, whether or not users should have to
change it.
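
The estimate above can be retraced as a short calculation. This is only a sketch of the arithmetic: it starts from the 100MB figure in the comment being replied to, and assumes the two adjustments named above (1 bit instead of 1 byte per block, then roughly 10x compression). The class and method names are hypothetical and are not part of Spark.

```java
// Hypothetical helper for retracing the size estimate; not Spark code.
final class MapStatusSizeEstimate {

    // Storing 1 bit instead of 1 byte per block shrinks the size by a factor of 8.
    static double afterBitEncoding(double sizeMb) {
        return sizeMb / 8.0;
    }

    // Applies an assumed compression ratio (10.0 means "about 10x smaller").
    static double afterCompression(double sizeMb, double ratio) {
        return sizeMb / ratio;
    }

    public static void main(String[] args) {
        double oneBytePerBlockMb = 100.0; // figure quoted earlier in the thread
        double oneBitPerBlockMb = afterBitEncoding(oneBytePerBlockMb);
        double compressedMb = afterCompression(oneBitPerBlockMb, 10.0);

        System.out.println("1 byte per block: " + oneBytePerBlockMb + " MB");
        System.out.println("1 bit per block:  " + oneBitPerBlockMb + " MB");
        System.out.println("after ~10x gzip:  " + compressedMb + " MB");
    }
}
```

The result is 12.5 MB before compression and 1.25 MB after, which matches the "around 1.2MB" quoted above. The 10x ratio is an estimate from the comment, not a measured value.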





[GitHub] spark pull request: [SPARK-4664][Core] Throw an exception when spa...

2014-11-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3527#issuecomment-65027426
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23973/





[GitHub] spark pull request: [SPARK-4664][Core] Throw an exception when spa...

2014-11-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3527#issuecomment-65027423
  
[Test build #23973 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23973/consoleFull) for PR 3527 at commit [`0089c7a`](https://github.com/apache/spark/commit/0089c7abaf58c7c8d014d0e0d86b00efcee4e100).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4664][Core] Throw an exception when spa...

2014-11-30 Thread sryza
Github user sryza commented on the pull request:

https://github.com/apache/spark/pull/3527#issuecomment-65027150
  
1 partitions doesn't sound that extreme to me.





[GitHub] spark pull request: [SPARK-4664][Core] Throw an exception when spa...

2014-11-30 Thread zsxwing
Github user zsxwing commented on the pull request:

https://github.com/apache/spark/pull/3527#issuecomment-65026993
  


> @zsxwing Note that the case you mentioned should no longer cause this
> issue either, as we use an extra compressed data structure when dealing with
> very large numbers of map partitions.

In an extreme case, it's still possible. For example, assume that there are 
1 partitions on the map side. If the user does not set a new `numPartition`, 
there will be 1 reducer. If all of these blocks are non-empty, the `MapStatus`es 
will be huge: 1 * 1 * 1 = 100MB. I'm not sure what the compression 
ratio of `GZIPOutputStream` will be, but the result may exceed `spark.akka.frameSize`.

Admittedly, this might be a user mistake, and the user should set a proper 
`numPartition`.







[GitHub] spark pull request: [SPARK-4664][Core] Throw an exception when spa...

2014-11-30 Thread aarondav
Github user aarondav commented on the pull request:

https://github.com/apache/spark/pull/3527#issuecomment-65025980
  
@zsxwing Note that the case you mentioned should no longer cause this issue 
either, as we use an extra compressed data structure when dealing with very 
large numbers of map partitions.





[GitHub] spark pull request: [SPARK-4664][Core] Throw an exception when spa...

2014-11-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3527#issuecomment-65024260
  
[Test build #23974 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23974/consoleFull) for PR 3527 at commit [`0089c7a`](https://github.com/apache/spark/commit/0089c7abaf58c7c8d014d0e0d86b00efcee4e100).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4664][Core] Throw an exception when spa...

2014-11-30 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3527#issuecomment-65023903
  
Jenkins, retest this please.





[GitHub] spark pull request: [SPARK-4664][Core] Throw an exception when spa...

2014-11-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3527#issuecomment-65023301
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23971/





[GitHub] spark pull request: [SPARK-4664][Core] Throw an exception when spa...

2014-11-30 Thread zsxwing
Github user zsxwing commented on the pull request:

https://github.com/apache/spark/pull/3527#issuecomment-65023051
  
One case where `spark.akka.frameSize` may still need to be raised is when the 
size of the `MapStatus`es exceeds it, for example with a large number of mappers 
and reducers.

A relevant issue is in the following thread:

http://apache-spark-developers-list.1001551.n3.nabble.com/Eliminate-copy-while-sending-data-any-Akka-experts-here-td7127.html





[GitHub] spark pull request: [SPARK-4664][Core] Throw an exception when spa...

2014-11-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3527#issuecomment-65022875
  
[Test build #23973 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23973/consoleFull) for PR 3527 at commit [`0089c7a`](https://github.com/apache/spark/commit/0089c7abaf58c7c8d014d0e0d86b00efcee4e100).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4664][Core] Throw an exception when spa...

2014-11-30 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3527#issuecomment-65022618
  
Nice catch.  I don't think that it's very common to set 
`spark.akka.frameSize` these days, since 1.1's task broadcasting should have 
addressed the most common causes of messages that exceeded the frame size, but 
it certainly doesn't hurt to warn / guard against bad inputs.





[GitHub] spark pull request: [SPARK-4664][Core] Throw an exception when spa...

2014-11-30 Thread zsxwing
GitHub user zsxwing opened a pull request:

https://github.com/apache/spark/pull/3527

[SPARK-4664][Core] Throw an exception when spark.akka.frameSize > 2047

If `spark.akka.frameSize` > 2047, the value overflows a 32-bit integer when 
converted to bytes and becomes negative. There should be an assertion in 
`maxFrameSizeBytes` to warn people.
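
To illustrate the overflow described above, here is a minimal sketch in Java, whose 32-bit `int` arithmetic behaves the same way as Scala's `Int`. It assumes the setting is given in megabytes and multiplied by 1024 * 1024 to get bytes. The class, method names, and error message are hypothetical and are not taken from the patch itself.

```java
// Hypothetical illustration of the overflow and a guard; not the actual Spark patch.
final class FrameSizeCheck {

    // Largest megabyte value whose byte count still fits in a signed 32-bit int.
    static final int MAX_FRAME_SIZE_MB = 2047;

    // Unchecked conversion: silently wraps around when the product exceeds Integer.MAX_VALUE.
    static int uncheckedFrameSizeBytes(int frameSizeMb) {
        return frameSizeMb * 1024 * 1024;
    }

    // Checked conversion: rejects values that would overflow instead of wrapping.
    static int checkedFrameSizeBytes(int frameSizeMb) {
        if (frameSizeMb > MAX_FRAME_SIZE_MB) {
            throw new IllegalArgumentException(
                "frame size should not be greater than " + MAX_FRAME_SIZE_MB + " MB, got " + frameSizeMb);
        }
        return frameSizeMb * 1024 * 1024;
    }

    public static void main(String[] args) {
        System.out.println(uncheckedFrameSizeBytes(2047)); // prints 2146435072
        System.out.println(uncheckedFrameSizeBytes(2048)); // prints -2147483648
        try {
            checkedFrameSizeBytes(2048);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing fast with an exception makes the misconfiguration visible when the setting is read, rather than later as a confusing negative size.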

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zsxwing/spark SPARK-4664

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3527.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3527


commit f12f0b6d323f1b7b7e62e24950122ae95c257050
Author: zsxwing 
Date:   2014-12-01T05:27:27Z

Throw an exception when spark.akka.frameSize > 2047



