[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-10-02 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/2485


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-10-02 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/2485#issuecomment-57683434
  
I committed this. I had missed that there wasn't a JIRA for it, so I filed 
https://issues.apache.org/jira/browse/SPARK-3768.





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-10-02 Thread nishkamravi2
Github user nishkamravi2 commented on the pull request:

https://github.com/apache/spark/pull/2485#issuecomment-57705659
  
Thanks @tgravescs 





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-10-01 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/2485#issuecomment-57518030
  
retest this please





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-30 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/2485#issuecomment-57326448
  
@andrewor14  did you have any further comments on this?





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-30 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/2485#issuecomment-57341483
  
I think this is fine. I spotted one semicolon but I'll let that go. LGTM.





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-30 Thread nishkamravi2
Github user nishkamravi2 commented on the pull request:

https://github.com/apache/spark/pull/2485#issuecomment-57360651
  
Semicolon removed (nice catch)





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-26 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/2485#issuecomment-56979002
  
It seems a bit much to have 2 configs that do essentially the same thing.  I see 
it leading to confusion and just extra overhead. 





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-26 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/2485#issuecomment-56987453
  
Yes I'm inclined towards having only one config, and beefing up the 
documentation and comments on the existing one.





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-26 Thread nishkamravi2
Github user nishkamravi2 commented on the pull request:

https://github.com/apache/spark/pull/2485#issuecomment-57001251
  
@andrewor14  I added a few comments in the code/docs yesterday. Not sure if you 
got a chance to take a look. If there is anything specific (in terms of 
comments) you'd like to add, please suggest it and I'll add it.





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-26 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/2485#issuecomment-57001360
  
Hey @nishkamravi2 yes I just looked at the latest changes and the existing 
comments are good. Any other comments by others?





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-26 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/2485#issuecomment-57001461
  
retest this please





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-26 Thread brndnmtthws
Github user brndnmtthws commented on the pull request:

https://github.com/apache/spark/pull/2485#issuecomment-57001671
  
Can you refactor this to be non-YARN specific?  It would be good to share 
code between this and #2401.





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2485#issuecomment-57001868
  
[QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20866/consoleFull) for PR 2485 at commit [`8f76c8b`](https://github.com/apache/spark/commit/8f76c8b46379736aeb7dbe1a4d88729424a041f7).
 * This patch merges cleanly.





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-26 Thread brndnmtthws
Github user brndnmtthws commented on the pull request:

https://github.com/apache/spark/pull/2485#issuecomment-57001899
  
In particular, look at how I put the logic into a common function, 
`calculateTotalMemory`.





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-26 Thread nishkamravi2
Github user nishkamravi2 commented on the pull request:

https://github.com/apache/spark/pull/2485#issuecomment-57002615
  
Not sure what you mean by non-yarn specific. The two code bases are quite 
different and memoryOverhead is too small of an intersection to try and unify 
them. I think what these two PRs should be sharing is consistent 
logic/philosophy. 





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-26 Thread nishkamravi2
Github user nishkamravi2 commented on the pull request:

https://github.com/apache/spark/pull/2485#issuecomment-57002734
  
`calculateTotalMemory` can be defined differently for the two code paths. 
The overhead percentage will have to be different too. That is fine as long as 
they follow the same semantics/logic.





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-26 Thread brndnmtthws
Github user brndnmtthws commented on the pull request:

https://github.com/apache/spark/pull/2485#issuecomment-57004118
  
Why can't they both share the same config parameters, for example?  I 
understand the implementation differences, but we shouldn't need to have 
distinct config params.





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-26 Thread nishkamravi2
Github user nishkamravi2 commented on the pull request:

https://github.com/apache/spark/pull/2485#issuecomment-57004722
  
For one, it would mean a change in the UI, which breaks existing 
deployments and there should be a compelling reason to do so.





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-26 Thread brndnmtthws
Github user brndnmtthws commented on the pull request:

https://github.com/apache/spark/pull/2485#issuecomment-57004940
  
So I guess there's nothing to do.





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-26 Thread nishkamravi2
Github user nishkamravi2 commented on the pull request:

https://github.com/apache/spark/pull/2485#issuecomment-57005579
  
I think PR #2401 can be modeled after this one. Instead of defining 
overhead as a percentage, it could (and probably should) be defined as an 
absolute value. Also, `spark.executor.memory.overhead.minimum` is redundant and 
adds confusion/complexity for the developer.





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-26 Thread brndnmtthws
Github user brndnmtthws commented on the pull request:

https://github.com/apache/spark/pull/2485#issuecomment-57006489
  
Naturally you wouldn't want to have to change yours.

I'll drop the `.minimum` thing, and prefix the config params with `.mesos`, 
like you've done for yarn.





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-26 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/2485#issuecomment-57007876
  
Hey I just talked to @pwendell about this. I think it's better for us to 
have a yarn config and a mesos config, but not generalize this to use a common 
`spark.executor.memory.overhead.*` config. The reason is that 
this memory overhead doesn't make sense for standalone mode or other cluster 
managers that don't launch executors in containers. I think it's fine as long 
as the two yarn and mesos configs have the same semantics, so the user of one 
mode is not confused when they switch to another.
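
One way to picture "the same semantics" across the two modes is a single rule applied under two config prefixes. The sketch below is only an illustration under stated assumptions: the object name, the function, and the factor and minimum values are hypothetical and are not taken from this PR or from the Spark source.

```scala
// Hypothetical sketch: names and numeric defaults are illustrative assumptions,
// not the values or code used by Spark or by this pull request.
object MemoryOverheadSketch {
  // Resolve the overhead (in MB) added on top of the executor heap.
  // `explicitOverheadMb` models a user-set, cluster-manager-specific config
  // (a YARN-prefixed or Mesos-prefixed key); when it is absent, fall back to
  // a fraction of the executor memory with a lower bound.
  def overheadMb(
      executorMemoryMb: Int,
      explicitOverheadMb: Option[Int],
      factor: Double,
      minimumMb: Int): Int =
    explicitOverheadMb.getOrElse(
      math.max((factor * executorMemoryMb).toInt, minimumMb))

  // Total memory to request from the container-based cluster manager.
  def containerRequestMb(
      executorMemoryMb: Int,
      explicitOverheadMb: Option[Int],
      factor: Double,
      minimumMb: Int): Int =
    executorMemoryMb + overheadMb(executorMemoryMb, explicitOverheadMb, factor, minimumMb)
}
```

Under this reading, the YARN and Mesos code paths could each call the same rule with their own prefix and their own factor, so a user switching modes sees the same behavior: an explicit value wins, otherwise a scaled default with a floor applies.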





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-26 Thread brndnmtthws
Github user brndnmtthws commented on the pull request:

https://github.com/apache/spark/pull/2485#issuecomment-57008082
  
That's fair. I'm updating the PR to make that Mesos specific now.





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2485#issuecomment-57008999
  
[QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20866/consoleFull) for PR 2485 at commit [`8f76c8b`](https://github.com/apache/spark/commit/8f76c8b46379736aeb7dbe1a4d88729424a041f7).
 * This patch **fails** unit tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class IDF(val minDocFreq: Int)`
  * `class DocumentFrequencyAggregator(val minDocFreq: Int) extends Serializable`
  * `class PStatsParam(AccumulatorParam):`






[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2485#issuecomment-57009006
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20866/





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-26 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/2485#issuecomment-57009912
  
retest this please





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2485#issuecomment-57010357
  
[QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20870/consoleFull) for PR 2485 at commit [`8f76c8b`](https://github.com/apache/spark/commit/8f76c8b46379736aeb7dbe1a4d88729424a041f7).
 * This patch merges cleanly.





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2485#issuecomment-57017843
  
[QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20870/consoleFull) for PR 2485 at commit [`8f76c8b`](https://github.com/apache/spark/commit/8f76c8b46379736aeb7dbe1a4d88729424a041f7).
 * This patch **fails** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2485#issuecomment-57017849
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20870/





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-26 Thread nishkamravi2
Github user nishkamravi2 commented on the pull request:

https://github.com/apache/spark/pull/2485#issuecomment-57018879
  
Need some help interpreting the test results. Not clear which one is 
failing.





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-26 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/2485#issuecomment-57029756
  
It's the python ones. This is unlikely to be related to your patch. Let's 
retest this please.





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2485#issuecomment-57030100
  
[QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20887/consoleFull) for PR 2485 at commit [`8f76c8b`](https://github.com/apache/spark/commit/8f76c8b46379736aeb7dbe1a4d88729424a041f7).
 * This patch merges cleanly.





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2485#issuecomment-57034392
  
[QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20887/consoleFull) for PR 2485 at commit [`8f76c8b`](https://github.com/apache/spark/commit/8f76c8b46379736aeb7dbe1a4d88729424a041f7).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2485#issuecomment-57034396
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20887/





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-25 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/2485#issuecomment-56816028
  
Jenkins, retest this please





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-25 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/2485#issuecomment-56818674
  
@JoshRosen  would you mind kicking Jenkins again now that it's upmerged?





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2485#issuecomment-56843859
  
[QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/153/consoleFull) for PR 2485 at commit [`c726bd9`](https://github.com/apache/spark/commit/c726bd9f707ce182ec8d56ffecf9da87dcdb3091).
 * This patch merges cleanly.





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2485#issuecomment-56853597
  
[QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/153/consoleFull) for PR 2485 at commit [`c726bd9`](https://github.com/apache/spark/commit/c726bd9f707ce182ec8d56ffecf9da87dcdb3091).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-25 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/2485#discussion_r18049289
  
--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -64,14 +64,18 @@ private[spark] trait ClientBase extends Logging {
       s"memory capability of the cluster ($maxMem MB per container)")
     val executorMem = args.executorMemory + executorMemoryOverhead
     if (executorMem > maxMem) {
-      throw new IllegalArgumentException(s"Required executor memory ($executorMem MB) " +
+      throw new IllegalArgumentException(s"Required executor memory ($args.executorMemmory+$executorMemoryOverhead MB) " +
         s"is above the max threshold ($maxMem MB) of this cluster!")
     }
     val amMem = args.amMemory + amMemoryOverhead
     if (amMem > maxMem) {
-      throw new IllegalArgumentException(s"Required AM memory ($amMem MB) " +
+      throw new IllegalArgumentException(s"Required AM memory ($args.amMemory+$amMemoryOverhead MB) " +
--- End diff --

You need to put `{ }` around `args.amMemory`. This will print out the 
entire `args`.
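
A minimal sketch of the interpolation issue raised above. The `Args` case class and the numbers are hypothetical stand-ins for the real `ClientArguments`, used only to make the example self-contained:

```scala
// Hypothetical stand-in for ClientArguments, for illustration only.
case class Args(amMemory: Int, executorMemory: Int)

object InterpolationDemo {
  val args = Args(amMemory = 512, executorMemory = 1024)
  val amMemoryOverhead = 384 // illustrative value

  // Without braces, only `args` is interpolated; ".amMemory" stays literal text.
  val wrong = s"Required AM memory ($args.amMemory+$amMemoryOverhead MB)"

  // With braces, the whole expression `args.amMemory` is evaluated.
  val right = s"Required AM memory (${args.amMemory}+$amMemoryOverhead MB)"

  def main(unused: Array[String]): Unit = {
    println(wrong) // Required AM memory (Args(512,1024).amMemory+384 MB)
    println(right) // Required AM memory (512+384 MB)
  }
}
```

Because the unbraced form still compiles, a misspelled field name after the dot would also go unnoticed by the compiler.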





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-25 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/2485#discussion_r18049314
  
--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -64,14 +64,18 @@ private[spark] trait ClientBase extends Logging {
       s"memory capability of the cluster ($maxMem MB per container)")
     val executorMem = args.executorMemory + executorMemoryOverhead
     if (executorMem > maxMem) {
-      throw new IllegalArgumentException(s"Required executor memory ($executorMem MB) " +
+      throw new IllegalArgumentException(s"Required executor memory ($args.executorMemmory+$executorMemoryOverhead MB) " +
--- End diff --

Line too long





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-25 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/2485#discussion_r18049333
  
--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -64,14 +64,18 @@ private[spark] trait ClientBase extends Logging {
       s"memory capability of the cluster ($maxMem MB per container)")
     val executorMem = args.executorMemory + executorMemoryOverhead
     if (executorMem > maxMem) {
-      throw new IllegalArgumentException(s"Required executor memory ($executorMem MB) " +
+      throw new IllegalArgumentException(s"Required executor memory ($args.executorMemmory+$executorMemoryOverhead MB) " +
         s"is above the max threshold ($maxMem MB) of this cluster!")
     }
     val amMem = args.amMemory + amMemoryOverhead
     if (amMem > maxMem) {
-      throw new IllegalArgumentException(s"Required AM memory ($amMem MB) " +
+      throw new IllegalArgumentException(s"Required AM memory ($args.amMemory+$amMemoryOverhead MB) " +
--- End diff --

Also this line is too long (100 chars max)





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-25 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/2485#discussion_r18049458
  
--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientArguments.scala ---
@@ -39,15 +39,19 @@ private[spark] class ClientArguments(args: Array[String], sparkConf: SparkConf)
   var appName: String = "Spark"
   var priority = 0
 
+  parseArgs(args.toList)
+  loadEnvironmentArgs()
+
   // Additional memory to allocate to containers
   // For now, use driver's memory overhead as our AM container's memory overhead
-  val amMemoryOverhead = sparkConf.getInt(
-    "spark.yarn.driver.memoryOverhead", YarnSparkHadoopUtil.DEFAULT_MEMORY_OVERHEAD)
-  val executorMemoryOverhead = sparkConf.getInt(
-    "spark.yarn.executor.memoryOverhead", YarnSparkHadoopUtil.DEFAULT_MEMORY_OVERHEAD)
+  val amMemoryOverhead = sparkConf.getInt("spark.yarn.driver.memoryOverhead",
+    math.max((YarnSparkHadoopUtil.MEMORY_OVERHEAD_FACTOR * amMemory).toInt,
+    YarnSparkHadoopUtil.MEMORY_OVERHEAD_MIN))
+
+  val executorMemoryOverhead = sparkConf.getInt("spark.yarn.executor.memoryOverhead",
+    math.max((YarnSparkHadoopUtil.MEMORY_OVERHEAD_FACTOR * executorMemory).toInt,
+    YarnSparkHadoopUtil.MEMORY_OVERHEAD_MIN))
--- End diff --

Hm, these will be much easier to read if you just import 
`YarnSparkHadoopUtil._` up top





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-25 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/2485#discussion_r18049502
  
--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
@@ -117,9 +118,10 @@ private[yarn] abstract class YarnAllocator(
 
     if (missing > 0) {
       numPendingAllocate.addAndGet(missing)
-      logInfo("Will Allocate %d executor containers, each with %d memory".format(
+      logInfo("Will allocate %d executor containers, each with %d MB memory including %d MB overhead".format(
--- End diff --

line too long





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-25 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/2485#issuecomment-56863597
  
Have we ever come to a consensus on whether 0.07 is an appropriate default? 
Under the current settings this means anything above ~5.5G of executor / driver 
memory will use a higher memory overhead. How did we arrive at 0.07 in the 
first place?
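
For reference, a small sketch of the default being discussed. It mirrors the `math.max(factor * memory, minimum)` expression in the `ClientArguments` diff; the 384 MB floor is an assumption here (the constant's value is not quoted in this thread), chosen because it is consistent with the ~5.5G break-even point mentioned above:

```scala
object OverheadDemo {
  val MEMORY_OVERHEAD_FACTOR = 0.07 // the multiplier under discussion
  val MEMORY_OVERHEAD_MIN = 384     // assumed floor in MB, not quoted in this thread

  // Default overhead in MB for a container with the given heap size in MB.
  def defaultOverhead(memoryMb: Int): Int =
    math.max((MEMORY_OVERHEAD_FACTOR * memoryMb).toInt, MEMORY_OVERHEAD_MIN)

  def main(args: Array[String]): Unit = {
    println(defaultOverhead(2048)) // 384: the floor applies below the break-even point
    println(defaultOverhead(8192)) // 573: 7% of 8192 MB, rounded down
  }
}
```

Under these assumptions the multiplier only takes effect once 0.07 times the requested memory exceeds the floor, that is, above roughly 384 / 0.07 ≈ 5486 MB.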





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-25 Thread sryza
Github user sryza commented on the pull request:

https://github.com/apache/spark/pull/2485#issuecomment-56865310
  
@nishkamravi2 arrived at this through experimentation. He gave a few 
details on his experiments in the previous incarnation of this PR, #1391. If 
anything, I think 0.07 is conservative. I would not be surprised if I counted 
and discovered we hit this issue with every customer of ours running a decently 
sized executor on YARN, and they normally need to increase beyond 0.07.





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-25 Thread nishkamravi2
Github user nishkamravi2 commented on the pull request:

https://github.com/apache/spark/pull/2485#issuecomment-56875798
  
Updated as per @andrewor14 's comments. 





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-25 Thread nishkamravi2
Github user nishkamravi2 commented on the pull request:

https://github.com/apache/spark/pull/2485#issuecomment-56876371
  
As Sandy points out, 7% is on the conservative side. It was chosen in the 
interest of minimizing memory waste while covering the common cases (as per our 
experiments). Anything between 6% and 10% should make for a good default value.





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-25 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/2485#discussion_r18057379
  
--- Diff: out ---
@@ -0,0 +1 @@
+Already up-to-date.
--- End diff --

Wait, can you delete this file?





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-25 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/2485#issuecomment-56878538
  
I see. Maybe it makes sense to at least add a comment (in the documentation 
and in the code) to explain how we arrived at these numbers. This config can't 
be configured so we should make sure we have a reasonable default value.





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-25 Thread nishkamravi2
Github user nishkamravi2 commented on the pull request:

https://github.com/apache/spark/pull/2485#issuecomment-56887022
  
@andrewor14 Which config are we referring to? spark.yarn.*.memoryOverhead 
is configurable.





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-25 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/2485#issuecomment-56889393
  
The static value (MB) is configurable, but the user can't specify 15% or 
20% but is stuck with 7%. This is probably fine.





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-25 Thread nishkamravi2
Github user nishkamravi2 commented on the pull request:

https://github.com/apache/spark/pull/2485#issuecomment-56891420
  
I see. Yeah, we have the choice to expose the config parameter relative to 
the container size or as an absolute value. Since memoryOverhead as a function 
of the container size is a bit of a simplification for the purpose of calculating 
the default, I would rather not define it that way.





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-25 Thread sryza
Github user sryza commented on the pull request:

https://github.com/apache/spark/pull/2485#issuecomment-56892181
  
We could also expose both and make memoryOverhead override the other.  I 
think this could be reasonable because the scale is most likely to be set in 
spark-defaults.conf and the memoryOverhead would go on the command line for 
specific apps.
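
A rough sketch of the precedence suggested above, under stated assumptions: a plain `Map` stands in for `SparkConf`, the key `spark.yarn.executor.memoryOverheadFactor` is hypothetical (no such setting is defined in this PR), and the 384 MB floor is assumed:

```scala
object OverheadResolution {
  val DefaultFactor = 0.07 // the multiplier under discussion
  val MinOverheadMb = 384  // assumed floor in MB

  // `conf` is a plain Map standing in for SparkConf.
  // "spark.yarn.executor.memoryOverheadFactor" is a hypothetical key.
  def executorOverhead(conf: Map[String, String], executorMemoryMb: Int): Int =
    conf.get("spark.yarn.executor.memoryOverhead").map(_.toInt).getOrElse {
      // No absolute value given: fall back to the (possibly overridden) factor.
      val factor = conf.get("spark.yarn.executor.memoryOverheadFactor")
        .map(_.toDouble).getOrElse(DefaultFactor)
      math.max((factor * executorMemoryMb).toInt, MinOverheadMb)
    }
}
```

With this ordering, a cluster-wide factor could live in spark-defaults.conf while an absolute `memoryOverhead` passed for a specific app would always win.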





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-25 Thread nishkamravi2
Github user nishkamravi2 commented on the pull request:

https://github.com/apache/spark/pull/2485#issuecomment-56893439
  
Not a bad suggestion Sandy, but I would be wary of the potential confusion 
it may create. Ideally this parameter should not be exposed as a config 
parameter at all in the spirit of a minimalistic UI (but that doesn't seem 
possible today).





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-24 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/2485#issuecomment-56706929
  
@pwendell @mateiz @andrewor14  can any of you kick jenkins?





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-24 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/2485#issuecomment-56707385
  
I just kicked it from the `spark-prs` parameterized build trigger; let's 
wait and see if it starts...





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2485#issuecomment-56707584
  
[QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/146/consoleFull) for PR 2485 at commit [`f00fa31`](https://github.com/apache/spark/commit/f00fa311945c1eafa8957eae5c84719521761dcd).
 * This patch **does not** merge cleanly!





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-24 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/2485#issuecomment-56707989
  
Ah sorry, looks like something conflicts now and it needs to be upmerged.

@nishkamravi2 can you please upmerge?





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2485#issuecomment-56720302
  
[QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/146/consoleFull) for PR 2485 at commit [`f00fa31`](https://github.com/apache/spark/commit/f00fa311945c1eafa8957eae5c84719521761dcd).
 * This patch **passes** unit tests.
 * This patch **does not** merge cleanly!






[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-24 Thread nishkamravi2
Github user nishkamravi2 commented on the pull request:

https://github.com/apache/spark/pull/2485#issuecomment-56762805
  
Updated and merged. 





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-24 Thread nishkamravi2
Github user nishkamravi2 commented on the pull request:

https://github.com/apache/spark/pull/2485#issuecomment-56763002
  
ClientBase changes are now distributed over ClientBase and ClientArguments.





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-23 Thread sryza
Github user sryza commented on the pull request:

https://github.com/apache/spark/pull/2485#issuecomment-56480298
  
This looks good to me.





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-23 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/2485#issuecomment-56526845
  
Jenkins, retest this please.





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-23 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/2485#issuecomment-56522491
  
Jenkins, test this please





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-23 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/2485#issuecomment-56532051
  
@JoshRosen  Any idea why Jenkins isn't running on this?  Could you kick it 
manually?





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-22 Thread nishkamravi2
GitHub user nishkamravi2 opened a pull request:

https://github.com/apache/spark/pull/2485

Modify default YARN memory_overhead-- from an additive constant to a 
multiplier

Redone against the recent master branch 
(https://github.com/apache/spark/pull/1391)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/nishkamravi2/spark master_nravi

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2485.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2485


commit 681b36f5fb63e14dc89e17813894227be9e2324f
Author: nravi nr...@c1704.halxg.cloudera.com
Date:   2014-05-08T07:05:33Z

Fix for SPARK-1758: failing test 
org.apache.spark.JavaAPISuite.wholeTextFiles

The prefix file: is missing in the string inserted as key in HashMap

commit 5108700230fd70b995e76598f49bdf328c971e77
Author: nravi nr...@c1704.halxg.cloudera.com
Date:   2014-06-03T22:25:22Z

Fix in Spark for the Concurrent thread modification issue (SPARK-1097, 
HADOOP-10456)

commit 6b840f017870207d23e75de224710971ada0b3d0
Author: nravi nr...@c1704.halxg.cloudera.com
Date:   2014-06-03T22:34:02Z

Undo the fix for SPARK-1758 (the problem is fixed)

commit df2aeb179fca4fc893803c72a657317f5b5539d7
Author: nravi nr...@c1704.halxg.cloudera.com
Date:   2014-06-09T19:02:59Z

Improved fix for ConcurrentModificationIssue (Spark-1097, Hadoop-10456)

commit eb663ca20c73f9c467192c95fc528c6f55f202be
Author: nravi nr...@c1704.halxg.cloudera.com
Date:   2014-06-09T19:04:39Z

Merge branch 'master' of https://github.com/apache/spark

commit 5423a03ddf4d747db7261d08a64e32f44e8be95e
Author: nravi nr...@c1704.halxg.cloudera.com
Date:   2014-06-10T20:06:07Z

Merge branch 'master' of https://github.com/apache/spark

commit 3bf8fad85813037504189cf1323d381fefb6dfbe
Author: nravi nr...@c1704.halxg.cloudera.com
Date:   2014-06-16T05:47:00Z

Merge branch 'master' of https://github.com/apache/spark

commit 2b630f94079b82df3ebae2b26a3743112afcd526
Author: nravi nr...@c1704.halxg.cloudera.com
Date:   2014-06-16T06:00:31Z

Accept memory input as 30g, 512M instead of an int value, to be 
consistent with rest of Spark

commit efd688a4e15b79e92d162073035b03362fcf66f0
Author: Nishkam Ravi nr...@cloudera.com
Date:   2014-07-13T00:04:17Z

Merge branch 'master' of https://github.com/apache/spark

commit 2e69f112d1be59951cd32da4127d8b51bfa03338
Author: Nishkam Ravi nr...@cloudera.com
Date:   2014-09-21T23:17:15Z

Merge branch 'master' of https://github.com/apache/spark into master_nravi

commit ebcde10252e6c45169ea086e8426ec9997d46490
Author: Nishkam Ravi nr...@cloudera.com
Date:   2014-09-22T06:44:40Z

Modify default YARN memory_overhead-- from an additive constant to a 
multiplier (redone to resolve merge conflicts)







[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2485#issuecomment-56334768
  
Can one of the admins verify this patch?





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-22 Thread nishkamravi2
Github user nishkamravi2 commented on the pull request:

https://github.com/apache/spark/pull/1391#issuecomment-56334843
  
Have redone the PR against the recent master branch, which has undergone 
significant structural changes for Yarn. Addressed review comments and changed 
the multiplier back to 0.07 (to err on the conservative side, since customers 
are running into this issue).





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-22 Thread sryza
Github user sryza commented on the pull request:

https://github.com/apache/spark/pull/1391#issuecomment-56342496
  
If #2485 is the replacement, can we close this one out?





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-22 Thread sryza
Github user sryza commented on a diff in the pull request:

https://github.com/apache/spark/pull/2485#discussion_r17837060
  
--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
@@ -117,9 +118,10 @@ private[yarn] abstract class YarnAllocator(
 
 if (missing > 0) {
   numPendingAllocate.addAndGet(missing)
-  logInfo("Will Allocate %d executor containers, each with %d memory".format(
+  logInfo("Will Allocate %d executor containers, each with %d+%d MB memory".format(
--- End diff --

The quantities in this message may be unclear to those not familiar with 
the overhead.  Maybe something like "each with %d memory including %d overhead"?

Also, not the fault of this PR, but "Allocate" shouldn't be capitalized.
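
A minimal sketch of how the suggested wording could be produced. The object
and method names are hypothetical, the "MB" units are an assumption, and the
figures in main (4 containers, 2048 MB, 384 MB) are illustrative rather than
taken from the patch.

```scala
// Sketch only: not the code in the PR.
object AllocationLogMessage {
  /** Builds the log line; all memory figures are in MB. */
  def message(missing: Int, executorMemoryMb: Int, overheadMb: Int): String =
    "Will allocate %d executor containers, each with %d MB memory including %d MB overhead"
      .format(missing, executorMemoryMb + overheadMb, overheadMb)

  def main(args: Array[String]): Unit =
    println(message(4, 2048, 384))
}
```

Reporting the total alongside the overhead lets a reader see at a glance
how much of each container is not heap.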





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-22 Thread sryza
Github user sryza commented on the pull request:

https://github.com/apache/spark/pull/2485#issuecomment-56343225
  
> It would also be nice to log what it is when we fail to get a container 
> large enough or it fails due to the cluster max allocation limit was hit.

@tgravescs I believe we already print out a nasty error message when a 
container can't be allocated because of the max allocation limit.  Are you 
saying we should indicate whether the overhead made the difference?





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-22 Thread nishkamravi2
Github user nishkamravi2 commented on the pull request:

https://github.com/apache/spark/pull/2485#issuecomment-56345871
  
Updated as per @sryza 's comments





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-22 Thread nishkamravi2
Github user nishkamravi2 commented on the pull request:

https://github.com/apache/spark/pull/1391#issuecomment-56346539
  
Shall we let this linger on for just a bit until the other one gets merged?





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-22 Thread nishkamravi2
Github user nishkamravi2 closed the pull request at:

https://github.com/apache/spark/pull/1391





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-22 Thread nishkamravi2
Github user nishkamravi2 commented on the pull request:

https://github.com/apache/spark/pull/1391#issuecomment-56347057
  
Noticed that we have a reference to this one in 2485, closing it out. 





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-22 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/2485#issuecomment-56371027
  
Yes, it would be nice to tell the user what the overhead is calculated to 
be, as I might not realize there is overhead and that it's dependent upon the 
multiplier. I.e., I told it to use 15GB, so why is it erroring and saying the 
max size is 16GB?
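
To make the 15GB example concrete, here is a rough sketch of the arithmetic,
assuming the 0.07 multiplier mentioned in this thread, a 384 MB minimum
overhead, and a 16GB cluster max allocation. The object, the method names and
the message text are hypothetical, not the code in the PR.

```scala
// Sketch only: mirrors the max(factor * memory, minimum) default discussed here.
object OverheadCheck {
  val OverheadFactor = 0.07 // multiplier mentioned in this thread
  val OverheadMinMb = 384   // assumed floor, in MB

  /** Default overhead in MB for a given executor memory in MB. */
  def defaultOverheadMb(executorMemoryMb: Int): Int =
    math.max((OverheadFactor * executorMemoryMb).toInt, OverheadMinMb)

  /** Returns an explanation if memory plus overhead exceeds the cluster maximum. */
  def checkRequest(executorMemoryMb: Int, maxAllocationMb: Int): Option[String] = {
    val overheadMb = defaultOverheadMb(executorMemoryMb)
    val totalMb = executorMemoryMb + overheadMb
    if (totalMb > maxAllocationMb)
      Some(s"Required executor memory ($executorMemoryMb MB) plus overhead ($overheadMb MB) " +
        s"is $totalMb MB, above the cluster max allocation of $maxAllocationMb MB")
    else None
  }

  def main(args: Array[String]): Unit =
    // 15GB executor on a cluster whose largest container is 16GB
    println(checkRequest(15 * 1024, 16 * 1024).getOrElse("request fits"))
}
```

Under those assumptions the request is 15360 + 1075 = 16435 MB, just over the
16384 MB limit, even though the executor itself was sized at 15GB.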





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-22 Thread nishkamravi2
Github user nishkamravi2 commented on the pull request:

https://github.com/apache/spark/pull/2485#issuecomment-56457534
  
Updated as per @tgravescs 's comments





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-19 Thread nishkamravi2
Github user nishkamravi2 commented on the pull request:

https://github.com/apache/spark/pull/1391#issuecomment-56142506
  
@sryza Thanks Sandy.  Will do.





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-19 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/1391#issuecomment-56174217
  
@mridulm  any comments?

I'm ok with it if it's a consistent problem for users.  One thing we 
definitely need to do is document it and possibly look at including better log 
and error messages. We should at least log the size of the overhead it 
calculates.  It would also be nice to log what it is when we fail to get a 
container large enough, or when the request fails because the cluster max 
allocation limit was hit.





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-18 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/1391#discussion_r17762675
  
--- Diff: yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala ---
@@ -92,7 +92,7 @@ private[yarn] class YarnAllocationHandler(
 
   // Additional memory overhead - in mb.
   private def memoryOverhead: Int = sparkConf.getInt("spark.yarn.executor.memoryOverhead",
-    YarnAllocationHandler.MEMORY_OVERHEAD)
+    math.max((YarnAllocationHandler.MEMORY_OVERHEAD_FACTOR * executorMemory).toInt, YarnAllocationHandler.MEMORY_OVERHEAD_MIN))
--- End diff --

line too long, here and other places





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-18 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/1391#issuecomment-56119506
  
What is the current state of this PR? @tgravescs @mridulm any more thoughts 
about the current approach? This is a related PR for mesos and I'm wondering if 
we can use the same approach in both places.





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-18 Thread nishkamravi2
Github user nishkamravi2 commented on the pull request:

https://github.com/apache/spark/pull/1391#issuecomment-56120931
  
Updated as per @andrewor14 's comments





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-18 Thread sryza
Github user sryza commented on the pull request:

https://github.com/apache/spark/pull/1391#issuecomment-56132524
  
@nishkamravi2 mind resolving the merge conflicts?





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-18 Thread sryza
Github user sryza commented on the pull request:

https://github.com/apache/spark/pull/1391#issuecomment-56132497
  
These changes look good to me.  This addresses what continues to be the #1 
issue that we see in Cloudera customer YARN deployments.  It's worth 
considering boosting this when using PySpark, but that's probably work for 
another JIRA.





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1391#issuecomment-54694595
  
Can one of the admins verify this patch?





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-07-18 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/1391#issuecomment-49480064
  
I'll let mridul comment on this but I think adding a comment where 0.06 
came from would be useful.




[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-07-18 Thread nishkamravi2
Github user nishkamravi2 commented on the pull request:

https://github.com/apache/spark/pull/1391#issuecomment-49483642
  
6% was experimentally obtained (with the goal of keeping the bound as tight 
as possible without the containers crashing). Three workloads were experimented 
with: PageRank, WordCount and KMeans over moderate to large input datasets and 
configured such that the containers are optimally utilized (neither 
under-utilized nor over-subscribed). Based on my observations, less than 5% is 
a no-no. If someone would like to tune this parameter more and make a case for 
a higher value (keeping in mind that this is a default value that will not 
cover all workloads), that would be helpful. 




[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-07-17 Thread nishkamravi2
Github user nishkamravi2 commented on the pull request:

https://github.com/apache/spark/pull/1391#issuecomment-49348179
  
Bringing the discussion back online. Thanks for all the input so far. 

Ran a few experiments yesterday and today. The number of executors (which was 
the other main handle we wanted to factor in) doesn't seem to have any 
noticeable impact. Tried a few other parameters such as num_partitions and 
default_parallelism, but nothing sticks. Confirmed the proportionality with 
container size. Have also been trying to tune the multiplier to minimize 
potential waste, and I think 6% (as opposed to the 7% we currently have) is 
the lowest we should go. Modifying the PR accordingly.




[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-07-13 Thread mridulm
Github user mridulm commented on the pull request:

https://github.com/apache/spark/pull/1391#issuecomment-48835312
  
We have gone over this in the past: it is suboptimal to make it a linear
function of executor/driver memory.
Overhead is a function of the number of executors, number of opened files,
shuffle VM pressure, etc.
It is NOT a function of executor memory, which is why it is separately
configured.





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-07-13 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/1391#issuecomment-48835447
  
That makes sense, but then it doesn't explain why a constant amount works 
for a given job when executor memory is low, and then doesn't work when it is 
high. This has also been my experience and I don't have a great grasp on why it 
would be. More threads and open files in a busy executor? It goes indirectly 
with how big you need your executor to be, but not directly.

Nishkam, do you have a sense of how much extra memory you had to configure 
to get it to work when executor memory increased? Is it pretty marginal, or 
quite substantial?




[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-07-13 Thread nishkamravi2
Github user nishkamravi2 commented on the pull request:

https://github.com/apache/spark/pull/1391#issuecomment-48835560
  
Sean, the memory_overhead is fairly substantial. More than 2GB for a 30GB 
executor. Less than 400MB for a 2GB executor. 




[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-07-13 Thread mridulm
Github user mridulm commented on the pull request:

https://github.com/apache/spark/pull/1391#issuecomment-48835566
  
The default constant is actually a lower bound to account for other
overheads (since YARN will aggressively kill tasks). Unfortunately, we
have not sized this properly and don't have a good recommendation on how to
set it.

This is compounded by magic constants in Spark for various IO ops, non
deterministic network behaviour (we should be able to estimate upper bound
here = 2x number of workers), VM memory use (shuffle output is mmap'ed
whole, running afoul of YARN virtual memory limits) and so on.

Hence sizing this is, unfortunately, app specific.





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-07-13 Thread mridulm
Github user mridulm commented on the pull request:

https://github.com/apache/spark/pull/1391#issuecomment-48835618
  
That would be a function of your jobs.
Other apps would have drastically different characteristics, which is
why we can't generalize to a simple fraction of executor memory.
It actually buys us nothing in the general case: jobs will continue to fail
when it is incorrect, while wasting a lot of memory.

On 13-Jul-2014 2:38 pm, nishkamravi2 notificati...@github.com wrote:

> Yes, I'm aware of the discussion on this issue in the past. Experiments
> confirm that overhead is a function of executor memory. Why and how can be
> figured out with due diligence and analysis. It may be a function of other
> parameters and the function may be fairly complex. However, the
> proportionality is undeniable. Besides, we are only adjusting the default
> value and making it a bit more resilient. The memory_overhead parameter can
> still be configured by the developer separately. The constant additive
> factor makes little sense (empirically).
>
> https://github.com/apache/spark/pull/1391#issuecomment-48835500





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-07-13 Thread mridulm
Github user mridulm commented on the pull request:

https://github.com/apache/spark/pull/1391#issuecomment-48835656
  
The basic issue is that you are trying to model overhead using the wrong
variable. It actually has no correlation with executor memory (other than
VM overheads as the heap increases).






Re: [GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-07-13 Thread Mridul Muralidharan
You are lucky :-) For some of our jobs, in an 8GB container, overhead is
1.8GB!



[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-07-13 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/1391#issuecomment-48835727
  
Yes of course, lots of settings' best or even usable values are ultimately 
app-specific. Ideally, defaults work for lots of cases. A flat value is the 
simplest of models, and anecdotally, the current default value does not work in 
medium- to large-memory YARN jobs. You can increase the default, but then the 
overhead gets silly for small jobs -- 1GB? And all of these are not-uncommon 
use cases.

None of that implies the overhead logically scales with container memory. 
Empirically, it may do, and that's useful. Until the magic explanatory variable 
is found, which one is less problematic for end users -- a flat constant that 
frequently has to be tuned, or an imperfect model that could get it right in 
more cases? 

That said, it is kind of a developer API change and feels like something 
not to keep reimagining.

Nishkam, can you share any anecdotal evidence about how the overhead 
changes? If executor memory is the only variable changing, that seems to be 
evidence against it being driven by other factors, but I don't know if that's 
what we know.
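
As a rough illustration of the two models being weighed here, the sketch
below compares a flat default with a proportional one. The 384 MB flat value
and 384 MB floor are assumptions, the 0.07 factor is one of the multipliers
floated in this thread, and all names are hypothetical.

```scala
// Sketch only: compares a flat overhead with a proportional one.
object OverheadModels {
  val FlatOverheadMb = 384  // assumed flat default, in MB
  val OverheadFactor = 0.07 // multiplier floated in this thread
  val OverheadMinMb = 384   // assumed floor for the proportional model

  /** Flat model: the same overhead regardless of executor size. */
  def flat(executorMemoryMb: Int): Int = FlatOverheadMb

  /** Proportional model: a fraction of executor memory, never below the floor. */
  def proportional(executorMemoryMb: Int): Int =
    math.max((OverheadFactor * executorMemoryMb).toInt, OverheadMinMb)

  def main(args: Array[String]): Unit =
    for (gb <- Seq(2, 8, 30)) {
      val mb = gb * 1024
      println(f"$gb%2d GB executor: flat = ${flat(mb)}%4d MB, proportional = ${proportional(mb)}%4d MB")
    }
}
```

With these numbers the two models agree for a 2GB executor (384 MB each) and
diverge at 30GB (384 MB versus 2150 MB), which lines up with the figures
Nishkam reported for those sizes. Neither model reaches 1.8GB of overhead for
an 8GB container, so workloads like that would still need an explicit setting.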




[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-07-13 Thread mridulm
Github user mridulm commented on the pull request:

https://github.com/apache/spark/pull/1391#issuecomment-48835769
  
You are lucky :-) For some of our jobs, in an 8GB container, overhead is
1.8GB!





[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-07-13 Thread nishkamravi2
Github user nishkamravi2 commented on the pull request:

https://github.com/apache/spark/pull/1391#issuecomment-48835852
  
Experimented with three different workloads and noticed common patterns of 
proportionality. 
Other parameters were left unchanged and only executor size was increased. 
The memory-overhead ranges between 0.05-0.08 * executor_memory size. 




[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-07-13 Thread nishkamravi2
Github user nishkamravi2 commented on the pull request:

https://github.com/apache/spark/pull/1391#issuecomment-48835881
  
That's why the parameter is configurable. If you have jobs that cause 
20-25% memory_overhead, default values will not help. 
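
A small sketch of that precedence: an explicit setting wins, and the
proportional default applies only when nothing is configured. A plain Map
stands in for SparkConf so the example runs without Spark. The key name is the
one shown in the diff in this thread, while the 0.07 factor, the 384 MB floor
and the object and method names are assumptions.

```scala
// Sketch only: a Map stands in for SparkConf; not the code in the PR.
object OverheadSetting {
  val OverheadKey = "spark.yarn.executor.memoryOverhead" // key shown in the diff
  val OverheadFactor = 0.07 // assumed multiplier
  val OverheadMinMb = 384   // assumed floor, in MB

  /** An explicit setting wins; otherwise fall back to the proportional default. */
  def resolveOverheadMb(conf: Map[String, String], executorMemoryMb: Int): Int =
    conf.get(OverheadKey).map(_.toInt)
      .getOrElse(math.max((OverheadFactor * executorMemoryMb).toInt, OverheadMinMb))

  def main(args: Array[String]): Unit = {
    val executorMemoryMb = 8 * 1024
    println(resolveOverheadMb(Map.empty, executorMemoryMb))                  // proportional default
    println(resolveOverheadMb(Map(OverheadKey -> "2048"), executorMemoryMb)) // explicit override
  }
}
```

A job known to need 20-25% overhead would set the key explicitly, and the
default never enters the picture.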



