[jira] [Commented] (SPARK-1989) Exit executors faster if they get into a cycle of heavy GC

2016-04-28 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15262761#comment-15262761
 ] 

Thomas Graves commented on SPARK-1989:
--

Personally I don't agree with this and think we should close it as Won't Fix.

There is some discussion on the PR above, and here was my last response. If 
others disagree, please let me know.



I understand the argument that we want the best user experience, and I'm not 
against the settings themselves; I just don't think the benefit is worth the 
cost here.

These are very specific, advanced Java options, and properly maintaining and 
parsing them doesn't seem necessary to me. For instance, when Java 9, 10, or 11 
comes out and the options change or no longer exist, we have to go change code; 
if IBM Java comes out with different options, we have to change; if someone 
thinks 80% is better than 90%, we have to change. We already have enough PRs.

Let the users/admins configure it for their version of Java and their specific 
needs. We are adding a bunch of code to parse these options and set them to a 
default that someone thinks is better, and many others might disagree. For 
instance, with MapReduce we run it at 50% to fail fast. Why not set Spark to 
that? If we want it to fail fast, 50% is better than 90%, right? Why don't we 
set the garbage collector as well? To me this all comes down to configuring 
what is best for your specific application. Since Spark can do so many 
different things (streaming, ML, graph processing, ETL), having one default 
isn't necessarily best for all.
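
To illustrate the point, an admin who wants this behavior can already opt in 
per application through the existing extraJavaOptions configs, with no Spark 
code change. A minimal sketch (the percentages and the application jar name 
are illustrative only, not recommended defaults):

  # Sketch: pass the HotSpot GC-overhead flags per application.
  spark-submit \
    --conf "spark.executor.extraJavaOptions=-XX:GCTimeLimit=90 -XX:GCHeapFreeLimit=5" \
    --conf "spark.driver.extraJavaOptions=-XX:GCTimeLimit=90 -XX:GCHeapFreeLimit=5" \
    my-app.jar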

I think putting this in sets a bad precedent and just adds a maintenance 
headache for not much benefit. @vanzin mentions he has never seen anyone set 
this, so is it that big of a deal? Where is the data that says 90% is better 
than 98% for the majority of Spark users? Obviously, if things just don't run, 
as you mention with the max perm size, that makes it a much easier call and it 
makes sense to put it in, but I don't see that here.
Many of my customers don't set it and things are fine. I see other users set it 
because they explicitly want to fail very fast, and they set it to less than 90%.

I also think setting -XX:GCHeapFreeLimit is riskier than setting 
-XX:GCTimeLimit. I personally have never seen anyone actually set it. It's 
defined as "The lower limit on the amount of space freed during a garbage 
collection in percent of the maximum heap (default is 2)". This to me is much 
more application specific than the GC time limit.
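
For reference, a rough sketch of how the two limits interact (my reading of 
the HotSpot docs, not something stated in this thread): the "GC overhead limit 
exceeded" OutOfMemoryError is thrown only when both thresholds are crossed.

  # Both conditions must hold across recent full GCs (HotSpot defaults shown):
  #   percentage of total time spent in GC  > GCTimeLimit      (default 98)
  #   percentage of max heap freed by a GC  < GCHeapFreeLimit  (default 2)
  # e.g. on a hypothetical 8g executor heap, GCHeapFreeLimit=2 means a full GC
  # reclaiming less than ~160 MB counts toward the limit; raising it to 5
  # (~410 MB) makes the JVM give up sooner.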

> Exit executors faster if they get into a cycle of heavy GC
> --
>
> Key: SPARK-1989
> URL: https://issues.apache.org/jira/browse/SPARK-1989
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Reporter: Matei Zaharia
>
> I've seen situations where an application is allocating too much memory 
> across its tasks + cache to proceed, but Java gets into a cycle where it 
> repeatedly runs full GCs, frees up a bit of the heap, and continues instead 
> of giving up. This then leads to timeouts and confusing error messages. It 
> would be better to crash with OOM sooner. The JVM has options to support 
> this: http://java.dzone.com/articles/tracking-excessive-garbage.
> The right solution would probably be:
> - Add some config options used by spark-submit to set XX:GCTimeLimit and 
> XX:GCHeapFreeLimit, with more conservative values than the defaults (e.g. 90% 
> time limit, 5% free limit)
> - Make sure we pass these into the Java options for executors in each 
> deployment mode




[jira] [Commented] (SPARK-1989) Exit executors faster if they get into a cycle of heavy GC

2016-04-21 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15251613#comment-15251613
 ] 

Apache Spark commented on SPARK-1989:
-

User 'devaraj-kavali' has created a pull request for this issue:
https://github.com/apache/spark/pull/12571

> Exit executors faster if they get into a cycle of heavy GC
> --
>
> Key: SPARK-1989
> URL: https://issues.apache.org/jira/browse/SPARK-1989
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Reporter: Matei Zaharia
>
> I've seen situations where an application is allocating too much memory 
> across its tasks + cache to proceed, but Java gets into a cycle where it 
> repeatedly runs full GCs, frees up a bit of the heap, and continues instead 
> of giving up. This then leads to timeouts and confusing error messages. It 
> would be better to crash with OOM sooner. The JVM has options to support 
> this: http://java.dzone.com/articles/tracking-excessive-garbage.
> The right solution would probably be:
> - Add some config options used by spark-submit to set XX:GCTimeLimit and 
> XX:GCHeapFreeLimit, with more conservative values than the defaults (e.g. 90% 
> time limit, 5% free limit)
> - Make sure we pass these into the Java options for executors in each 
> deployment mode




[jira] [Commented] (SPARK-1989) Exit executors faster if they get into a cycle of heavy GC

2014-07-02 Thread Guoqiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14050005#comment-14050005
 ] 

Guoqiang Li commented on SPARK-1989:


In this case it should also trigger garbage collection on the driver.
Related work: 
https://github.com/witgo/spark/compare/taskEvent

> Exit executors faster if they get into a cycle of heavy GC
> --
>
> Key: SPARK-1989
> URL: https://issues.apache.org/jira/browse/SPARK-1989
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Reporter: Matei Zaharia
> Fix For: 1.1.0
>
>
> I've seen situations where an application is allocating too much memory 
> across its tasks + cache to proceed, but Java gets into a cycle where it 
> repeatedly runs full GCs, frees up a bit of the heap, and continues instead 
> of giving up. This then leads to timeouts and confusing error messages. It 
> would be better to crash with OOM sooner. The JVM has options to support 
> this: http://java.dzone.com/articles/tracking-excessive-garbage.
> The right solution would probably be:
> - Add some config options used by spark-submit to set XX:GCTimeLimit and 
> XX:GCHeapFreeLimit, with more conservative values than the defaults (e.g. 90% 
> time limit, 5% free limit)
> - Make sure we pass these into the Java options for executors in each 
> deployment mode


