Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/12571#issuecomment-215521675

I understand the argument that we want the best user experience, and I'm not against the settings themselves; I just think the benefit isn't worth the cost here. These are very specific, advanced Java options, and properly maintaining and parsing them doesn't seem necessary to me. For instance, when Java 9, 10, or 11 comes out and the options no longer exist or change, we have to go change code; if IBM Java comes out with a different config, we have to change; if someone thinks 80% is better than 90%, we have to change. We already have enough PRs. Let users/admins configure it for their version of Java and their specific needs.

We are adding a bunch of code to parse these options and set them to a default that someone thinks is better. Many others might disagree. For instance, with MapReduce we run at 50% to fail fast. Why not set Spark to that? If we want it to fail fast, 50% is better than 90%, right? Why don't we set the garbage collector as well? To me this all comes down to configuring what is best for your specific application. Since Spark can do so many different things (streaming, ML, graph processing, ETL), having one default isn't necessarily best for all of them.

I think putting this in sets a bad precedent and just adds a maintenance headache for not much benefit. @vanzin mentions he has never seen anyone set this, so is it that big of a deal? Where is the data that says 90% is better than 98% for the majority of Spark users? Obviously, if things just don't run, as you mention with the max perm size, that makes it a much easier call and it makes sense to put it in, but I don't see that here. Many of my customers don't set it and things are fine. I see other users set it because they explicitly want to fail very fast, and it's less than 90%.

I also think setting -XX:GCHeapFreeLimit is riskier than setting -XX:GCTimeLimit. I personally have never seen anyone actually set it. It's defined as "The lower limit on the amount of space freed during a garbage collection in percent of the maximum heap (default is 2)". This to me is much more application-specific than the GC time limit.
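For reference, admins can already tune these flags per-application today without any Spark-side defaults. A sketch of what that looks like in `spark-defaults.conf` (the 50% fail-fast value mirrors the MapReduce example above, and the `GCHeapFreeLimit=5` value is purely illustrative, not a recommendation):

```properties
# spark-defaults.conf (or pass via --conf to spark-submit)
# HotSpot throws an OutOfMemoryError ("GC overhead limit exceeded") when
# more than GCTimeLimit percent of total time is spent in GC and less than
# GCHeapFreeLimit percent of the heap is recovered (defaults: 98 and 2).
spark.executor.extraJavaOptions  -XX:GCTimeLimit=50 -XX:GCHeapFreeLimit=5
spark.driver.extraJavaOptions    -XX:GCTimeLimit=50 -XX:GCHeapFreeLimit=5
```

The point being that this stays a per-deployment choice rather than a hard-coded default in Spark.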