Github user tgravescs commented on the pull request:

    https://github.com/apache/spark/pull/12571#issuecomment-215521675
  
    I understand the argument that we want the best user experience, and I'm not 
against the settings themselves; I just don't think the benefit is worth the cost 
here. 
    
    These are very specific, advanced Java options, and properly maintaining and 
parsing them doesn't seem necessary to me.  For instance, when Java 9, 10, or 11 
comes out and the options change or no longer exist, we have to go change code; if 
IBM Java comes out with a different config, we have to change; if someone thinks 
80% is better than 90%, we have to change.  We already have enough PRs.
    
    Let the users/admins configure it for their version of Java and their specific 
needs.  We are adding a bunch of code to parse these and set them to a default 
that someone thinks is better; many others might disagree.  For instance, with 
MapReduce we run it at 50% to fail fast.  Why not set Spark to that?  If we 
want it to fail fast, 50% is better than 90, right?  Why don't we set the garbage 
collector as well?  To me this all comes down to configuring what is best for 
your specific application.  Since Spark can do so many different things - 
streaming, ML, graph processing, ETL - having one default isn't necessarily best 
for all. 
    
    I think putting this in sets a bad precedent and just adds maintenance 
headache for not much benefit.  @vanzin mentions he has never seen anyone set 
this, so is it that big of a deal?  Where is the data that says 90% is better 
than 98% for the majority of Spark users?  Obviously, if things just don't run, 
like you mention with the max perm size, that makes it a much easier call and 
it makes sense to put it in, but I don't see that here.
    Many of my customers don't set it and things are fine.  I see other users 
set it because they explicitly want to fail very fast, and it's less than 90%.
      
    I also think setting -XX:GCHeapFreeLimit is riskier than setting 
GCTimeLimit.  I personally have never seen anyone actually set it.  It's 
defined as "The lower limit on the amount of space freed during a garbage 
collection in percent of the maximum heap (default is 2)".  This is much 
more application-specific to me than the GC time limit.
    


