Github user MLnick commented on the pull request:

    https://github.com/apache/spark/pull/12914#issuecomment-220219840
  
    @jkbradley @yanboliang @holdenk @sethah let's discuss the issue of defaults 
in param doc (refer 
https://github.com/apache/spark/pull/13148#discussion_r63600571) on this PR 
since it is pertinent.
    
    Here, Holden raises 2 issues:
    1. The Scaladoc contains default values for many params (sometimes in 
shared traits). In addition, the Scala `Param` itself has the self-contained 
`doc` field (typically not containing defaults, since the built-in doc shows 
both the current and default values via `explainParam`).
    2. The PyDoc only contains the `Param` `doc` field.
    
    (By the way, (1) implies that in cases where the default param value in the 
trait is overridden, the Scaladoc is incorrect, but that is another issue).
    
    The result of (2) is that the HTML API doc doesn't look great, e.g.
    
    <img width="820" alt="screen shot 2016-05-19 at 4 56 08 am" 
src="https://cloud.githubusercontent.com/assets/1036807/15381231/0a937dde-1d7e-11e6-885c-b120679f84ee.png";>
    
    Also, nowhere in the PyDoc are the defaults listed, while in the Scaladoc 
they are.
    
    I agree that it would be nice to have the defaults listed in the PyDoc in 
some way.
    1. One solution is the original approach here, where defaults are put into 
the `Param` doc in a standard format but stripped out during `explainParams` 
(see the first sketch after this list). This works, but IMO it is prone to 
breaking in the future if people don't follow the format exactly. It also 
doesn't directly solve the problem of the API doc looking ugly;
    2. Another solution is the current approach here, where the attributes are 
turned into properties with a docstring (possibly including the default); this 
does give a nice display in the API doc (see the second sketch after this 
list). The downsides are the fairly large change required to make everything a 
property, the code duplication it introduces (though kept to a minimum), and 
the extra boilerplate when adding new params, which could be more error-prone;
    3. A third solution is what I've done 
[here](https://github.com/mlnick/spark/tree/sphinx-doc-params) as a PoC, which 
basically adds the built-in doc as the instance docstring for each Python 
`Param`, and then overrides the `AttributeDocumenter` in Sphinx to handle it 
(see the third sketch after this list). The result displays nicely in the API 
doc (the same as the property approach, but no defaults are added). The other 
change is that the `__init__` docstring is brought back (for some reason the 
current docs are not showing it), which means the defaults are essentially 
documented there for each class. In a way this seems more "Pythonic" to me 
(i.e. Python users are accustomed to seeing default arg values in the 
constructor doc, e.g. scikit-learn).
    4. Another option is to do nothing (for now at least) except bring back 
the `__init__` docstring. This keeps the ugly-looking `Param` doc, but at 
least shows the default args for each class, and is otherwise the current 
behavior. We can do something like (1) or (3) later (but maybe not (2) during 
Spark 2.x, as it may be too large a change).
    5. A final option is perhaps to document the defaults elsewhere, such as 
in the param's setter, which is usually implemented in the class or a model 
trait in Scala.
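    
    To make the trade-offs concrete, here is a minimal sketch of (1). The 
`(default: ...)` marker convention and the `strip_default` helper are 
hypothetical, not the actual code in this PR:
    
    ```python
    import re
    
    # Hypothetical convention: each Param doc ends with "(default: <value>)".
    # explainParams() would strip this marker before printing, since it
    # already reports the live default and current values itself.
    _DEFAULT_MARKER = re.compile(r"\s*\(default: [^)]*\)\s*$")
    
    def strip_default(doc):
        """Remove a trailing '(default: ...)' marker from a Param doc string."""
        return _DEFAULT_MARKER.sub("", doc)
    
    print(strip_default("max number of iterations (>= 0) (default: 100)"))
    # -> max number of iterations (>= 0)
    ```
    
    The fragility is exactly here: if someone writes `(default 100)` or puts 
the marker mid-string, it silently leaks into the `explainParams` output.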
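    
    A toy illustration of (2); the mixin and its default below are made up, 
not the actual PySpark code, but it shows why Sphinx renders properties well:
    
    ```python
    class HasMaxIter(object):
        """Toy stand-in for a Params mixin; not the actual PySpark code."""
    
        def __init__(self):
            self._maxIter = 100  # assumed default, for illustration only
    
        @property
        def maxIter(self):
            """max number of iterations (>= 0). Default: 100."""
            return self._maxIter
    ```
    
    Sphinx autodoc picks up the property docstring and renders it under the 
attribute name, which is what gives the clean API doc; the cost is a property 
(plus the associated wiring) for every param.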
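    
    And the rough shape of the PoC in (3), heavily simplified; this is a 
sketch against Sphinx's autodoc extension, not the exact code in my branch:
    
    ```python
    # Simplified Param: give each instance its own docstring.
    class Param(object):
        def __init__(self, parent, name, doc):
            self.parent = parent
            self.name = name
            self.doc = doc
            self.__doc__ = doc  # instance-level docstring for Sphinx to find
    
    # In docs/conf.py: prefer the instance docstring for Param attributes.
    from sphinx.ext.autodoc import AttributeDocumenter
    
    class ParamDocumenter(AttributeDocumenter):
        """Behaves like AttributeDocumenter except for Param instances."""
    
        def get_doc(self, *args, **kwargs):
            # Checked by class name to avoid importing pyspark in conf.py.
            if type(self.object).__name__ == "Param":
                doc = getattr(self.object, "__doc__", None)
                if doc:
                    return [doc.splitlines()]
            return super(ParamDocumenter, self).get_doc(*args, **kwargs)
    
    def setup(app):
        app.add_autodocumenter(ParamDocumenter)
    ```
    
    No defaults appear in the `Param` doc itself here; they show up in the 
restored `__init__` docstring instead.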
    
    Let's decide on an approach and make it consistent across the board.

