Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/12914#issuecomment-220219840

@jkbradley @yanboliang @holdenk @sethah let's discuss the issue of defaults in param doc (refer https://github.com/apache/spark/pull/13148#discussion_r63600571) on this PR since it is pertinent. Here, Holden raises 2 issues:

1. The Scaladoc contains default values for many params (sometimes in shared traits). In addition, the Scala `Param` itself has the self-contained `doc` field (typically not containing defaults, since the built-in doc shows the current and default values in `explainParam`).
2. The PyDoc only contains the `Param` `doc` field.

(By the way, (1) implies that in cases where the default param value in the trait is overridden, the Scaladoc is incorrect, but that is another issue.)

The result of (2) is that the HTML API doc doesn't look great, e.g.

<img width="820" alt="screen shot 2016-05-19 at 4 56 08 am" src="https://cloud.githubusercontent.com/assets/1036807/15381231/0a937dde-1d7e-11e6-885c-b120679f84ee.png">

Also, nowhere in the PyDoc are the defaults listed, while in the Scaladoc they are. I agree that it would be nice to have the defaults listed in the PyDoc in some way.

1. One solution is the original approach here, where defaults are put in the `Param` doc in a standard way but stripped out during `explainParams`. This works, but IMO it is more prone to breaking in the future if people forget to follow exactly the correct format. It also doesn't directly solve the problem of the API doc looking ugly.
2. Another solution is the current approach here, where the attributes are turned into properties with a docstring (possibly including the default). This does solve the problem of nice display in the API doc. The downsides are the potentially fairly large change of making everything a property, the code duplication introduced (though kept to a minimum), and the extra boilerplate when adding new params, which could be more error-prone.
3. A third solution is what I've done [here](https://github.com/mlnick/spark/tree/sphinx-doc-params) as a PoC, which basically adds the built-in doc as the instance docstring for each Python `Param`. We then override the `AttributeDocumenter` in Sphinx to handle it. The result displays nicely in the API doc (the same as the property approach, but no defaults are added). The other thing that changes is that the `__init__` docstring is brought back (for some reason the current docs are not showing it), which means the defaults are essentially documented there for each class. In a way this seems more "Pythonic" to me (i.e. Python users are accustomed to seeing default arg values in constructor doc, e.g. scikit-learn).
4. Another option is to do nothing (for now at least), except bring back the `__init__` docstring. This keeps the ugly-looking `Param` doc, but at least shows the default args for each class, and is the current behavior. We can do something like (1) or (3) later (but maybe not (2) during Spark 2.x, as it may be too large a change).
5. A final option is to document defaults elsewhere (such as in the setter for the param, which is usually implemented in the class, or in a model trait in Scala).

Let's decide on an approach and make it consistent across the board.
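To make the trade-off between options (1) and (2) concrete, here is a deliberately simplified, hypothetical sketch (not actual PySpark internals; the `Param` stand-in, the `HasSeed*` mixins, and the `(default: ...)` marker format are all illustrative assumptions):

```python
import re


class Param(object):
    # Minimal stand-in for pyspark.ml.param.Param: just a name and a doc string.
    def __init__(self, name, doc):
        self.name = name
        self.doc = doc


class HasSeedOption1(object):
    # Option (1): the default is embedded in the Param doc in a fixed format,
    # and must be stripped back out when explaining the param.
    seed = Param("seed", "random seed (default: 42)")

    def explainParam(self, param):
        # Remove the "(default: ...)" marker so the default is not shown twice;
        # this silently breaks if someone deviates from the exact format.
        return re.sub(r"\s*\(default: .*?\)", "", param.doc)


class HasSeedOption2(object):
    # Option (2): the attribute becomes a property whose docstring carries the
    # default, so Sphinx picks it up for the API doc; note the extra boilerplate
    # (a private attribute plus a property) per param.
    _seed = Param("seed", "random seed")

    @property
    def seed(self):
        """Param for random seed (default: 42)."""
        return self._seed
```

Under option (1), `HasSeedOption1().explainParam(HasSeedOption1.seed)` yields `"random seed"` with the marker stripped, while option (2) keeps the `Param` doc clean and puts the default in the property docstring instead.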