[ 
https://issues.apache.org/jira/browse/SPARK-11219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler updated SPARK-11219:
---------------------------------
    Description: 
PySpark.MLlib currently describes params in several different formats (e.g. 
vertical alignment vs. single line), so it is unclear which documentation 
style is preferred.

This issue is to agree on one format and apply it consistently across 
PySpark.MLlib.

Following the discussion in SPARK-10560, putting the description on a second, 
indented line is both readable and avoids changing many lines when parameters 
are added or removed.  If the parameter has a default value, put it in 
parentheses on a new line under the description.

Example:
{noformat}
:param stepSize:
  Step size for each iteration of gradient descent.
  (default: 0.1)
:param numIterations:
  Number of iterations run for each batch of data.
  (default: 50)
{noformat}
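
As a rough illustration only, the sketch below applies the proposed format to a Python docstring and reads it back with a small parser. Both `train_sketch` and `parse_params` are hypothetical names made up for this example; they are not part of PySpark, and the parser assumes every docstring follows the two-line layout exactly.

```python
import re


def train_sketch(data, stepSize=0.1, numIterations=50):
    """
    Hypothetical training function, used only to show the format.

    :param data:
      Training data as a list of numbers.
    :param stepSize:
      Step size for each iteration of gradient descent.
      (default: 0.1)
    :param numIterations:
      Number of iterations run for each batch of data.
      (default: 50)
    """
    return len(data), stepSize, numIterations


def parse_params(docstring):
    """Map each param name to (description, default or None).

    Assumes the two-line format: a ':param name:' line, then indented
    description lines, then an optional '(default: value)' line.
    """
    params = {}
    current = None
    for raw in docstring.splitlines():
        line = raw.strip()
        header = re.match(r":param (\w+):$", line)
        if header:
            current = header.group(1)
            params[current] = ["", None]
        elif not line:
            # A blank line ends the current param block.
            current = None
        elif current:
            default = re.match(r"\(default: (.+)\)$", line)
            if default:
                params[current][1] = default.group(1)
            else:
                joined = (params[current][0] + " " + line).strip()
                params[current][0] = joined
    return {name: tuple(value) for name, value in params.items()}
```

Because each param occupies its own block of lines, adding or removing one param touches only that block, which is the property the discussion in SPARK-10560 was after.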

h2. Current State of Parameter Description Formatting

h4. Classification
  * LogisticRegressionModel - single line descriptions, fix indentations
  * LogisticRegressionWithSGD - vertical alignment, sporadic default values
  * LogisticRegressionWithLBFGS - vertical alignment, sporadic default values
  * SVMModel - single line
  * SVMWithSGD - vertical alignment, sporadic default values
  * NaiveBayesModel - single line
  * NaiveBayes - single line

h4. Clustering
  * KMeansModel - missing param description
  * KMeans - missing param description and defaults
  * GaussianMixture - vertical align, incorrect default formatting
  * PowerIterationClustering - single line with wrapped indentation, missing defaults
  * StreamingKMeansModel - single line wrapped
  * StreamingKMeans - single line wrapped, missing defaults
  * LDAModel - single line
  * LDA - vertical align, missing some defaults

h4. FPM  
  * FPGrowth - single line
  * PrefixSpan - single line, default values in backticks

h4. Recommendation
  * ALS - does not have param descriptions

h4. Regression
  * LabeledPoint - single line
  * LinearModel - single line
  * LinearRegressionWithSGD - vertical alignment
  * RidgeRegressionWithSGD - vertical align
  * IsotonicRegressionModel - single line
  * IsotonicRegression - single line, missing default

h4. Tree
  * DecisionTree - single line with vertical indentation, missing defaults
  * RandomForest - single line with wrapped indent, missing some defaults
  * GradientBoostedTrees - single line with wrapped indent

NOTE
This issue will focus only on model/algorithm descriptions, which are the 
largest source of inconsistent formatting.
The supporting classes in evaluation.py, feature.py, random.py, and utils.py 
use single-line param descriptions, but they are consistent and so do not 
need to be changed.

  was:
There are several different formats for describing params in PySpark.MLlib, 
making it unclear what the preferred way to document is, i.e. vertical 
alignment vs single line.

This is to agree on a format and make it consistent across PySpark.MLlib.

Following the discussion in SPARK-10560, using 2 lines with an indentation is 
both readable and doesn't lead to changing many lines when adding/removing 
parameters.  If the parameter uses a default value, put this in parentheses on 
a new line under the description.

Example:
{noformat}
:param stepSize:
  Step size for each iteration of gradient descent.
  (default: 0.1)
:param numIterations:
  Number of iterations run for each batch of data.
  (default: 50)
{noformat}


> Make Parameter Description Format Consistent in PySpark.MLlib
> -------------------------------------------------------------
>
>                 Key: SPARK-11219
>                 URL: https://issues.apache.org/jira/browse/SPARK-11219
>             Project: Spark
>          Issue Type: Documentation
>          Components: Documentation, MLlib, PySpark
>            Reporter: Bryan Cutler
>            Priority: Trivial



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
