Repository: spark
Updated Branches:
  refs/heads/branch-2.1 42777b1b3 -> 536a21593


[SPARK-18480][DOCS] Fix wrong links for ML guide docs

## What changes were proposed in this pull request?
1. There are two `[Graph.partitionBy]` link definitions in 
`graphx-programming-guide.md`; the first one had no effect.
2. `DataFrame`, `Transformer`, `Estimator`, `Pipeline`, and `Parameter` in 
`ml-pipeline.md` were linked to `ml-guide.html` by mistake.
3. The `PythonMLLibAPI` link in `mllib-linear-methods.md` was not accessible, 
because the `PythonMLLibAPI` class is private.
4. Other link updates.
## How was this patch tested?
Manual tests.

Author: Zheng RuiFeng <ruife...@foxmail.com>

Closes #15912 from zhengruifeng/md_fix.

(cherry picked from commit cdaf4ce9fe58c4606be8aa2a5c3756d30545c850)
Signed-off-by: Sean Owen <so...@cloudera.com>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/536a2159
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/536a2159
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/536a2159

Branch: refs/heads/branch-2.1
Commit: 536a2159393c82d414cc46797c8bfd958f453d33
Parents: 42777b1
Author: Zheng RuiFeng <ruife...@foxmail.com>
Authored: Thu Nov 17 13:40:16 2016 +0000
Committer: Sean Owen <so...@cloudera.com>
Committed: Thu Nov 17 13:40:24 2016 +0000

----------------------------------------------------------------------
 docs/graphx-programming-guide.md                        |  1 -
 docs/ml-classification-regression.md                    |  4 ++--
 docs/ml-features.md                                     |  2 +-
 docs/ml-pipeline.md                                     | 12 ++++++------
 docs/mllib-linear-methods.md                            |  4 +---
 .../main/scala/org/apache/spark/ml/feature/LSH.scala    |  2 +-
 .../spark/ml/tree/impl/GradientBoostedTrees.scala       |  8 ++++----
 .../org/apache/spark/ml/tree/impl/RandomForest.scala    |  8 ++++----
 8 files changed, 19 insertions(+), 22 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/536a2159/docs/graphx-programming-guide.md
----------------------------------------------------------------------
diff --git a/docs/graphx-programming-guide.md b/docs/graphx-programming-guide.md
index 1097cf1..e271b28 100644
--- a/docs/graphx-programming-guide.md
+++ b/docs/graphx-programming-guide.md
@@ -36,7 +36,6 @@ description: GraphX graph processing library guide for Spark 
SPARK_VERSION_SHORT
 [Graph.fromEdgeTuples]: 
api/scala/index.html#org.apache.spark.graphx.Graph$@fromEdgeTuples[VD](RDD[(VertexId,VertexId)],VD,Option[PartitionStrategy])(ClassTag[VD]):Graph[VD,Int]
 [Graph.fromEdges]: 
api/scala/index.html#org.apache.spark.graphx.Graph$@fromEdges[VD,ED](RDD[Edge[ED]],VD)(ClassTag[VD],ClassTag[ED]):Graph[VD,ED]
 [PartitionStrategy]: 
api/scala/index.html#org.apache.spark.graphx.PartitionStrategy
-[Graph.partitionBy]: 
api/scala/index.html#org.apache.spark.graphx.Graph$@partitionBy(partitionStrategy:org.apache.spark.graphx.PartitionStrategy):org.apache.spark.graphx.Graph[VD,ED]
 [PageRank]: api/scala/index.html#org.apache.spark.graphx.lib.PageRank$
 [ConnectedComponents]: 
api/scala/index.html#org.apache.spark.graphx.lib.ConnectedComponents$
 [TriangleCount]: 
api/scala/index.html#org.apache.spark.graphx.lib.TriangleCount$
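
For context (not part of this patch): the surviving `[Graph.partitionBy]` 
definition backs links to the method sketched below. A minimal, illustrative 
Scala sketch, assuming a local SparkContext; the object name and toy graph 
are made up:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.graphx.PartitionStrategy
import org.apache.spark.graphx.util.GraphGenerators

// Illustrative only: repartition a graph's edges with a PartitionStrategy
// before running joins or aggregations on it.
object PartitionBySketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[*]", "partitionBy-sketch")
    val graph = GraphGenerators.logNormalGraph(sc, numVertices = 100)
    val partitioned = graph.partitionBy(PartitionStrategy.EdgePartition2D)
    println(partitioned.edges.count()) // same edges, new partitioning
    sc.stop()
  }
}
```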

http://git-wip-us.apache.org/repos/asf/spark/blob/536a2159/docs/ml-classification-regression.md
----------------------------------------------------------------------
diff --git a/docs/ml-classification-regression.md 
b/docs/ml-classification-regression.md
index cb2ccbf..c72c01f 100644
--- a/docs/ml-classification-regression.md
+++ b/docs/ml-classification-regression.md
@@ -984,7 +984,7 @@ Random forests combine many decision trees in order to 
reduce the risk of overfi
 The `spark.ml` implementation supports random forests for binary and 
multiclass classification and for regression,
 using both continuous and categorical features.
 
-For more information on the algorithm itself, please see the [`spark.mllib` 
documentation on random forests](mllib-ensembles.html).
+For more information on the algorithm itself, please see the [`spark.mllib` 
documentation on random forests](mllib-ensembles.html#random-forests).
 
 ### Inputs and Outputs
 
@@ -1065,7 +1065,7 @@ GBTs iteratively train decision trees in order to 
minimize a loss function.
 The `spark.ml` implementation supports GBTs for binary classification and for 
regression,
 using both continuous and categorical features.
 
-For more information on the algorithm itself, please see the [`spark.mllib` 
documentation on GBTs](mllib-ensembles.html).
+For more information on the algorithm itself, please see the [`spark.mllib` 
documentation on GBTs](mllib-ensembles.html#gradient-boosted-trees-gbts).
 
 ### Inputs and Outputs
 

http://git-wip-us.apache.org/repos/asf/spark/blob/536a2159/docs/ml-features.md
----------------------------------------------------------------------
diff --git a/docs/ml-features.md b/docs/ml-features.md
index 9031772..45724a3 100644
--- a/docs/ml-features.md
+++ b/docs/ml-features.md
@@ -694,7 +694,7 @@ for more details on the API.
 `VectorIndexer` helps index categorical features in datasets of `Vector`s.
 It can both automatically decide which features are categorical and convert 
original values to category indices.  Specifically, it does the following:
 
-1. Take an input column of type 
[Vector](api/scala/index.html#org.apache.spark.mllib.linalg.Vector) and a 
parameter `maxCategories`.
+1. Take an input column of type 
[Vector](api/scala/index.html#org.apache.spark.ml.linalg.Vector) and a 
parameter `maxCategories`.
 2. Decide which features should be categorical based on the number of distinct 
values, where features with at most `maxCategories` are declared categorical.
 3. Compute 0-based category indices for each categorical feature.
 4. Index categorical features and transform original feature values to indices.
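
A hedged sketch (not part of the patch) of the four-step `VectorIndexer` flow 
described in the hunk above, using the `org.apache.spark.ml.linalg.Vector` 
type the corrected link points at; the object name, column names, and rows 
are illustrative:

```scala
import org.apache.spark.ml.feature.VectorIndexer
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

// Illustrative only: toy data, assuming a local SparkSession.
object VectorIndexerSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]").appName("vi-sketch").getOrCreate()
    val data = spark.createDataFrame(Seq(
      (0, Vectors.dense(1.0, 0.0)),
      (1, Vectors.dense(2.0, 1.0)),
      (2, Vectors.dense(1.0, 1.0))
    )).toDF("id", "features")

    val indexer = new VectorIndexer()
      .setInputCol("features")   // step 1: Vector input column...
      .setMaxCategories(2)       // ...plus the maxCategories parameter
      .setOutputCol("indexed")

    // Steps 2-4: fit() decides which features are categorical;
    // transform() maps original values to 0-based category indices.
    val model = indexer.fit(data)
    model.transform(data).show()
    spark.stop()
  }
}
```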

http://git-wip-us.apache.org/repos/asf/spark/blob/536a2159/docs/ml-pipeline.md
----------------------------------------------------------------------
diff --git a/docs/ml-pipeline.md b/docs/ml-pipeline.md
index b4d6be9..0384513 100644
--- a/docs/ml-pipeline.md
+++ b/docs/ml-pipeline.md
@@ -38,26 +38,26 @@ algorithms into a single pipeline, or workflow.
 This section covers the key concepts introduced by the Pipelines API, where 
the pipeline concept is
 mostly inspired by the [scikit-learn](http://scikit-learn.org/) project.
 
-* **[`DataFrame`](ml-guide.html#dataframe)**: This ML API uses `DataFrame` 
from Spark SQL as an ML
+* **[`DataFrame`](ml-pipeline.html#dataframe)**: This ML API uses `DataFrame` 
from Spark SQL as an ML
   dataset, which can hold a variety of data types.
   E.g., a `DataFrame` could have different columns storing text, feature 
vectors, true labels, and predictions.
 
-* **[`Transformer`](ml-guide.html#transformers)**: A `Transformer` is an 
algorithm which can transform one `DataFrame` into another `DataFrame`.
+* **[`Transformer`](ml-pipeline.html#transformers)**: A `Transformer` is an 
algorithm which can transform one `DataFrame` into another `DataFrame`.
 E.g., an ML model is a `Transformer` which transforms a `DataFrame` with 
features into a `DataFrame` with predictions.
 
-* **[`Estimator`](ml-guide.html#estimators)**: An `Estimator` is an algorithm 
which can be fit on a `DataFrame` to produce a `Transformer`.
+* **[`Estimator`](ml-pipeline.html#estimators)**: An `Estimator` is an 
algorithm which can be fit on a `DataFrame` to produce a `Transformer`.
 E.g., a learning algorithm is an `Estimator` which trains on a `DataFrame` and 
produces a model.
 
-* **[`Pipeline`](ml-guide.html#pipeline)**: A `Pipeline` chains multiple 
`Transformer`s and `Estimator`s together to specify an ML workflow.
+* **[`Pipeline`](ml-pipeline.html#pipeline)**: A `Pipeline` chains multiple 
`Transformer`s and `Estimator`s together to specify an ML workflow.
 
-* **[`Parameter`](ml-guide.html#parameters)**: All `Transformer`s and 
`Estimator`s now share a common API for specifying parameters.
+* **[`Parameter`](ml-pipeline.html#parameters)**: All `Transformer`s and 
`Estimator`s now share a common API for specifying parameters.
 
 ## DataFrame
 
 Machine learning can be applied to a wide variety of data types, such as 
vectors, text, images, and structured data.
 This API adopts the `DataFrame` from Spark SQL in order to support a variety 
of data types.
 
-`DataFrame` supports many basic and structured types; see the [Spark SQL 
datatype reference](sql-programming-guide.html#spark-sql-datatype-reference) 
for a list of supported types.
+`DataFrame` supports many basic and structured types; see the [Spark SQL 
datatype reference](sql-programming-guide.html#data-types) for a list of 
supported types.
 In addition to the types listed in the Spark SQL guide, `DataFrame` can use ML 
[`Vector`](mllib-data-types.html#local-vector) types.
 
 A `DataFrame` can be created either implicitly or explicitly from a regular 
`RDD`.  See the code examples below and the [Spark SQL programming 
guide](sql-programming-guide.html) for examples.
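
Since the corrected anchors all land in `ml-pipeline.md` itself, here is a 
minimal sketch tying the five concepts together, closely following the 
pattern of the guide's own examples. The object name and toy rows are made 
up; `Tokenizer` and `HashingTF` act as `Transformer`s, `LogisticRegression` 
is an `Estimator`, and the setter calls supply `Parameter`s:

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.sql.SparkSession

// Illustrative only: a two-row DataFrame and a three-stage Pipeline.
object PipelineSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]").appName("pipeline-sketch").getOrCreate()
    val training = spark.createDataFrame(Seq(
      (0L, "a b c d e spark", 1.0),
      (1L, "b d", 0.0)
    )).toDF("id", "text", "label")

    val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
    val hashingTF = new HashingTF().setInputCol("words").setOutputCol("features")
    val lr = new LogisticRegression().setMaxIter(10) // a Parameter
    val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, lr))

    // Estimator.fit() on a DataFrame produces a Transformer (the model).
    val model = pipeline.fit(training)
    model.transform(training).select("id", "prediction").show()
    spark.stop()
  }
}
```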

http://git-wip-us.apache.org/repos/asf/spark/blob/536a2159/docs/mllib-linear-methods.md
----------------------------------------------------------------------
diff --git a/docs/mllib-linear-methods.md b/docs/mllib-linear-methods.md
index 816bdf1..3085539 100644
--- a/docs/mllib-linear-methods.md
+++ b/docs/mllib-linear-methods.md
@@ -139,7 +139,7 @@ and logistic regression.
 Linear SVMs supports only binary classification, while logistic regression 
supports both binary and
 multiclass classification problems.
 For both methods, `spark.mllib` supports L1 and L2 regularized variants.
-The training data set is represented by an RDD of 
[LabeledPoint](mllib-data-types.html) in MLlib,
+The training data set is represented by an RDD of 
[LabeledPoint](mllib-data-types.html#labeled-point) in MLlib,
 where labels are class indices starting from zero: $0, 1, 2, \ldots$.
 
 ### Linear Support Vector Machines (SVMs)
@@ -491,5 +491,3 @@ Algorithms are all implemented in Scala:
 * 
[RidgeRegressionWithSGD](api/scala/index.html#org.apache.spark.mllib.regression.RidgeRegressionWithSGD)
 * 
[LassoWithSGD](api/scala/index.html#org.apache.spark.mllib.regression.LassoWithSGD)
 
-Python calls the Scala implementation via
-[PythonMLLibAPI](api/scala/index.html#org.apache.spark.mllib.api.python.PythonMLLibAPI).
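
The corrected link above points at `LabeledPoint`, where labels are class 
indices starting from zero. An illustrative sketch (not part of the patch) 
of an `RDD[LabeledPoint]` fed to one of the listed algorithms, `SVMWithSGD`; 
the object name and toy data are made up:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.mllib.classification.SVMWithSGD
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

// Illustrative only: two labeled points, labels 0.0 and 1.0.
object LinearSVMSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[*]", "svm-sketch")
    val data = sc.parallelize(Seq(
      LabeledPoint(0.0, Vectors.dense(0.0, 1.0)),
      LabeledPoint(1.0, Vectors.dense(2.0, 0.0))
    ))
    val model = SVMWithSGD.train(data, numIterations = 10)
    println(model.predict(Vectors.dense(1.5, 0.0)))
    sc.stop()
  }
}
```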

http://git-wip-us.apache.org/repos/asf/spark/blob/536a2159/mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala
----------------------------------------------------------------------
diff --git a/mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala 
b/mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala
index 333a8c3..eb117c4 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala
@@ -40,7 +40,7 @@ private[ml] trait LSHParams extends HasInputCol with 
HasOutputCol {
    * @group param
    */
   final val outputDim: IntParam = new IntParam(this, "outputDim", "output 
dimension, where" +
-    "increasing dimensionality lowers the false negative rate, and decreasing 
dimensionality" +
+    " increasing dimensionality lowers the false negative rate, and decreasing 
dimensionality" +
     " improves the running performance", ParamValidators.gt(0))
 
   /** @group getParam */
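
The fix above adds a single leading space, because Scala's `+` on strings 
inserts nothing between operands. A standalone sketch (with a made-up name, 
`ConcatSketch`) showing the fused words before the fix:

```scala
// Illustrative only: reproduces the doc-string concatenation bug.
object ConcatSketch extends App {
  val before = "output dimension, where" +
    "increasing dimensionality lowers the false negative rate"
  val after = "output dimension, where" +
    " increasing dimensionality lowers the false negative rate"
  println(before) // "...whereincreasing..." (words run together)
  println(after)  // "...where increasing..." (leading space fixes it)
}
```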

http://git-wip-us.apache.org/repos/asf/spark/blob/536a2159/mllib/src/main/scala/org/apache/spark/ml/tree/impl/GradientBoostedTrees.scala
----------------------------------------------------------------------
diff --git 
a/mllib/src/main/scala/org/apache/spark/ml/tree/impl/GradientBoostedTrees.scala 
b/mllib/src/main/scala/org/apache/spark/ml/tree/impl/GradientBoostedTrees.scala
index 7bef899..ede0a06 100644
--- 
a/mllib/src/main/scala/org/apache/spark/ml/tree/impl/GradientBoostedTrees.scala
+++ 
b/mllib/src/main/scala/org/apache/spark/ml/tree/impl/GradientBoostedTrees.scala
@@ -34,7 +34,7 @@ private[spark] object GradientBoostedTrees extends Logging {
 
   /**
    * Method to train a gradient boosting model
-   * @param input Training dataset: RDD of 
[[org.apache.spark.mllib.regression.LabeledPoint]].
+   * @param input Training dataset: RDD of [[LabeledPoint]].
    * @param seed Random seed.
    * @return tuple of ensemble models and weights:
    *         (array of decision tree models, array of model weights)
@@ -59,7 +59,7 @@ private[spark] object GradientBoostedTrees extends Logging {
 
   /**
    * Method to validate a gradient boosting model
-   * @param input Training dataset: RDD of 
[[org.apache.spark.mllib.regression.LabeledPoint]].
+   * @param input Training dataset: RDD of [[LabeledPoint]].
    * @param validationInput Validation dataset.
    *                        This dataset should be different from the training 
dataset,
    *                        but it should follow the same distribution.
@@ -162,7 +162,7 @@ private[spark] object GradientBoostedTrees extends Logging {
    * Method to calculate error of the base learner for the gradient boosting 
calculation.
    * Note: This method is not used by the gradient boosting algorithm but is 
useful for debugging
    * purposes.
-   * @param data Training dataset: RDD of 
[[org.apache.spark.mllib.regression.LabeledPoint]].
+   * @param data Training dataset: RDD of [[LabeledPoint]].
    * @param trees Boosted Decision Tree models
    * @param treeWeights Learning rates at each boosting iteration.
    * @param loss evaluation metric.
@@ -184,7 +184,7 @@ private[spark] object GradientBoostedTrees extends Logging {
   /**
    * Method to compute error or loss for every iteration of gradient boosting.
    *
-   * @param data RDD of [[org.apache.spark.mllib.regression.LabeledPoint]]
+   * @param data RDD of [[LabeledPoint]]
    * @param trees Boosted Decision Tree models
    * @param treeWeights Learning rates at each boosting iteration.
    * @param loss evaluation metric.
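
The hunks above only shorten scaladoc links to `[[LabeledPoint]]` inside an 
internal helper. For orientation, a hedged sketch of the public `spark.mllib` 
entry point that sits in front of it; the object name and toy data are made 
up:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.tree.GradientBoostedTrees
import org.apache.spark.mllib.tree.configuration.BoostingStrategy

// Illustrative only: train a tiny GBT classifier on two points.
object GBTSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[*]", "gbt-sketch")
    val input = sc.parallelize(Seq(
      LabeledPoint(0.0, Vectors.dense(0.0, 1.0)),
      LabeledPoint(1.0, Vectors.dense(1.0, 0.0))
    ))
    val boostingStrategy = BoostingStrategy.defaultParams("Classification")
    boostingStrategy.numIterations = 3
    val model = GradientBoostedTrees.train(input, boostingStrategy)
    println(model.predict(Vectors.dense(1.0, 0.0)))
    sc.stop()
  }
}
```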

http://git-wip-us.apache.org/repos/asf/spark/blob/536a2159/mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala
----------------------------------------------------------------------
diff --git 
a/mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala 
b/mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala
index b504f41..8ae5ca3 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala
@@ -82,7 +82,7 @@ private[spark] object RandomForest extends Logging {
   /**
    * Train a random forest.
    *
-   * @param input Training data: RDD of 
[[org.apache.spark.mllib.regression.LabeledPoint]]
+   * @param input Training data: RDD of [[LabeledPoint]]
    * @return an unweighted set of trees
    */
   def run(
@@ -343,7 +343,7 @@ private[spark] object RandomForest extends Logging {
   /**
    * Given a group of nodes, this finds the best split for each node.
    *
-   * @param input Training data: RDD of 
[[org.apache.spark.ml.tree.impl.TreePoint]]
+   * @param input Training data: RDD of [[TreePoint]]
    * @param metadata Learning and dataset metadata
    * @param topNodesForGroup For each tree in group, tree index -> root node.
    *                         Used for matching instances with nodes.
@@ -854,10 +854,10 @@ private[spark] object RandomForest extends Logging {
    *       and for multiclass classification with a high-arity feature,
    *       there is one bin per category.
    *
-   * @param input Training data: RDD of 
[[org.apache.spark.mllib.regression.LabeledPoint]]
+   * @param input Training data: RDD of [[LabeledPoint]]
    * @param metadata Learning and dataset metadata
    * @param seed random seed
-   * @return Splits, an Array of [[org.apache.spark.mllib.tree.model.Split]]
+   * @return Splits, an Array of [[Split]]
    *          of size (numFeatures, numSplits)
    */
   protected[tree] def findSplits(
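
As with the previous file, these hunks only shorten scaladoc links inside the 
internal `run()`/`findSplits()` helpers. A hedged sketch of the public 
`spark.mllib` API that wraps them; the object name and toy data are made up:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.tree.RandomForest

// Illustrative only: a tiny random forest classifier.
object RandomForestSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[*]", "rf-sketch")
    val input = sc.parallelize(Seq(
      LabeledPoint(0.0, Vectors.dense(0.0, 1.0)),
      LabeledPoint(1.0, Vectors.dense(1.0, 0.0))
    ))
    val model = RandomForest.trainClassifier(
      input,
      numClasses = 2,
      categoricalFeaturesInfo = Map[Int, Int](),
      numTrees = 3,
      featureSubsetStrategy = "auto",
      impurity = "gini",
      maxDepth = 4,
      maxBins = 32)
    println(model.predict(Vectors.dense(0.0, 1.0)))
    sc.stop()
  }
}
```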


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
