spark git commit: [SPARK-19797][DOC] ML pipeline document correction
Repository: spark
Updated Branches:
  refs/heads/master fa50143cd -> 0bac3e4cd

[SPARK-19797][DOC] ML pipeline document correction

## What changes were proposed in this pull request?

The description of the pipeline in this paragraph is incorrect: https://spark.apache.org/docs/latest/ml-pipeline.html#how-it-works

> If the Pipeline had more **stages**, it would call the
> LogisticRegressionModel's transform() method on the DataFrame before
> passing the DataFrame to the next stage.

Reason: a Transformer can also be a stage, but only another Estimator will trigger a transform() call and pass the data to the next stage. The description in the document misleads ML pipeline users.

## How was this patch tested?

This is a tiny modification of **docs/ml-pipeline.md**. I built the modified docs with Jekyll and checked the compiled document.

Author: Zhe Sun

Closes #17137 from ymwdalex/SPARK-19797-ML-pipeline-document-correction.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0bac3e4c
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0bac3e4c
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/0bac3e4c

Branch: refs/heads/master
Commit: 0bac3e4cde75678beac02e67b8873fe779e9ad34
Parents: fa50143
Author: Zhe Sun
Authored: Fri Mar 3 11:55:57 2017 +0100
Committer: Sean Owen
Committed: Fri Mar 3 11:55:57 2017 +0100

--
 docs/ml-pipeline.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/0bac3e4c/docs/ml-pipeline.md
--
diff --git a/docs/ml-pipeline.md b/docs/ml-pipeline.md
index 7cbb146..aa92c0a 100644
--- a/docs/ml-pipeline.md
+++ b/docs/ml-pipeline.md
@@ -132,7 +132,7 @@ The `Pipeline.fit()` method is called on the original `DataFrame`, which has raw
 The `Tokenizer.transform()` method splits the raw text documents into words, adding a new column with words to the `DataFrame`.
 The `HashingTF.transform()` method converts the words column into feature vectors, adding a new column with those vectors to the `DataFrame`.
 Now, since `LogisticRegression` is an `Estimator`, the `Pipeline` first calls `LogisticRegression.fit()` to produce a `LogisticRegressionModel`.
-If the `Pipeline` had more stages, it would call the `LogisticRegressionModel`'s `transform()`
+If the `Pipeline` had more `Estimator`s, it would call the `LogisticRegressionModel`'s `transform()`
 method on the `DataFrame` before passing the `DataFrame` to the next stage.
 
 A `Pipeline` is an `Estimator`.

To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
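The distinction this patch draws (more stages versus more `Estimator`s) can be illustrated with a small, self-contained sketch. It is plain Python, not Spark code: the `Transformer`, `Estimator`, and `fit_pipeline` names below are hypothetical stand-ins that only record which methods get called, and the control flow is an assumption modelled on the corrected paragraph rather than a copy of Spark's implementation.

```python
class Transformer:
    """Toy stand-in for a pipeline stage that only transforms data."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def transform(self, rows):
        self.log.append(f"{self.name}.transform")
        return rows  # a real stage would add a column here


class Estimator:
    """Toy stand-in for a stage that must be fitted to produce a model."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def fit(self, rows):
        self.log.append(f"{self.name}.fit")
        # The fitted model is itself a Transformer.
        return Transformer(f"{self.name}Model", self.log)


def fit_pipeline(stages, rows):
    """Fit stages in order; data is transformed only while a later
    Estimator still needs it (assumed behaviour, per the corrected doc)."""
    last_estimator = max(
        (i for i, s in enumerate(stages) if isinstance(s, Estimator)),
        default=-1,
    )
    fitted = []
    for i, stage in enumerate(stages):
        if i > last_estimator:
            # Trailing Transformers are kept as-is; nothing is called on them.
            fitted.append(stage)
            continue
        model = stage.fit(rows) if isinstance(stage, Estimator) else stage
        if i < last_estimator:
            # A later Estimator exists, so it needs the transformed data.
            rows = model.transform(rows)
        fitted.append(model)
    return fitted


if __name__ == "__main__":
    log = []
    stages = [
        Transformer("Tokenizer", log),
        Transformer("HashingTF", log),
        Estimator("LogisticRegression", log),
    ]
    fit_pipeline(stages, ["spark is fun"])
    print(log)
```

In this sketch, appending another `Transformer` after `LogisticRegression` leaves the call log unchanged, while appending a second `Estimator` adds a `LogisticRegressionModel.transform` call before that `Estimator` is fitted, which is the behaviour the reworded sentence describes.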
spark git commit: [SPARK-19797][DOC] ML pipeline document correction
Repository: spark
Updated Branches:
  refs/heads/branch-2.1 1237aaea2 -> accbed7c2

[SPARK-19797][DOC] ML pipeline document correction

## What changes were proposed in this pull request?

The description of the pipeline in this paragraph is incorrect: https://spark.apache.org/docs/latest/ml-pipeline.html#how-it-works

> If the Pipeline had more **stages**, it would call the
> LogisticRegressionModel's transform() method on the DataFrame before
> passing the DataFrame to the next stage.

Reason: a Transformer can also be a stage, but only another Estimator will trigger a transform() call and pass the data to the next stage. The description in the document misleads ML pipeline users.

## How was this patch tested?

This is a tiny modification of **docs/ml-pipeline.md**. I built the modified docs with Jekyll and checked the compiled document.

Author: Zhe Sun

Closes #17137 from ymwdalex/SPARK-19797-ML-pipeline-document-correction.

(cherry picked from commit 0bac3e4cde75678beac02e67b8873fe779e9ad34)
Signed-off-by: Sean Owen

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/accbed7c
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/accbed7c
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/accbed7c

Branch: refs/heads/branch-2.1
Commit: accbed7c2cfbe46fa6f55e97241b617c6ad4431f
Parents: 1237aae
Author: Zhe Sun
Authored: Fri Mar 3 11:55:57 2017 +0100
Committer: Sean Owen
Committed: Fri Mar 3 11:56:07 2017 +0100

--
 docs/ml-pipeline.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/accbed7c/docs/ml-pipeline.md
--
diff --git a/docs/ml-pipeline.md b/docs/ml-pipeline.md
index 7cbb146..aa92c0a 100644
--- a/docs/ml-pipeline.md
+++ b/docs/ml-pipeline.md
@@ -132,7 +132,7 @@ The `Pipeline.fit()` method is called on the original `DataFrame`, which has raw
 The `Tokenizer.transform()` method splits the raw text documents into words, adding a new column with words to the `DataFrame`.
 The `HashingTF.transform()` method converts the words column into feature vectors, adding a new column with those vectors to the `DataFrame`.
 Now, since `LogisticRegression` is an `Estimator`, the `Pipeline` first calls `LogisticRegression.fit()` to produce a `LogisticRegressionModel`.
-If the `Pipeline` had more stages, it would call the `LogisticRegressionModel`'s `transform()`
+If the `Pipeline` had more `Estimator`s, it would call the `LogisticRegressionModel`'s `transform()`
 method on the `DataFrame` before passing the `DataFrame` to the next stage.
 
 A `Pipeline` is an `Estimator`.