[ https://issues.apache.org/jira/browse/SPARK-19797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15892170#comment-15892170 ]
Zhe Sun commented on SPARK-19797:
---------------------------------

A pull request was created: https://github.com/apache/spark/pull/17137

> ML pipelines document error
> ---------------------------
>
>                 Key: SPARK-19797
>                 URL: https://issues.apache.org/jira/browse/SPARK-19797
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 2.1.0
>            Reporter: Zhe Sun
>            Priority: Trivial
>              Labels: documentation
>   Original Estimate: 5m
>  Remaining Estimate: 5m
>
> The description of the pipeline in this paragraph is incorrect and misleads the user:
> https://spark.apache.org/docs/latest/ml-pipeline.html#how-it-works
> bq. If the Pipeline had more *stages*, it would call the LogisticRegressionModel’s transform() method on the DataFrame before passing the DataFrame to the next stage.
> The description is not accurate, because a *Transformer* can also be a stage, but only a later Estimator triggers the extra transform() call.
> So the description should be corrected to: *If the Pipeline had more _Estimators_*.
> The code that shows this is here:
> https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/Pipeline.scala#L160
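
The behaviour described in the issue can be illustrated with a small, Spark-free sketch. This is a toy model of the fitting loop as the report describes it, not Spark's actual code: the `Stage` class, the `fit_pipeline` function, and the stage names `Binarizer` and `NextEstimator` are hypothetical and chosen only for illustration. It assumes that a stage's transform() is called during fitting only when a later Estimator still needs the transformed data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Stage:
    """Toy stand-in for a pipeline stage (hypothetical, not a Spark class)."""
    name: str
    is_estimator: bool = False


def fit_pipeline(stages: List[Stage]) -> List[str]:
    """Return the fit/transform calls made while fitting the toy pipeline."""
    # Index of the last Estimator; -1 when the pipeline has none.
    last_estimator = max(
        (i for i, s in enumerate(stages) if s.is_estimator), default=-1
    )
    calls = []
    for i, stage in enumerate(stages):
        if i > last_estimator:
            # Trailing Transformers are kept as-is; nothing is called on them.
            break
        name = stage.name
        if stage.is_estimator:
            # Fitting an Estimator yields a model, which is itself a Transformer.
            calls.append(f"{name}.fit")
            name = f"{name}Model"
        if i < last_estimator:
            # transform() runs only when a later Estimator needs the output.
            calls.append(f"{name}.transform")
    return calls


base = [
    Stage("Tokenizer"),
    Stage("HashingTF"),
    Stage("LogisticRegression", is_estimator=True),
]
# Appending a Transformer adds a stage but no extra transform() call.
with_transformer = base + [Stage("Binarizer")]
# Appending an Estimator makes the LogisticRegressionModel transform the data.
with_estimator = base + [Stage("NextEstimator", is_estimator=True)]

for pipeline in (base, with_transformer, with_estimator):
    print(fit_pipeline(pipeline))
```

In this model, adding a Transformer after the LogisticRegression stage leaves the call list unchanged, while adding an Estimator inserts a `LogisticRegressionModel.transform` call, which matches the correction proposed in the issue.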