[ 
https://issues.apache.org/jira/browse/SPARK-19797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15892149#comment-15892149
 ] 

Sean Owen commented on SPARK-19797:
-----------------------------------

I don't think that's true. The resulting PipelineModel would contain a 
LogisticRegressionModel, and when the model is invoked, its transform() method 
would be called and the result passed to subsequent transformers, if any. What 
you are pointing out is that the trailing transformations don't need to be 
computed when _fitting_ the pipeline. That's what the if statement here is 
optimizing away.
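
To make the distinction concrete, here is a minimal toy sketch of the fit-time 
logic being discussed. It is not Spark code: the Transformer and Estimator 
classes, fit_pipeline, transform_pipeline and the stage names are all 
hypothetical stand-ins, and the loop only approximates what Pipeline.fit() 
appears to do (skip transform() calls that no later Estimator needs).

```python
# Toy model only: these classes are hypothetical stand-ins, not Spark's API.

class Transformer:
    def __init__(self, name, log):
        self.name = name
        self.log = log

    def transform(self, data):
        self.log.append(f"{self.name}.transform")
        return data + [self.name]


class Estimator:
    def __init__(self, name, log):
        self.name = name
        self.log = log

    def fit(self, data):
        self.log.append(f"{self.name}.fit")
        # Fitting yields a model, which is itself a Transformer.
        return Transformer(self.name + "Model", self.log)


def fit_pipeline(stages, data):
    """Fit every Estimator; transform only while a later Estimator needs the data."""
    last_estimator = max(
        (i for i, s in enumerate(stages) if isinstance(s, Estimator)), default=-1
    )
    fitted, current = [], data
    for i, stage in enumerate(stages):
        if i <= last_estimator:
            model = stage.fit(current) if isinstance(stage, Estimator) else stage
            if i < last_estimator:  # the "if" that skips unneeded work
                current = model.transform(current)
            fitted.append(model)
        else:
            fitted.append(stage)  # trailing Transformers are kept, just not run
    return fitted


def transform_pipeline(fitted, data):
    """The fitted pipeline always runs every stage's transform()."""
    for stage in fitted:
        data = stage.transform(data)
    return data


# Case 1: a Transformer follows the last Estimator.
log = []
stages = [Transformer("tokenizer", log), Transformer("hashingTF", log),
          Estimator("lr", log), Transformer("postprocess", log)]
fitted = fit_pipeline(stages, [])
fit_calls = list(log)
log.clear()
transform_pipeline(fitted, [])
transform_calls = list(log)

# Case 2: another Estimator follows, so lrModel.transform runs during fit.
log2 = []
fit_pipeline([Estimator("lr", log2), Transformer("postprocess", log2),
              Estimator("scaler", log2)], [])

print(fit_calls)
print(transform_calls)
print(log2)
```

In this sketch, case 1 never calls lrModel.transform or postprocess.transform 
while fitting, yet both run when the fitted pipeline transforms data. Case 2 
shows that a later Estimator forces the intermediate transform() calls at fit 
time.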

> ML pipelines document error
> ---------------------------
>
>                 Key: SPARK-19797
>                 URL: https://issues.apache.org/jira/browse/SPARK-19797
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 2.1.0
>            Reporter: Zhe Sun
>            Priority: Trivial
>              Labels: documentation
>   Original Estimate: 5m
>  Remaining Estimate: 5m
>
> The description of Pipeline in this paragraph is incorrect and misleads users: 
> https://spark.apache.org/docs/latest/ml-pipeline.html#how-it-works
> bq. If the Pipeline had more *stages*, it would call the 
> LogisticRegressionModel’s transform() method on the DataFrame before passing 
> the DataFrame to the next stage.
> The description is not accurate, because a *Transformer* can also be a stage, 
> but only a subsequent Estimator triggers the extra transform() call.
> So the description should be corrected to: *If the Pipeline had more 
> _Estimators_*. 
> The code that shows this is here: 
> https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/Pipeline.scala#L160



