[ 
https://issues.apache.org/jira/browse/SPARK-5114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14994296#comment-14994296
 ] 

Joseph K. Bradley commented on SPARK-5114:
------------------------------------------

[~srowen]  I agree we're too ambitious with setting targets.

[~mengxr] I think we've been delaying several major design decisions for 
Pipelines, such as this JIRA.  It would be nice to settle some of these sooner 
rather than later, even if it means postponing feature JIRAs.

> Should Evaluator be a PipelineStage
> -----------------------------------
>
>                 Key: SPARK-5114
>                 URL: https://issues.apache.org/jira/browse/SPARK-5114
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML
>    Affects Versions: 1.2.0
>            Reporter: Joseph K. Bradley
>
> Pipelines can currently contain Estimators and Transformers.
> Question for debate: Should Pipelines be able to contain Evaluators?
> Pros:
> * Schema check: Evaluators take input datasets with a particular schema, 
> which should perhaps be checked before running a Pipeline.
> * Intermediate results:
> ** If a Transformer removes a column (which is not done by built-in 
> Transformers currently but might be reasonable in the future), then the user 
> can never evaluate that column.  (However, users could keep all columns 
> around.)
> ** If users have to evaluate after running a Pipeline, then each evaluated 
> column may have to be re-materialized.
> Cons:
> * API: Evaluators do not transform datasets.   They produce a scalar (or a 
> few values), which makes it hard to say how they fit into a Pipeline or a 
> PipelineModel.
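
The API mismatch described in the Cons can be illustrated with a toy model.
This is only a sketch under stated assumptions: the classes below are
hypothetical stand-ins written in plain Python, not Spark's actual
Transformer, Evaluator, or Pipeline classes, and a dataset is modeled as a
list of dicts. It shows that a stage maps a dataset to a dataset, while an
evaluator maps a dataset to a single number, so it cannot be chained like
the other stages.

```python
from typing import Callable, Dict, List

# Assumption: a "dataset" is just a list of rows, each row a dict of columns.
Row = Dict[str, float]
Dataset = List[Row]


class Transformer:
    """Toy stage (hypothetical, not Spark's class): dataset in, dataset out."""

    def __init__(self, fn: Callable[[Row], Row]):
        self.fn = fn

    def transform(self, data: Dataset) -> Dataset:
        # Copy each row so the input dataset is left untouched.
        return [self.fn(dict(row)) for row in data]


class Evaluator:
    """Toy evaluator (hypothetical): dataset in, one scalar out."""

    def __init__(self, label_col: str, prediction_col: str):
        self.label_col = label_col
        self.prediction_col = prediction_col

    def evaluate(self, data: Dataset) -> float:
        # Mean squared error over the two named columns.
        errors = [(r[self.prediction_col] - r[self.label_col]) ** 2 for r in data]
        return sum(errors) / len(errors)


class Pipeline:
    """Toy pipeline: feeds each stage's output dataset into the next stage."""

    def __init__(self, stages: List[Transformer]):
        self.stages = stages

    def transform(self, data: Dataset) -> Dataset:
        for stage in self.stages:
            data = stage.transform(data)
        return data


def add_prediction(row: Row) -> Row:
    row["prediction"] = 2.0 * row["x"]
    return row


def drop_label(row: Row) -> Row:
    # Models the "Transformer removes a column" case from the Pros.
    del row["label"]
    return row


data = [{"x": 1.0, "label": 2.0}, {"x": 2.0, "label": 5.0}]
evaluator = Evaluator("label", "prediction")

# Evaluating after the pipeline works while the needed columns survive.
scored = Pipeline([Transformer(add_prediction)]).transform(data)
mse = evaluator.evaluate(scored)

# Once a later stage drops a column, that column can no longer be evaluated.
dropped = Pipeline([Transformer(add_prediction), Transformer(drop_label)]).transform(data)
try:
    evaluator.evaluate(dropped)
    evaluation_failed = False
except KeyError:
    evaluation_failed = True

# The evaluator has no transform(), so it cannot sit in the stage list as-is.
evaluator_is_stage = hasattr(evaluator, "transform")
```

In this toy model, making the evaluator a stage would require deciding what
its transform() returns (for example, the unchanged input dataset) and where
the scalar metric is stored, which is the open design question raised above.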



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)