[ 
https://issues.apache.org/jira/browse/SPARK-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481342#comment-14481342
 ] 

Peter Rudenko edited comment on SPARK-3702 at 4/6/15 4:06 PM:
--------------------------------------------------------------

For tree-based algorithms, I'm curious whether there would be a performance 
benefit (assuming a reimplementation of the decision tree) from passing 
DataFrame columns directly rather than a single column of vector type. E.g.:

{code}
// Proposed API shape (pseudocode): the learner reads feature columns by name.
class GBT extends Estimator with HasInputCols

val model = new GBT().setInputCols("col1", "col2", "col3", ...)
{code}

The dataset would then be split using the DataFrame API.
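
To make the idea concrete, here is a minimal sketch in plain Scala with no Spark dependency. Everything in it is a hypothetical stand-in: `ColumnarData`, `Split`, `StumpLearner`, and this local `HasInputCols` trait are illustrative names, not Spark classes, and the split criterion (sum of squared errors on a one-level tree) is just one simple choice. It only shows why per-column access can be convenient for tree learners: evaluating a candidate split reads a single feature array instead of unpacking a feature vector for every row. Whether this gives a real performance benefit in Spark is exactly the open question above.

```scala
// Hypothetical sketch: these types are illustrative stand-ins, not Spark classes.

// Column-oriented dataset: each feature is stored as its own named array.
final case class ColumnarData(columns: Map[String, Array[Double]], labels: Array[Double]) {
  def numRows: Int = labels.length
}

// Stand-in for the proposed HasInputCols idea: the learner is told which columns to read.
trait HasInputCols {
  private var inputCols: Seq[String] = Seq.empty
  def setInputCols(names: String*): this.type = { inputCols = names; this }
  def getInputCols: Seq[String] = inputCols
}

// A candidate split: rows with value <= threshold go left, the rest go right.
final case class Split(column: String, threshold: Double, error: Double)

// A one-level tree (decision stump) learner over the selected columns.
class StumpLearner extends HasInputCols {
  // Sum of squared deviations of the labels from their mean, over the given rows.
  private def sse(labels: Array[Double], rows: Seq[Int]): Double =
    if (rows.isEmpty) 0.0
    else {
      val mean = rows.map(i => labels(i)).sum / rows.size
      rows.map(i => math.pow(labels(i) - mean, 2)).sum
    }

  // Tries every distinct value of every input column as a threshold and keeps
  // the split with the lowest total error. Assumes at least one input column
  // is set and the dataset is non-empty.
  def bestSplit(data: ColumnarData): Split = {
    val rows = 0 until data.numRows
    val candidates = for {
      name <- getInputCols
      values = data.columns(name) // only this one column is touched
      threshold <- values.distinct.toSeq
    } yield {
      val (left, right) = rows.partition(i => values(i) <= threshold)
      Split(name, threshold, sse(data.labels, left) + sse(data.labels, right))
    }
    candidates.minBy(_.error)
  }
}

object ColumnSplitDemo {
  def main(args: Array[String]): Unit = {
    val data = ColumnarData(
      Map(
        "col1" -> Array(1.0, 2.0, 3.0, 4.0),
        "col2" -> Array(5.0, 5.0, 5.0, 5.0)),
      labels = Array(0.0, 0.0, 1.0, 1.0))
    val split = new StumpLearner().setInputCols("col1", "col2").bestSplit(data)
    println(split) // picks col1 at threshold 2.0, which separates the labels exactly
  }
}
```

In this toy layout a split evaluation only indexes into one array per column, which is the access pattern the comment is asking about.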









> Standardize MLlib classes for learners, models
> ----------------------------------------------
>
>                 Key: SPARK-3702
>                 URL: https://issues.apache.org/jira/browse/SPARK-3702
>             Project: Spark
>          Issue Type: Sub-task
>          Components: MLlib
>            Reporter: Joseph K. Bradley
>            Assignee: Joseph K. Bradley
>            Priority: Blocker
>
> Summary: Create a class hierarchy for learning algorithms and the models 
> those algorithms produce.
> This is a super-task of several sub-tasks (but JIRA does not allow subtasks 
> of subtasks).  See the "requires" links below for subtasks.
> Goals:
> * give intuitive structure to API, both for developers and for generated 
> documentation
> * support meta-algorithms (e.g., boosting)
> * support generic functionality (e.g., evaluation)
> * reduce code duplication across classes
> [Design doc for class hierarchy | 
> https://docs.google.com/document/d/1BH9el33kBX8JiDdgUJXdLW14CA2qhTCWIG46eXZVoJs]


