[jira] [Updated] (SPARK-17094) provide simplified API for ML pipeline

yuhao yang (JIRA) Wed, 07 Sep 2016 11:09:47 -0700

[https://issues.apache.org/jira/browse/SPARK-17094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]


yuhao yang updated SPARK-17094:
-------------------------------
    Description: 
Many machine learning pipeline libraries provide an API for easily assembling transformers.

One example of the proposed API would be:
{code}
val model = new Pipeline("tokenizer", "countvectorizer", "lda").fit(data)
{code}
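In Spark ML today, each stage is built explicitly and passed to `new Pipeline().setStages(...)`, and the stages are linked with `setInputCol`/`setOutputCol` calls. The name-based call above could hide that wiring. The following is a minimal plain-Scala sketch of one possible approach, with no Spark dependency. Stage names are looked up in a registry, and each stage reads the previous stage's output column.

Several pieces are hypothetical and not Spark ML API:
* the `StageSpec` type,
* the registry contents,
* the default `text` input column,
* the `<name>_output` naming scheme.

A real implementation would instantiate actual `PipelineStage`s such as `Tokenizer`, `CountVectorizer` and `LDA`. Note that `LDA`, for example, reads a `featuresCol` rather than an `inputCol`.

```scala
// Hypothetical sketch (no Spark dependency): resolve stage names and chain columns.
// StageSpec, the registry and the "<name>_output" scheme are illustrative only.
final case class StageSpec(kind: String, inputCol: String, outputCol: String)

object SimplePipeline {
  // Lower-cased stage names mapped to the Spark ML class each would stand for.
  private val registry: Map[String, String] = Map(
    "tokenizer"       -> "Tokenizer",
    "countvectorizer" -> "CountVectorizer",
    "hashingtf"       -> "HashingTF",
    "lda"             -> "LDA"
  )

  /** Returns one spec per name; each stage reads the previous stage's output column. */
  def resolve(names: Seq[String], inputCol: String = "text"): Seq[StageSpec] = {
    val (resolved, _) = names.foldLeft((Vector.empty[StageSpec], inputCol)) {
      case ((acc, inCol), name) =>
        val key = name.toLowerCase
        // Unknown names fail fast instead of silently producing a broken pipeline.
        val kind = registry.getOrElse(
          key, throw new IllegalArgumentException(s"Unknown stage: $name"))
        val outCol = s"${key}_output"
        (acc :+ StageSpec(kind, inCol, outCol), outCol)
    }
    resolved
  }
}

// Usage: the three names from the example above.
val specs = SimplePipeline.resolve(Seq("tokenizer", "countvectorizer", "lda"))
specs.foreach(println)
```

Keeping the column-naming rule in one place is what would let users skip `setInputCol`/`setOutputCol`. Users who need custom column names could presumably still fall back to the existing explicit API.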

Overall, the feature would:
1. Allow people (especially beginners) to create an ML application in one simple line of code.
2. Be handy for users, since they would not have to set the input and output columns.
3. Looking further ahead, we might no longer need code at all to build a Spark ML application, since it could be done through configuration:
{code}
"ml.pipeline": "tokenizer", "hashingTF", "lda"
"ml.tokenizer.toLowercase": "false"
...
{code}
This could be quite efficient for tuning on a cluster.
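The configuration-driven idea in item 3 can be prototyped in plain Scala. This sketch assumes the settings arrive as a flat key/value map:
* `ml.pipeline` holds a comma-separated list of stage names.
* `ml.<stage>.<param>` keys hold per-stage overrides.

That layout and the `PipelineConf`/`parseConfig` names are assumptions for illustration only. The issue does not define a file format, and mapping the overrides onto real Spark `Param`s is left out.

```scala
// Hypothetical sketch: turn flat "ml.*" settings into an ordered stage list
// plus per-stage parameter overrides. The key layout is an assumption.
final case class PipelineConf(stages: Seq[String], params: Map[String, Map[String, String]])

def parseConfig(conf: Map[String, String]): PipelineConf = {
  // "ml.pipeline" -> "tokenizer, hashingTF, lda" becomes an ordered list of names.
  val stages = conf.getOrElse("ml.pipeline", "")
    .split(",").map(_.trim).filter(_.nonEmpty).toSeq

  // "ml.<stage>.<param>" -> value, grouped by stage; "ml.pipeline" has only
  // two dot-separated parts and is therefore skipped here.
  val params = conf.toSeq
    .flatMap { case (key, value) =>
      key.split("\\.", 3) match {
        case Array("ml", stage, param) => Some((stage, param, value))
        case _                         => None
      }
    }
    .groupBy(_._1)
    .map { case (stage, entries) => stage -> entries.map(e => e._2 -> e._3).toMap }

  PipelineConf(stages, params)
}

// Usage: the settings shown in the issue, with the stage list as one string.
val parsed = parseConfig(Map(
  "ml.pipeline"              -> "tokenizer, hashingTF, lda",
  "ml.tokenizer.toLowercase" -> "false"
))
println(parsed.stages)
println(parsed.params)
```

Because tuning would only touch the map of settings, a job could in principle be re-run with different values without recompiling. That is the efficiency the configuration approach aims for.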

Feedback and suggestions are appreciated.

  was:
Many machine learning pipeline libraries provide an API for easily assembling transformers.

One example of the proposed API would be:
{code}
val model = new Pipeline("tokenizer", "countvectorizer", "lda").fit(data)
{code}

Feedback and suggestions are appreciated.


> provide simplified API for ML pipeline
> --------------------------------------
>
>                 Key: SPARK-17094
>                 URL: https://issues.apache.org/jira/browse/SPARK-17094
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>            Reporter: yuhao yang



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
