[ 
https://issues.apache.org/jira/browse/SPARK-5874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367309#comment-14367309
 ] 

Xiangrui Meng commented on SPARK-5874:
--------------------------------------

[~Elie A.] Thanks for your feedback! This JIRA is for discussing the pipeline 
API rather than specific components. For text preprocessing, we will certainly 
add a standard stemmer and a stop-words filter as transformers. There is also 
a RegexTokenizer under review: https://github.com/apache/spark/pull/4504



> How to improve the current ML pipeline API?
> -------------------------------------------
>
>                 Key: SPARK-5874
>                 URL: https://issues.apache.org/jira/browse/SPARK-5874
>             Project: Spark
>          Issue Type: Brainstorming
>          Components: ML
>            Reporter: Xiangrui Meng
>            Assignee: Xiangrui Meng
>            Priority: Critical
>
> I created this JIRA to collect feedback about the ML pipeline API we 
> introduced in Spark 1.2. The goal is to graduate this set of APIs in 1.4 
> with confidence, which requires valuable input from the community. I'll 
> create sub-tasks for each major issue.



