[jira] [Created] (SPARK-18213) Syntactic sugar over Pipeline API

Wojciech Szymanski (JIRA) Tue, 01 Nov 2016 16:20:52 -0700

Wojciech Szymanski created SPARK-18213:
------------------------------------------


             Summary: Syntactic sugar over Pipeline API
                 Key: SPARK-18213
                 URL: https://issues.apache.org/jira/browse/SPARK-18213
             Project: Spark
          Issue Type: Improvement
          Components: ML
    Affects Versions: 2.0.1
            Reporter: Wojciech Szymanski
            Priority: Minor


Currently, an ML Pipeline is created through the rather verbose setStages 
method, as shown below:
{code}
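    import org.apache.spark.ml.Pipeline
    import org.apache.spark.ml.feature.{CountVectorizer, RegexTokenizer, StopWordsRemover}

    val tokenizer = new RegexTokenizer()
    val stopWordsRemover = new StopWordsRemover()
    val countVectorizer = new CountVectorizer()

    // Stages run in the order they appear in the array.
    val pipeline = new Pipeline().setStages(Array(tokenizer, stopWordsRemover, countVectorizer))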
{code}

What about a bit of syntactic sugar over the Pipeline API, so that stages can 
be chained with a + operator?
{code}
    val tokenizer = new RegexTokenizer()
    val stopWordsRemover = new StopWordsRemover()
    val countVectorizer = new CountVectorizer()

    val pipeline = tokenizer + stopWordsRemover + countVectorizer
{code}
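
For discussion, here is a minimal sketch of how such a + operator could be layered on top of the existing API without touching Pipeline.scala. It is not necessarily what the commit linked below does. The names PipelineSyntax, StageOps and PipelineSyntaxExample are hypothetical; only Pipeline, PipelineStage, setStages, getStages, isSet and the three feature transformers are existing Spark ML API.

```scala
import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.feature.{CountVectorizer, RegexTokenizer, StopWordsRemover}

// Hypothetical helper, not part of Spark: adds `+` to any PipelineStage.
object PipelineSyntax {
  implicit class StageOps(left: PipelineStage) {
    def +(right: PipelineStage): Pipeline = left match {
      // A Pipeline on the left that already has stages is extended rather
      // than nested, so a + b + c yields one flat three-stage Pipeline.
      case p: Pipeline if p.isSet(p.stages) =>
        new Pipeline().setStages(p.getStages :+ right)
      // Otherwise start a new two-stage Pipeline.
      case _ =>
        new Pipeline().setStages(Array(left, right))
    }
  }
}

// Usage, mirroring the proposed syntax above.
object PipelineSyntaxExample {
  import PipelineSyntax._

  val tokenizer = new RegexTokenizer()
  val stopWordsRemover = new StopWordsRemover()
  val countVectorizer = new CountVectorizer()

  val pipeline: Pipeline = tokenizer + stopWordsRemover + countVectorizer
}
```

One design choice to be aware of in this sketch: because a Pipeline is itself a PipelineStage, flattening on the left makes it impossible to nest an existing Pipeline as the first stage via +. Callers who need nesting would still have to use setStages directly.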

Production code changes in 
mllib/src/main/scala/org/apache/spark/ml/Pipeline.scala:
https://github.com/apache/spark/commit/181df64bf50081f3af5a84b567b677178c88524f#diff-5226e84dea43423760dc6300ddafb01b

Scala example:
https://github.com/apache/spark/commit/181df64bf50081f3af5a84b567b677178c88524f#diff-798e85dd9107565fabab1126f57e3d6e

Java example:
https://github.com/apache/spark/commit/181df64bf50081f3af5a84b567b677178c88524f#diff-69ac857220f21b5e1684444d80d6dffe

Thanks in advance for your feedback.



