[ https://issues.apache.org/jira/browse/SPARK-18213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15628333#comment-15628333 ]
Sean Owen commented on SPARK-18213:
-----------------------------------

I personally don't think this adds much; it's not appreciably clearer. I am also wary of using operator overloads.

> Syntactic sugar over Pipeline API
> ---------------------------------
>
>                 Key: SPARK-18213
>                 URL: https://issues.apache.org/jira/browse/SPARK-18213
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 2.0.1
>            Reporter: Wojciech Szymanski
>            Priority: Minor
>
> Currently, creating an ML Pipeline relies on the rather verbose setStages method, as
> below:
> {code}
> val tokenizer = new RegexTokenizer()
> val stopWordsRemover = new StopWordsRemover()
> val countVectorizer = new CountVectorizer()
> val pipeline = new Pipeline().setStages(Array(tokenizer, stopWordsRemover, countVectorizer))
> {code}
> What about a bit of syntactic sugar over the Pipeline API?
> {code}
> val tokenizer = new RegexTokenizer()
> val stopWordsRemover = new StopWordsRemover()
> val countVectorizer = new CountVectorizer()
> val pipeline = tokenizer + stopWordsRemover + countVectorizer
> {code}
> Production code changes in
> mllib/src/main/scala/org/apache/spark/ml/Pipeline.scala:
> https://github.com/apache/spark/commit/181df64bf50081f3af5a84b567b677178c88524f#diff-5226e84dea43423760dc6300ddafb01b
> Scala example:
> https://github.com/apache/spark/commit/181df64bf50081f3af5a84b567b677178c88524f#diff-798e85dd9107565fabab1126f57e3d6e
> Java example:
> https://github.com/apache/spark/commit/181df64bf50081f3af5a84b567b677178c88524f#diff-69ac857220f21b5e1684444d80d6dffe
> Thanks in advance for your feedback.
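For illustration only, here is a minimal, self-contained Scala sketch of one way the proposed `+` could be layered over pipeline stages using an implicit class. It is not the code from the linked commit and it does not depend on Spark; `Stage`, `NamedStage`, `SimplePipeline`, and `PipelineSyntax` are hypothetical stand-ins for Spark ML's `PipelineStage` and `Pipeline`.

```scala
// Hypothetical stand-ins for Spark ML's PipelineStage and Pipeline, so the
// sketch compiles without a Spark dependency. These are NOT the Spark classes.
trait Stage { def name: String }

final case class NamedStage(name: String) extends Stage

// A pipeline is itself a stage, mirroring how Spark's Pipeline is a PipelineStage.
final case class SimplePipeline(stages: Vector[Stage]) extends Stage {
  def name: String = stages.map(_.name).mkString("Pipeline(", ", ", ")")
}

object PipelineSyntax {
  // Hypothetical `+` operator added through an implicit class; this is one
  // possible way to implement the proposed sugar, not the linked commit's.
  implicit class StageOps(left: Stage) {
    def +(right: Stage): SimplePipeline = left match {
      // If the left side is already a pipeline, append to it so that
      // a + b + c produces one flat pipeline instead of nested ones.
      case SimplePipeline(stages) => SimplePipeline(stages :+ right)
      // Otherwise start a new two-stage pipeline.
      case single => SimplePipeline(Vector(single, right))
    }
  }
}

object Demo {
  import PipelineSyntax._

  def main(args: Array[String]): Unit = {
    val tokenizer = NamedStage("tokenizer")
    val stopWordsRemover = NamedStage("stopWordsRemover")
    val countVectorizer = NamedStage("countVectorizer")

    // Same stage order as setStages(Array(tokenizer, stopWordsRemover, countVectorizer))
    val pipeline = tokenizer + stopWordsRemover + countVectorizer

    // prints "tokenizer -> stopWordsRemover -> countVectorizer"
    println(pipeline.stages.map(_.name).mkString(" -> "))
  }
}
```

Note that in this sketch only the left operand is flattened: `a + b + c` yields one three-stage pipeline, while `a + (b + c)` yields a two-stage pipeline whose second stage is a nested pipeline. That difference is hidden behind the operator, whereas an explicit setStages(Array(...)) call shows the stage list directly.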