[ https://issues.apache.org/jira/browse/SPARK-17094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15471366#comment-15471366 ]
yuhao yang commented on SPARK-17094:
------------------------------------

Thanks for the comment, Sean. Both questions are good ones.

1. For configuration, setting parameters on a stage might look something like:
{code}
pipeline("tokenizer").asInstanceOf[Tokenizer].set...
pipeline(2).asInstanceOf[Tokenizer].set...
{code}
It would be great if there were a way to avoid the cast. Eventually, I think it would be great to have configuration support for ML transformers, so that we could do:
{code}
sc.set("ml.tokenizer.toLowercase", "false")
{code}
and configuration-file support, which would avoid hard-coding parameters and make tuning on a cluster much easier. (Does anyone like the idea? cc [~josephkb] [~mengxr])

2. I think most users only use linear pipelines. Could you provide an example of a non-linear pipeline, so we can discuss something specific? I tried your code, but I cannot find a Pipeline constructor like that. Is it something under development? And do we need to set the input and output columns for each stage?

Overall, the feature would:
1. Allow people (especially beginners) to create an ML application in one simple line of code.
2. Be handy for users, since they don't have to set the input and output columns.
3. Looking further ahead, we may no longer need code to build a Spark ML application at all, since it could be done by configuration:
{code}
"ml.pipeline": "tokenizer", "hashingTF", "lda"
"ml.tokenizer.toLowercase": "false"
...
{code}

> provide simplified API for ML pipeline
> --------------------------------------
>
> Key: SPARK-17094
> URL: https://issues.apache.org/jira/browse/SPARK-17094
> Project: Spark
> Issue Type: New Feature
> Components: ML
> Reporter: yuhao yang
>
> Many machine learning pipeline libraries have an API for easily assembling transformers.
> One example would be:
> val model = new Pipeline("tokenizer", "countvectorizer", "lda").fit(data)
> Appreciate feedback and suggestions.
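On point 1 (avoiding the `asInstanceOf` cast when reaching into a pipeline), one common Scala technique is a typed lookup driven by an implicit `ClassTag`, so the type test happens inside a helper instead of at every call site. The sketch below is Spark-free: `Stage`, `Tokenizer`, `HashingTF`, and `StageLookup.first` are hypothetical stand-ins, not the real Spark ML classes.

```scala
import scala.reflect.ClassTag

// Hypothetical stand-ins for Spark's PipelineStage subclasses (NOT the real classes).
sealed trait Stage
final case class Tokenizer(toLowercase: Boolean = true) extends Stage
final case class HashingTF(numFeatures: Int = 1 << 18) extends Stage

object StageLookup {
  // Returns the first stage whose runtime class is T, if any.
  // The ClassTag makes the pattern `s: T` a real runtime type test,
  // so callers get an Option[T] without writing asInstanceOf themselves.
  def first[T <: Stage](stages: Seq[Stage])(implicit tag: ClassTag[T]): Option[T] =
    stages.collectFirst { case s: T => s }
}
```

For example, `StageLookup.first[HashingTF](stages).map(_.numFeatures)` reads a parameter without a cast. With the real API, the analogous pattern would be `pipeline.getStages.collectFirst { case t: Tokenizer => t }`, since `Pipeline.getStages` returns an `Array[PipelineStage]`.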
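The claim that users would not have to set input and output columns comes down to chaining a linear pipeline automatically: each stage reads the previous stage's output column. Here is a minimal Spark-free sketch of that wiring. The `StageSpec` type, the `knownStages` registry, and the `<name>_<index>` column-naming convention are all hypothetical assumptions, not Spark APIs.

```scala
// Hypothetical, Spark-free model of the proposal: a stage is described only by
// its short name and the columns it reads and writes.
final case class StageSpec(name: String, inputCol: String, outputCol: String)

object NamedPipeline {
  // Hypothetical registry of short names; Spark ML has no such registry.
  val knownStages: Set[String] = Set("tokenizer", "hashingTF", "countvectorizer", "lda")

  // Chain the stages linearly: the first stage reads `firstInputCol`, and stage i
  // reads the output column of stage i - 1. Output columns are named
  // "<name>_<index>" (an assumed convention) so repeated stage names do not clash.
  def wire(names: Seq[String], firstInputCol: String): Seq[StageSpec] = {
    require(names.nonEmpty, "a pipeline needs at least one stage")
    names.foreach(n => require(knownStages.contains(n), s"unknown stage: $n"))
    val outputs = names.zipWithIndex.map { case (n, i) => s"${n}_$i" }
    val inputs = firstInputCol +: outputs.init
    names.indices.map(i => StageSpec(names(i), inputs(i), outputs(i)))
  }
}
```

In real Spark ML, each spec would translate into `setInputCol`/`setOutputCol` calls on transformers such as `Tokenizer` and `HashingTF` (for `LDA`, the input column is set with `setFeaturesCol`), after which the configured stages would be passed to `new Pipeline().setStages(...)`. This wiring only covers linear pipelines, which is why the non-linear case raised in point 2 needs a separate discussion.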
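The configuration-driven idea in point 3 needs a way to turn flat keys into an ordered stage list plus per-stage parameter overrides. The sketch below assumes, hypothetically, that `ml.pipeline` holds a comma-separated list of stage names and that every other key has the form `ml.<stage>.<param>`. None of these keys exist in Spark today, and `PipelineConf` is an illustrative name.

```scala
// Hypothetical parser for the configuration-driven pipeline idea.
// Assumed key layout: "ml.pipeline" -> "stageA, stageB, ..." and
// "ml.<stage>.<param>" -> value. Values are kept as raw strings.
object PipelineConf {
  final case class Parsed(stages: Seq[String], params: Map[String, Map[String, String]])

  // Matches the whole key; "ml.pipeline" has no second dot, so it never matches.
  private val StageParam = """ml\.([^.]+)\.(.+)""".r

  def parse(conf: Map[String, String]): Parsed = {
    // Ordered stage names from the comma-separated "ml.pipeline" value.
    val stages = conf.get("ml.pipeline").toSeq
      .flatMap(_.split(","))
      .map(_.trim)
      .filter(_.nonEmpty)
    // Group "ml.<stage>.<param>" entries by stage name.
    val params = conf.toSeq
      .collect { case (StageParam(stage, param), value) => (stage, param, value) }
      .groupBy(_._1)
      .map { case (stage, entries) => stage -> entries.map(e => e._2 -> e._3).toMap }
    // Reject settings for stages that are not part of the pipeline.
    val unknown = params.keySet.diff(stages.toSet)
    require(unknown.isEmpty, s"settings for stages not in ml.pipeline: ${unknown.mkString(", ")}")
    Parsed(stages, params)
  }
}
```

The parser deliberately stops at raw strings. Applying a value such as `"false"` to a real stage would need a per-parameter conversion to the parameter's actual type, which this sketch leaves out.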
-- This message was sent by Atlassian JIRA (v6.3.4#6332)