[jira] [Commented] (SPARK-5566) Tokenizer for mllib package
[ https://issues.apache.org/jira/browse/SPARK-5566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14315919#comment-14315919 ]

Apache Spark commented on SPARK-5566:
-------------------------------------

User 'aborsu985' has created a pull request for this issue:
https://github.com/apache/spark/pull/4504

> Tokenizer for mllib package
> ---------------------------
>
>                 Key: SPARK-5566
>                 URL: https://issues.apache.org/jira/browse/SPARK-5566
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML, MLlib
>    Affects Versions: 1.3.0
>            Reporter: Joseph K. Bradley
>
> There exist tokenizer classes in the spark.ml.feature package and in the
> LDAExample in the spark.examples.mllib package. The Tokenizer in the
> LDAExample is more advanced and should be made into a full-fledged public
> class in spark.mllib.feature. The spark.ml.feature.Tokenizer class should
> become a wrapper around the new Tokenizer.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5566) Tokenizer for mllib package
[ https://issues.apache.org/jira/browse/SPARK-5566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313996#comment-14313996 ]

Augustin Borsu commented on SPARK-5566:
---------------------------------------

We could use a tokenizer like this, but we would need to add regex and Array[String] parameter types to be able to change those parameters in a cross-validation.
https://github.com/apache/spark/pull/4504
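To make the comment above concrete, here is a minimal sketch in plain Python of a tokenizer whose split pattern and stopword list are ordinary parameters, so a parameter search can vary them. It is not the Spark API and not the code in the pull request: the class name `RegexTokenizerSketch`, its `pattern`, `stopwords`, and `min_length` arguments, and the small grid at the end are all hypothetical names chosen for illustration.

```python
import re
from itertools import product
from typing import Iterable, List


class RegexTokenizerSketch:
    """Hypothetical tokenizer with tunable parameters (not a Spark class)."""

    def __init__(self, pattern: str = r"\s+",
                 stopwords: Iterable[str] = (),
                 min_length: int = 1):
        # The regex parameter: where to split the input text.
        self.pattern = re.compile(pattern)
        # The string-array parameter: tokens to drop after splitting.
        self.stopwords = frozenset(w.lower() for w in stopwords)
        # Tokens shorter than this are dropped; 1 removes empty strings.
        self.min_length = min_length

    def tokenize(self, text: str) -> List[str]:
        tokens = self.pattern.split(text.lower())
        return [t for t in tokens
                if len(t) >= self.min_length and t not in self.stopwords]


# Assumed candidate values; a cross-validation would fit and score a
# pipeline once per combination.
patterns = [r"\s+", r"\W+"]
stopword_sets = [[], ["the", "a"]]
grid = [RegexTokenizerSketch(p, s) for p, s in product(patterns, stopword_sets)]
```

Each entry in `grid` is one candidate configuration. The point is only that both the pattern and the stopword list must be expressible as parameter values before a search over them is possible.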
[jira] [Commented] (SPARK-5566) Tokenizer for mllib package
[ https://issues.apache.org/jira/browse/SPARK-5566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308733#comment-14308733 ]

yuhao yang commented on SPARK-5566:
-----------------------------------

I mean only the underlying implementation.
[jira] [Commented] (SPARK-5566) Tokenizer for mllib package
[ https://issues.apache.org/jira/browse/SPARK-5566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305588#comment-14305588 ]

Joseph K. Bradley commented on SPARK-5566:
------------------------------------------

Do you mean to share the underlying implementation or the public API? It will be good if we can share some underlying code, but those various featurization methods are quite different and probably belong in different classes. The APIs can be similar to the extent that all feature transformers should be similar.
[jira] [Commented] (SPARK-5566) Tokenizer for mllib package
[ https://issues.apache.org/jira/browse/SPARK-5566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305172#comment-14305172 ]

yuhao yang commented on SPARK-5566:
-----------------------------------

Actually, I believe much of the current code, such as Word2Vec and HashingTF, shares a similar data flow, and it would be best to take those common requirements into consideration.