[ https://issues.apache.org/jira/browse/SPARK-5566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305172#comment-14305172 ]
yuhao yang commented on SPARK-5566:
-----------------------------------

Actually, I believe a lot of the current code, such as Word2Vec and HashingTF, shares a similar data flow, and it would be best if we could take these common requirements into consideration.

> Tokenizer for mllib package
> ---------------------------
>
>                 Key: SPARK-5566
>                 URL: https://issues.apache.org/jira/browse/SPARK-5566
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML, MLlib
>    Affects Versions: 1.3.0
>            Reporter: Joseph K. Bradley
>
> There exist tokenizer classes in the spark.ml.feature package and in the
> LDAExample in the spark.examples.mllib package. The Tokenizer in the
> LDAExample is more advanced and should be made into a full-fledged public
> class in spark.mllib.feature. The spark.ml.feature.Tokenizer class should
> become a wrapper around the new Tokenizer.