[ https://issues.apache.org/jira/browse/SPARK-5566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313996#comment-14313996 ]
Augustin Borsu edited comment on SPARK-5566 at 2/11/15 9:58 AM:
https://github.com/apache/spark/pull/4504
I propose a tokenizer loosely based on the NLTK RegexpTokenizer.
I did not create a standalone tokenizer in mllib wrapped by ml, as I don't
think a standalone tokenizer is necessarily needed in mllib, but if people
disagree I can change that.
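For context, NLTK's RegexpTokenizer emits the non-overlapping matches of a user-supplied pattern as tokens. A minimal plain-Python sketch of that behavior (the function name is illustrative, not part of any proposed API):

```python
import re

def regex_tokenize(text, pattern=r"\w+"):
    """Return all non-overlapping matches of `pattern`, in the
    style of NLTK's RegexpTokenizer."""
    return re.findall(pattern, text)

print(regex_tokenize("Spark's MLlib is great!"))
# → ['Spark', 's', 'MLlib', 'is', 'great']
```

Note that the choice of pattern fully determines the tokenization, which is why it is exposed as a parameter rather than hard-coded.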
was (Author: augustinb):
We could use a tokenizer like this, but we would need to add regex and
Array[String] parameter types to be able to change those parameters in a
cross-validation.
https://github.com/apache/spark/pull/4504
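To illustrate why the pattern needs to be a tunable parameter, here is a minimal plain-Python sketch of sweeping candidate regex patterns the way a cross-validator would sweep a parameter grid (the patterns and function are illustrative):

```python
import re

def tokenize(text, pattern):
    """Tokenize by returning all matches of `pattern`."""
    return re.findall(pattern, text)

# Candidate patterns a cross-validator might evaluate in a grid.
patterns = [r"\w+", r"[a-zA-Z]+", r"\S+"]

for pattern in patterns:
    print(pattern, tokenize("don't panic", pattern))
# r"\w+"  keeps digits/underscores and splits on the apostrophe,
# r"\S+"  keeps whole whitespace-delimited chunks like "don't".
```

Without a regex-typed parameter, the choice between such patterns could not be made part of the model-selection search.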
Tokenizer for mllib package
---
Key: SPARK-5566
URL: https://issues.apache.org/jira/browse/SPARK-5566
Project: Spark
Issue Type: New Feature
Components: ML, MLlib
Affects Versions: 1.3.0
Reporter: Joseph K. Bradley
There exist tokenizer classes in the spark.ml.feature package and in the
LDAExample in the spark.examples.mllib package. The Tokenizer in the
LDAExample is more advanced and should be made into a full-fledged public
class in spark.mllib.feature. The spark.ml.feature.Tokenizer class should
become a wrapper around the new Tokenizer.
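A minimal plain-Python sketch of the proposed wrapper pattern, with hypothetical class names standing in for spark.mllib.feature.Tokenizer and spark.ml.feature.Tokenizer (neither name below is real Spark API):

```python
import re

class MllibTokenizer:
    """Stand-in for a standalone spark.mllib.feature tokenizer:
    owns the tokenization logic itself."""
    def __init__(self, pattern=r"\w+"):
        self.pattern = pattern

    def tokenize(self, text):
        return re.findall(self.pattern, text)

class MlTokenizer:
    """Stand-in for the spark.ml.feature.Tokenizer pipeline stage:
    delegates all tokenization to the mllib-level class."""
    def __init__(self, pattern=r"\w+"):
        self._delegate = MllibTokenizer(pattern)

    def transform(self, texts):
        return [self._delegate.tokenize(t) for t in texts]

print(MlTokenizer().transform(["hello world"]))
# → [['hello', 'world']]
```

The point of the split is that the logic lives once in mllib, while the ml class only adapts it to the pipeline interface.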