[jira] [Comment Edited] (SPARK-5566) Tokenizer for mllib package

2015-02-11 Thread Augustin Borsu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313996#comment-14313996
 ] 

Augustin Borsu edited comment on SPARK-5566 at 2/11/15 9:58 AM:


https://github.com/apache/spark/pull/4504
I propose a tokenizer loosely based on NLTK's RegexpTokenizer.
I implemented it directly in ml rather than as a standalone mllib tokenizer 
wrapped by ml, because I don't think a standalone tokenizer is necessarily 
needed in mllib. If people disagree, I can change that.
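
For readers unfamiliar with the approach, here is a minimal sketch of a regex-based tokenizer in the spirit of NLTK's RegexpTokenizer. It is not the code from the pull request: the function name and the `gaps`, `min_token_length` and `lowercase` parameters are assumptions made for illustration only.

```python
import re
from typing import List


def regex_tokenize(text: str,
                   pattern: str = r"\s+",
                   gaps: bool = True,
                   min_token_length: int = 1,
                   lowercase: bool = True) -> List[str]:
    """Split `text` into tokens with a regular expression.

    gaps=True:  `pattern` describes the separators between tokens.
    gaps=False: `pattern` describes the tokens themselves.
    All parameter names are hypothetical, chosen for this sketch.
    """
    if lowercase:
        text = text.lower()
    regex = re.compile(pattern)

    if gaps:
        # Collect the stretches of text between consecutive separator matches.
        candidates = []
        start = 0
        for match in regex.finditer(text):
            candidates.append(text[start:match.start()])
            start = match.end()
        candidates.append(text[start:])
    else:
        # Every match of the pattern is itself a token.
        candidates = [match.group(0) for match in regex.finditer(text)]

    # Drop empty strings and tokens shorter than the requested minimum.
    return [tok for tok in candidates if len(tok) >= min_token_length]


if __name__ == "__main__":
    print(regex_tokenize("Spark MLlib  needs a tokenizer"))
    print(regex_tokenize("Hello, world!", pattern=r"\w+", gaps=False))
```

The `gaps` switch is the main design choice here: one pattern parameter can describe either the delimiters or the tokens, so a single string setting covers both styles of tokenization.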


was (Author: augustinb):
We could use a tokenizer like this, but we would need to add regex and 
Array[String] parameter types to be able to vary those parameters during 
cross-validation.
https://github.com/apache/spark/pull/4504

 Tokenizer for mllib package
 ---

              Key: SPARK-5566
              URL: https://issues.apache.org/jira/browse/SPARK-5566
          Project: Spark
       Issue Type: New Feature
       Components: ML, MLlib
 Affects Versions: 1.3.0
         Reporter: Joseph K. Bradley

 There exist tokenizer classes in the spark.ml.feature package and in the 
 LDAExample in the spark.examples.mllib package.  The Tokenizer in the 
 LDAExample is more advanced and should be made into a full-fledged public 
 class in spark.mllib.feature.  The spark.ml.feature.Tokenizer class should 
 become a wrapper around the new Tokenizer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5566) Tokenizer for mllib package

2015-02-10 Thread Augustin Borsu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313996#comment-14313996
 ] 

Augustin Borsu commented on SPARK-5566:
---

We could use a tokenizer like this, but we would need to add regex and 
Array[String] parameter types to be able to vary those parameters during 
cross-validation.
https://github.com/apache/spark/pull/4504
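
To illustrate why these parameter types matter, here is a rough sketch of a cross-validation style grid that varies a regex pattern together with a list of strings. It does not use Spark's API: `param_grid` and `tokenize` are hypothetical helpers, and treating the Array[String] parameter as a stop-word list is an assumption.

```python
import re
from itertools import product
from typing import Dict, Iterator, List, Sequence


def param_grid(grid: Dict[str, Sequence]) -> Iterator[Dict[str, object]]:
    """Yield one settings dict per combination of the candidate values."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def tokenize(text: str, pattern: str, stopwords: List[str]) -> List[str]:
    """Hypothetical tokenizer: pattern matches tokens, stopwords are dropped."""
    tokens = re.findall(pattern, text.lower())
    return [tok for tok in tokens if tok not in stopwords]


# Candidate values for a string-typed (regex) parameter and a
# list-of-strings parameter (assumed here to be stop words).
grid = {
    "pattern": [r"\w+", r"[a-z]+"],
    "stopwords": [[], ["the", "a"]],
}

candidates = list(param_grid(grid))

if __name__ == "__main__":
    sample = "The tokenizer splits a line into 2 parts"
    for settings in candidates:
        print(settings, tokenize(sample, **settings))
```

Each entry in `candidates` is one configuration that a cross-validator would fit and score. That only works if the parameter framework can carry string-valued and array-valued settings.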

