[jira] [Commented] (SPARK-5566) Tokenizer for mllib package

2015-02-11 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14315919#comment-14315919
 ] 

Apache Spark commented on SPARK-5566:
-

User 'aborsu985' has created a pull request for this issue:
https://github.com/apache/spark/pull/4504

> Tokenizer for mllib package
> ---
>
> Key: SPARK-5566
> URL: https://issues.apache.org/jira/browse/SPARK-5566
> Project: Spark
> Issue Type: New Feature
> Components: ML, MLlib
> Affects Versions: 1.3.0
> Reporter: Joseph K. Bradley
>
> There exist tokenizer classes in the spark.ml.feature package and in the 
> LDAExample in the spark.examples.mllib package.  The Tokenizer in the 
> LDAExample is more advanced and should be made into a full-fledged public 
> class in spark.mllib.feature.  The spark.ml.feature.Tokenizer class should 
> become a wrapper around the new Tokenizer.
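
The wrapper arrangement proposed in the description can be illustrated with a minimal sketch in plain Python. It is not Spark code: `tokenize`, `TokenizerWrapper`, and the column names are hypothetical, and the splitting rule (lowercase, split on whitespace) is an assumption chosen only to keep the example short. The point is that the tokenizing logic lives in one place and the pipeline-style class only delegates to it.

```python
def tokenize(text, min_length=1):
    """Core logic (assumed rule): lowercase, split on whitespace, drop short tokens."""
    return [t for t in text.lower().split() if len(t) >= min_length]


class TokenizerWrapper:
    """Hypothetical pipeline-style stage that only delegates to tokenize()."""

    def __init__(self, input_col, output_col, min_length=1):
        self.input_col = input_col
        self.output_col = output_col
        self.min_length = min_length

    def transform(self, rows):
        # rows is a list of dicts; each result is a copy with the output column added.
        return [
            {**row, self.output_col: tokenize(row[self.input_col], self.min_length)}
            for row in rows
        ]
```

With this split, a change to the tokenizing rule is made once in `tokenize` and the wrapper picks it up without modification.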



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5566) Tokenizer for mllib package

2015-02-10 Thread Augustin Borsu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313996#comment-14313996
 ] 

Augustin Borsu commented on SPARK-5566:
---

We could use a tokenizer like this, but we would need to add regex and 
Array[String] parameter types so that those parameters can be changed in a 
cross-validation.
https://github.com/apache/spark/pull/4504
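
As a rough illustration of why the tokenizer settings need to be exposed as parameters, here is a minimal sketch in plain Python. It is not the code from the pull request: `RegexTokenizerSketch`, `param_grid`, and the parameter names are hypothetical, and treating the string-array parameter as a stop-word list is an assumption. A regex parameter and a list-of-strings parameter are enough to enumerate candidate settings for a cross-validation.

```python
import re
from itertools import product


class RegexTokenizerSketch:
    """Hypothetical stand-in, not the class from the pull request."""

    def __init__(self, pattern=r"\s+", min_token_length=1, stop_words=()):
        self.pattern = pattern                    # regex used as the separator
        self.min_token_length = min_token_length  # shorter tokens are dropped
        self.stop_words = frozenset(stop_words)   # assumed use of the string array

    def transform(self, text):
        # Split the lowercased text on the separator pattern, then filter.
        tokens = re.split(self.pattern, text.lower())
        return [
            t for t in tokens
            if len(t) >= self.min_token_length and t not in self.stop_words
        ]


def param_grid(patterns, stop_word_lists):
    """Build one tokenizer per (pattern, stop-word list) combination."""
    return [
        RegexTokenizerSketch(pattern=p, stop_words=s)
        for p, s in product(patterns, stop_word_lists)
    ]
```

A cross-validation loop would then fit and score the downstream model once per tokenizer in the grid and keep the best-scoring combination.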




[jira] [Commented] (SPARK-5566) Tokenizer for mllib package

2015-02-05 Thread yuhao yang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308733#comment-14308733
 ] 

yuhao yang commented on SPARK-5566:
---

I mean only the underlying implementation. 




[jira] [Commented] (SPARK-5566) Tokenizer for mllib package

2015-02-04 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305588#comment-14305588
 ] 

Joseph K. Bradley commented on SPARK-5566:
--

Do you mean to share the underlying implementation or the public API?
It would be good if we could share some underlying code, but those various 
featurization methods are quite different and probably belong in different 
classes.  The APIs can be similar to the extent that all feature transformers 
should be similar.




[jira] [Commented] (SPARK-5566) Tokenizer for mllib package

2015-02-04 Thread yuhao yang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305172#comment-14305172
 ] 

yuhao yang commented on SPARK-5566:
---

Actually, I believe much of the current code, such as Word2Vec and HashingTF, 
shares a similar data flow, and it would be best to take the common 
requirements into consideration. 
