[ https://issues.apache.org/jira/browse/SPARK-6340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14364246#comment-14364246 ]

Kian Ho edited comment on SPARK-6340 at 3/17/15 12:01 AM:
----------------------------------------------------------

Hi Joseph,

I initially considered that solution, but my understanding was that the order 
of the instances isn't guaranteed to be the same before and after the 
transformations (since the transformations are distributed across worker 
nodes). If so, zipping the transformed features back with the labels could pair 
features with labels they were not originally assigned. Is this correct? A 
couple of users raised the same question in that thread.
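To make the ordering concern concrete, here is a plain-Python sketch (not PySpark) that models an RDD as a list of partitions. The helpers `map_partitions` and `zip_partitions` are hypothetical names. PySpark's `RDD.zip` documentation says it assumes both RDDs have the same number of partitions and the same number of elements in each partition (e.g. one was made through a map on the other); the sketch copies that pairing rule to show when zipping keeps labels aligned and when it does not.

```python
from typing import Any, Callable, List

# Hypothetical stand-in for an RDD: a list of partitions, each a list of records.
Partitions = List[List[Any]]


def map_partitions(parts: Partitions, f: Callable[[Any], Any]) -> Partitions:
    """Element-wise map: keeps the partition count and the order inside each partition."""
    return [[f(x) for x in part] for part in parts]


def zip_partitions(left: Partitions, right: Partitions) -> Partitions:
    """Pair records by partition and position, mimicking the RDD.zip contract."""
    if len(left) != len(right):
        raise ValueError("different number of partitions")
    out = []
    for lp, rp in zip(left, right):
        if len(lp) != len(rp):
            raise ValueError("different number of elements in a partition")
        out.append(list(zip(lp, rp)))
    return out


# Toy data: two partitions of (label, text) records.
docs = [[(1.0, "spark rdd zip"), (0.0, "pasta recipe")], [(1.0, "mllib idf")]]

labels = map_partitions(docs, lambda rec: rec[0])
# Stand-in "feature transformation": number of tokens per document.
features = map_partitions(docs, lambda rec: len(rec[1].split()))

# Both sides come from element-wise maps over docs, so each label meets its own features.
aligned = zip_partitions(labels, features)

# If the feature side is reordered before zipping, pairs are still produced,
# but labels no longer line up with the documents they came from.
reordered = [sorted(part) for part in features]
misaligned = zip_partitions(labels, reordered)
```

The sketch only models the pairing rule; whether a particular transformation preserves per-partition order has to be checked for that transformation.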

Thanks



> mllib.IDF for LabelPoints
> -------------------------
>
>                 Key: SPARK-6340
>                 URL: https://issues.apache.org/jira/browse/SPARK-6340
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 1.3.0
>         Environment: python 2.7.8
> pyspark
> OS: Linux Mint 17 Qiana (Cinnamon 64-bit)
>            Reporter: Kian Ho
>            Priority: Minor
>              Labels: feature
>
> as per: 
> http://apache-spark-user-list.1001560.n3.nabble.com/Using-TF-IDF-from-MLlib-td19429.html#a19528
> Having IDF.fit accept LabeledPoints would be useful since, correct me if 
> I'm wrong, there currently isn't a way of keeping track of which labels 
> belong to which documents when applying a conventional tf-idf 
> transformation to labelled text data.
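A minimal plain-Python sketch of the requested behaviour, assuming each document arrives as a (label, tokens) pair: fit the IDF weights on the token lists only, then transform each pair so the label never leaves its own vector and no zip is needed. This is not the MLlib API; `fit_idf` and `transform_labeled` are hypothetical names, and plain dicts stand in for the hashed sparse vectors `HashingTF` would produce. The weights use the smoothed formula documented for MLlib's IDF, idf(t) = log((m + 1) / (df(t) + 1)), where m is the number of documents and df(t) the number of documents containing term t.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

LabeledDoc = Tuple[float, List[str]]  # (label, tokens)


def fit_idf(docs: List[LabeledDoc]) -> Dict[str, float]:
    """Fit IDF weights from the token lists only; labels are ignored here."""
    m = len(docs)
    df: Counter = Counter()
    for _, tokens in docs:
        df.update(set(tokens))  # count each term at most once per document
    return {t: math.log((m + 1) / (n + 1)) for t, n in df.items()}


def transform_labeled(
    docs: List[LabeledDoc], idf: Dict[str, float]
) -> List[Tuple[float, Dict[str, float]]]:
    """Return (label, tf-idf dict) pairs; each label stays attached to its own vector."""
    out = []
    for label, tokens in docs:
        tf = Counter(tokens)  # raw term frequencies for this document
        # Terms not seen during fitting get weight 0.0.
        out.append((label, {t: c * idf.get(t, 0.0) for t, c in tf.items()}))
    return out


docs = [(1.0, ["spark", "text", "spark"]), (0.0, ["pasta", "text"])]
weights = fit_idf(docs)
result = transform_labeled(docs, weights)
# "text" occurs in every document, so its weight is log(3 / 3) = 0.0.
```

Carrying the label through the transformation sidesteps the ordering question entirely, at the cost of a transform that knows about labels.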



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
