Here is what I did for this case: https://github.com/andypetrella/tf-idf
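
For readers who want the gist without opening the repository, the approach Sean describes in the quoted message below can be sketched roughly as follows. This is an illustrative sketch only, not necessarily what the linked project does. The names LabeledTfIdf, toLabeledPoints and docs are made up for the example, and it assumes the input is already an RDD of (label, terms) pairs with Double labels.

```scala
import org.apache.spark.mllib.feature.{HashingTF, IDF}
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// Hypothetical helper, for illustration only.
object LabeledTfIdf {

  def toLabeledPoints(docs: RDD[(Double, Seq[String])]): RDD[LabeledPoint] = {
    val hashingTF = new HashingTF()

    // Hash each document's terms into a term-frequency vector,
    // carrying the label along in the same tuple.
    val labeledTf: RDD[(Double, Vector)] =
      docs.map { case (label, terms) => (label, hashingTF.transform(terms)) }
    labeledTf.cache() // read twice: once to fit the IDF, once to transform

    val labels: RDD[Double] = labeledTf.map(_._1)
    val tf: RDD[Vector] = labeledTf.map(_._2)

    // Fit the IDF weights on the TF vectors, then rescale them.
    val idfModel = new IDF().fit(tf)
    val tfidf: RDD[Vector] = idfModel.transform(tf)

    // Assumption: zip is safe here because labels and tfidf are both
    // element-wise transformations of labeledTf, so they have the same
    // partitioning and the same number of elements per partition.
    labels.zip(tfidf).map { case (label, vector) => LabeledPoint(label, vector) }
  }
}
```

Called on something like sc.parallelize(Seq((1.0, Seq("spark", "mllib")), (0.0, Seq("hadoop")))), this yields one LabeledPoint per input document, so each vector stays tied to its label.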

On Mon, Dec 29, 2014 at 11:31, Sean Owen <so...@cloudera.com> wrote:

> Given (label, terms) you can just transform the values to a TF vector,
> then TF-IDF vector, with HashingTF and IDF / IDFModel. Then you can
> make a LabeledPoint from (label, vector) pairs. Is that what you're
> looking for?
>
> On Mon, Dec 29, 2014 at 3:37 AM, Yao <y...@ford.com> wrote:
> > I found the TF-IDF feature extraction and all the MLlib code that works with
> > pure Vector RDDs very difficult to work with, due to the lack of ability to
> > associate a vector back to the original data. Why can't Spark MLlib support
> > LabeledPoint?
> >
> >
> >
> > --
> > View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Using-TF-IDF-from-MLlib-tp19429p20876.html
> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> > For additional commands, e-mail: user-h...@spark.apache.org
> >
