Given (label, terms) pairs, you can transform the terms into a TF vector
with HashingTF, then into a TF-IDF vector with IDF / IDFModel. You can
then build a LabeledPoint from each (label, vector) pair. Is that what
you're looking for?
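
A rough sketch of that, in case it helps. The sample data, the app name,
and the local master are made up for illustration. It also assumes that
zipping the labels back onto the vectors is acceptable, which should hold
here because both RDDs come from the same parent through map-style
transformations, so the partitioning and element order line up.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.feature.{HashingTF, IDF}
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

object TfIdfLabeledPoints {
  def main(args: Array[String]): Unit = {
    // Hypothetical local context, for illustration only.
    val conf = new SparkConf().setAppName("tfidf-labeled").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Hypothetical input: (label, terms) pairs.
    val data: RDD[(Double, Seq[String])] = sc.parallelize(Seq(
      (1.0, Seq("spark", "mllib", "tf", "idf")),
      (0.0, Seq("mailing", "list", "question"))
    ))
    data.cache()

    // Term frequencies via the hashing trick.
    val hashingTF = new HashingTF()
    val tf: RDD[Vector] = hashingTF.transform(data.map(_._2))
    tf.cache() // IDF makes two passes: one to fit, one to transform

    // Fit IDF on the TF vectors, then rescale them to TF-IDF.
    val idfModel = new IDF().fit(tf)
    val tfidf: RDD[Vector] = idfModel.transform(tf)

    // Re-attach the labels. zip requires identical partitioning and the
    // same number of elements per partition in both RDDs.
    val points: RDD[LabeledPoint] = data.map(_._1).zip(tfidf).map {
      case (label, vector) => LabeledPoint(label, vector)
    }

    points.collect().foreach(println)
    sc.stop()
  }
}
```

The same zip trick should work for carrying along any other key (a
document ID, say) instead of a label.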

On Mon, Dec 29, 2014 at 3:37 AM, Yao <y...@ford.com> wrote:
> I found the TF-IDF feature extraction and all the MLlib code that works with
> pure Vector RDDs very difficult to work with, because there is no way to
> associate a vector back to the original data. Why can't Spark MLlib support
> LabeledPoint?
>
>
>
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/Using-TF-IDF-from-MLlib-tp19429p20876.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>

