Well, it really depends on what you want to do ;)

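TF-IDF is a weighting scheme that originates in information retrieval and
can be used to judge how relevant a document is to a given search term. The
weight grows with how often the term occurs in the document and shrinks
with the number of documents in the corpus that contain the term.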
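
To give a feel for what a TF-IDF vector looks like, here is a minimal
sketch in plain Python (not Spark). It assumes the smoothed IDF
log((N + 1) / (df + 1)) that the MLlib documentation describes, with the
raw term count as TF. The names tf_idf, rank and the sample documents are
made up for illustration. As far as I know, MLlib stores the same kind of
weights in a sparse vector indexed by hashed term ids rather than by the
terms themselves.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one sparse {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF; a term that occurs in every document gets weight 0.
    idf = {t: math.log((n_docs + 1) / (count + 1)) for t, count in df.items()}
    # TF is the raw count of the term within the document.
    return [
        {t: count * idf[t] for t, count in Counter(doc).items()}
        for doc in docs
    ]


# Hypothetical toy corpus, already tokenized by whitespace.
docs = [
    "spark makes big data simple".split(),
    "spark mllib has tf idf".split(),
    "tf idf weighs rare terms higher".split(),
]
vectors = tf_idf(docs)


def rank(term):
    """Order document indices by their TF-IDF weight for one search term."""
    scores = [(i, vec.get(term, 0.0)) for i, vec in enumerate(vectors)]
    return sorted(scores, key=lambda s: s[1], reverse=True)


print(vectors[1])     # sparse vector of the second document
print(rank("mllib"))  # the second document ranks first for "mllib"
```

Each document becomes a sparse map from term to weight. Terms shared by
several documents ("spark", "tf", "idf") get a lower weight than terms
unique to one document ("mllib"), which is what makes the vector useful for
ranking or as input features.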

It's also often used as a feature representation for text-related machine
learning tasks. For example, have a look at topic extraction using
non-negative matrix factorization.


2015-03-09 7:39 GMT+01:00 Xi Shen <davidshe...@gmail.com>:

> Hi,
> I read this page,
> http://spark.apache.org/docs/1.2.0/mllib-feature-extraction.html, but I
> am wondering how to use this TF-IDF RDD. What does a TF-IDF vector look
> like?
> Can someone give me some guidance?
> Thanks,
> Xi Shen
> http://about.me/davidshen
