Hi,
well, it really depends on what you want to do ;)
TF-IDF is a measure that originates in information retrieval, where it is
used to judge how relevant a document is to a given search term: a term
scores high if it is frequent in that document but rare in the others.
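To give a feeling for what such a TF-IDF vector looks like, here is a small
plain-Python sketch (no Spark involved). The function name tf_idf and the toy
documents are made up for illustration. The smoothed IDF,
log((N + 1) / (df + 1)), is the variant the MLlib docs describe as far as I
know; other variants exist, so treat that as an assumption.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Hypothetical helper: docs is a list of token lists.

    Returns one {term: weight} dict per document.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed inverse document frequency (assumed variant, see above).
    idf = {term: math.log((n_docs + 1) / (count + 1)) for term, count in df.items()}
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts within this document
        vectors.append({term: count * idf[term] for term, count in tf.items()})
    return vectors


# Toy corpus, invented for this example.
docs = [
    "spark makes big data simple".split(),
    "spark mllib has tf idf".split(),
    "tf idf weights rare terms higher".split(),
]
vectors = tf_idf(docs)
for vec in vectors:
    print(vec)
```

In MLlib you get one sparse vector per document instead of a dict, with
positions given by the hashed term rather than the term itself, but the idea
is the same: each entry is the weight of one term in that document, and terms
that occur in many documents get a lower weight.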
It's also often used for text-related machine learning tasks. E.g. have a
Hi,
I read this page,
http://spark.apache.org/docs/1.2.0/mllib-feature-extraction.html. But I am
wondering how to use this TF-IDF RDD. What does this TF-IDF vector look
like?
Can someone give me some guidance?
Thanks,
Xi Shen
about.me/davidshen