Hi,
I read this document,
http://spark.apache.org/docs/1.2.1/mllib-feature-extraction.html, and tried
to build a TF-IDF model of my documents.
I have a list of documents. Each word is represented as an Int, and each
document is listed on one line:
doc_name, int1, int2...
doc_name, int3, int4...
Hey, I worked it out myself :)
The Vector is actually a SparseVector, so when it is written out as a
string, the format is
(size, [indices...], [values...])
Simple!
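
For anyone who finds this later, here is a rough sketch of reading that string format back outside of Spark. It is plain Python and does not use the Spark API. The helper names (parse_sparse_vector, to_dense) and the sample line are made up for illustration, and it assumes the text looks exactly like (size,[indices],[values]) with ordinary numeric values.

```python
import ast

def parse_sparse_vector(text):
    """Parse '(size,[i0,i1,...],[v0,v1,...])' into (size, {index: value}).

    Hypothetical helper: assumes the three-part layout described above.
    """
    # The text is a valid Python literal: a tuple of (int, list, list).
    size, indices, values = ast.literal_eval(text.strip())
    if len(indices) != len(values):
        raise ValueError("indices and values must have the same length")
    return size, dict(zip(indices, values))

def to_dense(size, entries):
    """Expand the sparse entries into a plain list of length `size`."""
    # Any index not stored in the sparse form is an implicit zero.
    return [float(entries.get(i, 0.0)) for i in range(size)]

# Made-up sample line, not taken from a real job's output.
size, entries = parse_sparse_vector("(8,[1,3,6],[0.5,1.2,2.0])")
dense = to_dense(size, entries)
print(size, entries)
print(dense)
```

The first number is the vector length, the first list holds the positions of the non-zero entries, and the second list holds the values at those positions. Everything else is zero.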
On Sat, Mar 14, 2015 at 6:05 PM Xi Shen davidshe...@gmail.com wrote: