Please help me understand TF-IDF Vector structure

2015-03-14 Thread Xi Shen
Hi, I read this document, http://spark.apache.org/docs/1.2.1/mllib-feature-extraction.html, and tried to build a TF-IDF model of my documents. I have a list of documents. Each word is represented as an Int, and each document is listed on one line:

doc_name, int1, int2...
doc_name, int3, int4...
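
For readers following along, here is a minimal plain-Python sketch of what a TF-IDF computation over input in this shape could look like. It is not Spark code and does not reproduce HashingTF's hashing step; it only uses the smoothed IDF formula given in the linked MLlib page, log((m + 1) / (df + 1)). The sample lines, document names, and the helper names parse_line and tf_idf are made up for illustration.

```python
import math
from collections import Counter

def parse_line(line):
    """Split 'doc_name, int1, int2, ...' into (name, [int, ...])."""
    name, *terms = [field.strip() for field in line.split(",")]
    return name, [int(t) for t in terms]

def tf_idf(docs):
    """docs maps name -> list of term ids; returns name -> {term: weight}."""
    m = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for terms in docs.values() for term in set(terms))
    # Smoothed IDF as described in the MLlib feature extraction page.
    idf = {term: math.log((m + 1) / (count + 1)) for term, count in df.items()}
    # Term frequency is the raw count of the term within the document.
    return {
        name: {term: tf * idf[term] for term, tf in Counter(terms).items()}
        for name, terms in docs.items()
    }

# Hypothetical input in the 'doc_name, int1, int2...' layout.
lines = ["doc_a, 1, 2, 2", "doc_b, 2, 3"]
docs = dict(parse_line(line) for line in lines)
weights = tf_idf(docs)
```

With this smoothing, a term that appears in every document (term 2 here) gets a weight of zero, while terms unique to one document keep a positive weight.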

Re: Please help me understand TF-IDF Vector structure

2015-03-14 Thread Xi Shen
Hey, I worked it out myself :) The Vector is actually a SparseVector, so when it is written out as a string, the format is (size, [indices...], [values...]), where the indices are the positions of the non-zero entries and the values are the entries at those positions. Simple!

On Sat, Mar 14, 2015 at 6:05 PM Xi Shen davidshe...@gmail.com wrote: Hi, I read this document,
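
The sparse layout described in the reply can be illustrated with a short plain-Python sketch that parses a string of that shape and expands it into a dense list. This is only an illustration of the structure, not Spark's own parser; the sample string and the helper names parse_sparse and to_dense are assumptions made for this example.

```python
import ast

def parse_sparse(text):
    """Parse '(size,[indices],[values])' into (size, indices, values)."""
    size, indices, values = ast.literal_eval(text)
    return int(size), list(indices), [float(v) for v in values]

def to_dense(size, indices, values):
    """Expand the sparse triple into a dense list of length `size`."""
    dense = [0.0] * size
    for i, v in zip(indices, values):
        dense[i] = v  # every position not listed in `indices` stays 0.0
    return dense

# Hypothetical vector: 8 slots, non-zero entries at positions 1, 4 and 6.
size, indices, values = parse_sparse("(8,[1,4,6],[0.5,1.2,2.0])")
dense = to_dense(size, indices, values)
```

The first number is the total vector length, so the two bracketed lists can be much shorter than the vector they describe.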