[MLlib] Term Frequency in TF-IDF seems incorrect

Hao Ren Mon, 01 Aug 2016 15:29:46 -0700

When computing term frequency, we can use either the HashingTF or the
CountVectorizer feature extractor.
However, both of them just use the number of times that a term appears in a
document.
That is a raw count, not a true frequency. Actually, it should be divided by
the length of the document.
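
To make the difference concrete, here is a minimal plain-Python sketch. It is
not Spark code, and the names raw_counts and normalized_tf are made up for
illustration. It assumes "length of the document" means the total number of
tokens.

```python
from collections import Counter

def raw_counts(tokens):
    """Number of times each term appears in the document."""
    return Counter(tokens)

def normalized_tf(tokens):
    """Raw count divided by document length (assumed: total token count)."""
    n = len(tokens)
    if n == 0:
        return {}
    return {term: count / n for term, count in Counter(tokens).items()}

doc = "spark is fast and spark is fun".split()
print(raw_counts(doc)["spark"])     # raw count: 2
print(normalized_tf(doc)["spark"])  # count / length: 2 / 7
```

With the raw count, a long document gets larger values than a short one even
when the term makes up the same share of both; dividing by the length removes
that effect.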


Is this the intended behavior?

-- 
Hao Ren

Data Engineer @ leboncoin

Paris, France
