Re: TF-IDF vector persistence with normalization enabled
>> "It seems to have tf-idf vectors later, you need to create tf vectors
>> (DictionaryVectorizer.createTermFrequencyVectors) with logNormalize
>> option set to false, and normPower option set to -1.0f."
>
> That post implies that in order to have tf-idf vectors persisted, in the tf
> vectors creation phase you need those options set.

I've noticed that from playing around with DictionaryVectorizer and TFIDFConverter. I'm just wondering why this is the case. I don't understand the reasoning behind the vectors not persisting when normalization is enabled.
Re: TF-IDF vector persistence with normalization enabled
That post implies that in order to have tf-idf vectors persisted, in the tf vectors creation phase you need those options set.

Or you can always run the Driver directly and easily, preferably from mahout's commandline, i.e. bin/mahout seq2sparse

Gokhan

On Tue, Jun 3, 2014 at 9:37 AM, David Noel wrote:

> I made an observation similar to what was pointed out in this mailing
> list post here:
> http://comments.gmane.org/gmane.comp.apache.mahout.user/17819; that
> TF-IDF vectors do not seem to persist when generating them with
> normalization enabled.
>
> According to Gokhan Capan:
>
> "It seems to have tf-idf vectors later, you need to create tf vectors
> (DictionaryVectorizer.createTermFrequencyVectors) with logNormalize option
> set to false, and normPower option set to -1.0f."
>
> Is there some reason for this? It would seem useful if they persisted.
> Can someone explain the reasoning behind them not? I figure there's a
> perfectly good reason, I just can't seem to figure out what it is.
TF-IDF vector persistence with normalization enabled
I made an observation similar to what was pointed out in this mailing list post here: http://comments.gmane.org/gmane.comp.apache.mahout.user/17819; that TF-IDF vectors do not seem to persist when generating them with normalization enabled.

According to Gokhan Capan:

"It seems to have tf-idf vectors later, you need to create tf vectors (DictionaryVectorizer.createTermFrequencyVectors) with logNormalize option set to false, and normPower option set to -1.0f."

Is there some reason for this? It would seem useful if they persisted. Can someone explain the reasoning behind them not? I figure there's a perfectly good reason, I just can't seem to figure out what it is.
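
For readers following the thread, here is a small standalone sketch of the two-stage order the quoted advice describes: build raw tf vectors first (no normalization), then apply idf weighting and normalize at the end. This is plain Java, not Mahout code, and it does not claim to explain why Mahout fails to persist the vectors. The class name, the helper methods, the sample counts, and the simplified idf formula log(numDocs / df) are all assumptions made for illustration. It only shows that normalizing before the idf step leaves a final vector that is no longer unit length.

```java
class TfIdfOrderSketch {

    // Scale a vector to unit L2 length; a zero vector is returned unchanged.
    public static double[] l2Normalize(double[] v) {
        double sumSq = 0.0;
        for (double x : v) {
            sumSq += x * x;
        }
        if (sumSq == 0.0) {
            return v.clone();
        }
        double norm = Math.sqrt(sumSq);
        double[] out = new double[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = v[i] / norm;
        }
        return out;
    }

    // Multiply each term frequency by a simplified idf weight, log(numDocs / df).
    // This formula is an illustrative assumption, not Mahout's weighting.
    public static double[] applyIdf(double[] tf, int[] df, int numDocs) {
        double[] out = new double[tf.length];
        for (int i = 0; i < tf.length; i++) {
            out[i] = tf[i] * Math.log((double) numDocs / df[i]);
        }
        return out;
    }

    // Euclidean length of a vector.
    public static double length(double[] v) {
        double sumSq = 0.0;
        for (double x : v) {
            sumSq += x * x;
        }
        return Math.sqrt(sumSq);
    }

    public static void main(String[] args) {
        double[] rawTf = {3, 1, 2};   // hypothetical raw term counts for one document
        int[] df = {10, 2, 5};        // hypothetical document frequencies
        int numDocs = 10;             // hypothetical corpus size

        // Raw tf first, then idf, then normalize: the result has unit length.
        double[] normalizedLast = l2Normalize(applyIdf(rawTf, df, numDocs));

        // Normalize tf first, then idf: the result is generally not unit length.
        double[] normalizedFirst = applyIdf(l2Normalize(rawTf), df, numDocs);

        System.out.println("normalize after idf, length:  " + length(normalizedLast));
        System.out.println("normalize before idf, length: " + length(normalizedFirst));
    }
}
```

Under these assumptions, keeping the tf stage unnormalized and normalizing only after the idf weighting is the order that yields a properly normalized tf-idf vector.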