On 01/10/16 15:34, Moyi Dang wrote:
> However, I don't understand why the negatives are there in the first
> place, or what they mean. I'm not sure if the absolute values are
> corresponding to the token counts.
>
> Can someone please help explain what the HashingVectorizer is doing? How
> do I get the HashingVectorizer to return token counts?

Hi Moyi,

The negative values come from a mechanism that compensates for hash collisions; see
https://github.com/scikit-learn/scikit-learn/issues/7513

For most practical applications (that is, as long as you don't have too many collisions), the absolute values are the token counts. There will be a PR shortly to make this more consistent.
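
To make the sign mechanism concrete, here is a minimal pure-Python sketch of a signed hashing trick. It is not scikit-learn's implementation: the function name `signed_hash_counts`, the `alternate_sign` flag as used here, and the use of MD5 as the hash are assumptions chosen only for illustration. Each token is hashed to a bucket, and one bit of the hash decides whether the token adds +1 or -1, so that colliding tokens tend to cancel rather than pile up.

```python
import hashlib
from collections import defaultdict


def signed_hash_counts(tokens, n_features=1024, alternate_sign=True):
    """Toy hashing vectorizer (illustrative only, not scikit-learn's code).

    Returns a dict mapping bucket index -> accumulated (signed) count.
    """
    vec = defaultdict(int)
    for tok in tokens:
        # Deterministic 32-bit hash; MD5 is an assumption made for this sketch.
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        h = int.from_bytes(digest[:4], "little")
        index = h % n_features
        # The top bit of the hash picks the sign when alternate_sign is on.
        sign = -1 if (alternate_sign and (h >> 31) & 1) else 1
        vec[index] += sign
    return dict(vec)


# A token that collides with nothing keeps its count up to sign.
signed = signed_hash_counts(["spam", "spam", "spam"])
unsigned = signed_hash_counts(["spam", "spam", "eggs"], alternate_sign=False)
```

In this sketch, turning the sign off gives plain non-negative counts per bucket, at the cost that collisions now always inflate a bucket. In scikit-learn itself, note that `HashingVectorizer` also applies L2 normalization by default (`norm='l2'`), so raw counts additionally require `norm=None`; later releases also expose an `alternate_sign` parameter for disabling the sign, though you should check the documentation of your installed version.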
