On 01/10/16 15:34, Moyi Dang wrote:
> However, I don't understand why the negatives are there in the first
> place, or what they mean. I'm not sure if the absolute values are
> corresponding to the token counts.
> 
> Can someone please help explain what the HashingVectorizer is doing? How
> do I get the HashingVectorizer to return token counts?

Hi Moyi,

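the negative signs are a mechanism to compensate for hash collisions, see
https://github.com/scikit-learn/scikit-learn/issues/7513 for details.
Each token is hashed to a column, and the hash also determines the sign
of the value stored there, so that tokens colliding in the same column
tend to cancel out rather than accumulate. For most practical
applications (as long as you don't have too many collisions) the
absolute values are the token counts. There will be a PR shortly to make
this more consistent.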
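
To illustrate the idea, here is a minimal sketch of a signed hashing
trick in plain Python. It is not scikit-learn's implementation: the
function name signed_hash_counts is made up for this example, and md5 is
only a stand-in for the MurmurHash3 function that scikit-learn uses, so
the column indices and signs will not match what HashingVectorizer
returns.

```python
import hashlib


def signed_hash_counts(tokens, n_features=2**20):
    """Toy signed hashing trick: returns {column index: signed count}.

    Hypothetical helper for illustration only, not scikit-learn's code.
    """
    vec = {}
    for tok in tokens:
        # md5 is an assumption made for this sketch (stdlib only).
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        index = (h >> 1) % n_features   # column this token is hashed to
        sign = 1 if h & 1 else -1       # one bit of the hash picks the sign
        # A given token always gets the same column and the same sign, so
        # without collisions the entry ends up as +count or -count.
        vec[index] = vec.get(index, 0) + sign
    return vec


tokens = "the cat sat on the mat".split()
vec = signed_hash_counts(tokens)
print(vec)                                   # signed values per column
print(sorted(abs(v) for v in vec.values()))  # counts, unless tokens collide
```

With a large n_features, collisions are rare and the absolute values are
the per-token counts. If you shrink n_features, colliding tokens with
opposite signs cancel each other, which is the compensation mentioned
above. In scikit-learn itself the values are also normalized unless you
pass norm=None, and releases that provide the alternate_sign parameter
(0.19 and later, as far as I know) should give non-negative counts
directly with alternate_sign=False.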


