Negative values are not really there to compensate for hash collisions.
They are there because the signs make inner products in the hashed
vector space approximate (in expectation) the inner products in the full
vector space.
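
To make that concrete, here is a rough sketch of signed feature hashing
in plain Python. It is not scikit-learn's code: the names stable_hash,
hashed_vector and dot are made up for this example, and MD5 stands in
for the MurmurHash3 function that scikit-learn uses. It only shows the
idea of taking the bucket index and the sign from the same hash value.

```python
import hashlib
from collections import Counter


def stable_hash(token):
    """64-bit stand-in hash (assumption: MD5 instead of MurmurHash3)."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little")


def hashed_vector(tokens, n_features=8, signed=True):
    """Hash a list of tokens into a dense vector of length n_features."""
    vec = [0] * n_features
    for token, count in Counter(tokens).items():
        h = stable_hash(token)
        index = h % n_features               # bucket for this token
        negative = signed and (h >> 63) & 1  # top bit picks the sign
        vec[index] += -count if negative else count
    return vec


def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


doc_a = "the cat sat on the mat".split()
doc_b = "a dog ran in my yard".split()  # shares no token with doc_a

# The true inner product of the two count vectors is 0. Without signs,
# every collision between the documents adds a positive term, so the
# hashed estimate can only be too large. With signs, collisions add
# terms of either sign, which cancel on average.
unsigned = dot(hashed_vector(doc_a, signed=False),
               hashed_vector(doc_b, signed=False))
signed = dot(hashed_vector(doc_a), hashed_vector(doc_b))
print("unsigned:", unsigned, "signed:", signed)
```

With signed=False each entry is a plain token count (summed over any
tokens that collide in that bucket). In scikit-learn itself, releases
from 0.19 on expose this as HashingVectorizer(alternate_sign=False);
check the documentation for the version you have installed.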

On 2 October 2016 at 00:17, Roman Yurchak <[email protected]> wrote:

> On 01/10/16 15:34, Moyi Dang wrote:
> > However, I don't understand why the negatives are there in the first
> > place, or what they mean. I'm not sure if the absolute values are
> > corresponding to the token counts.
> >
> > Can someone please help explain what the HashingVectorizer is doing? How
> > do I get the HashingVectorizer to return token counts?
>
> Hi Moyi,
>
> it's a mechanism to compensate for hash collisions, see
> https://github.com/scikit-learn/scikit-learn/issues/7513 The absolute
> values are token counts for most practical applications (if you don't
> have too many collisions).  There will be a PR shortly to make this more
> consistent.
>
>
> _______________________________________________
> scikit-learn mailing list
> [email protected]
> https://mail.python.org/mailman/listinfo/scikit-learn
>