Re: [scikit-learn] Why does sci-kit learn's hashingvectorizer give negative values?

Joel Nothman Sat, 01 Oct 2016 15:36:07 -0700

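Negative values are not really there to compensate for hash collisions.
They are there because alternating the sign makes the hashed vector space an
approximation to the full vector space under inner product: when two features
collide, their signed contributions cancel out in expectation, so inner
products between hashed vectors approximate those between the original vectors.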
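A small NumPy simulation can illustrate the inner-product point. It is not scikit-learn code. For each trial it draws a random column and a random +1/-1 sign per token, which stand in for the hash function (scikit-learn uses MurmurHash3, not random draws). It then compares the average hashed inner product, with and without signs, against the exact one. The helper names and the count vectors are hypothetical.

```python
import numpy as np


def hash_vector(counts, bucket, sign, n_features):
    """Sum signed token counts into n_features columns.

    bucket[i] is the column token i is hashed to and sign[i] is +1 or -1;
    passing an all-ones sign array gives the unsigned variant.
    """
    return np.bincount(bucket, weights=sign * counts, minlength=n_features)


def compare_inner_products(x, y, n_features=8, n_trials=5000, seed=0):
    """Return (exact, mean signed-hash, mean unsigned-hash) inner products."""
    rng = np.random.default_rng(seed)
    n_tokens = len(x)
    ones = np.ones(n_tokens)
    signed, unsigned = [], []
    for _ in range(n_trials):
        # A fresh random stand-in "hash function": token -> column, token -> sign.
        bucket = rng.integers(0, n_features, size=n_tokens)
        sign = rng.choice([-1.0, 1.0], size=n_tokens)
        hx, hy = (hash_vector(v, bucket, sign, n_features) for v in (x, y))
        signed.append(hx @ hy)
        ux, uy = (hash_vector(v, bucket, ones, n_features) for v in (x, y))
        unsigned.append(ux @ uy)
    return float(x @ y), float(np.mean(signed)), float(np.mean(unsigned))


# Hypothetical counts over a 50-token vocabulary, hashed into only 8 columns
# so that collisions are frequent.
rng = np.random.default_rng(42)
x = rng.integers(0, 4, size=50).astype(float)
y = rng.integers(0, 4, size=50).astype(float)
exact, signed_mean, unsigned_mean = compare_inner_products(x, y)
print(f"exact={exact:.1f} signed={signed_mean:.1f} unsigned={unsigned_mean:.1f}")
```

With signs, the cross terms from colliding tokens have zero mean, so the signed average lands close to the exact value. Without signs, every collision adds a positive cross term, so the unsigned estimate is biased upward.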


On 2 October 2016 at 00:17, Roman Yurchak wrote:

> On 01/10/16 15:34, Moyi Dang wrote:
> > However, I don't understand why the negatives are there in the first
> > place, or what they mean. I'm not sure whether the absolute values
> > correspond to the token counts.
> >
> > Can someone please help explain what the HashingVectorizer is doing? How
> > do I get the HashingVectorizer to return token counts?
>
> Hi Moyi,
>
> it's a mechanism to compensate for hash collisions; see
> https://github.com/scikit-learn/scikit-learn/issues/7513 . The absolute
> values are token counts for most practical applications (if you don't
> have too many collisions). There will be a PR shortly to make this more
> consistent.
>
>
> _______________________________________________
> scikit-learn mailing list
> [email protected]
> https://mail.python.org/mailman/listinfo/scikit-learn
>
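On the original question of getting token counts, here is a sketch. It assumes a scikit-learn release that has the `alternate_sign` parameter (added after this thread, in 0.19). Besides the sign, the default `norm='l2'` rescales every row, so even the absolute values of the default output are not raw counts. Turning off both gives summed counts per column, which equal the exact token counts unless tokens collide. The example documents are made up.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, HashingVectorizer

docs = ["the cat sat on the mat", "the dog sat"]

# Default settings: signed hashing followed by L2 normalisation of each row,
# so entries are neither raw counts nor guaranteed to be non-negative.
signed_l2 = HashingVectorizer(n_features=2**20).fit_transform(docs)

# Unsigned hashing without normalisation: each column holds the summed counts
# of every token that hashes to it (exact counts unless tokens collide).
hashed_counts = HashingVectorizer(
    n_features=2**20, alternate_sign=False, norm=None
).fit_transform(docs)

# Reference: exact per-token counts with an explicit vocabulary.
exact_counts = CountVectorizer().fit_transform(docs)

print(np.asarray(hashed_counts.sum(axis=1)).ravel())
print(np.asarray(exact_counts.sum(axis=1)).ravel())
```

A large `n_features` keeps collisions rare. The row totals of the unsigned, unnormalised output always equal the number of tokens per document, even when some tokens do collide.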
