2012/10/19 Andreas Mueller <[email protected]>:
> I'd like to convert an array of integer categorical features to a sparse
> indicator matrix.
> So my data points look like
> x = [100, 1, 5, 10]
> These are indices for feature-bins which don't really have an ordering.
> Therefore I want to convert them to a one-hot encoding per feature.
>
> What is the best way in sklearn to achieve this? This looks a bit
> like the DictVectorizer, I think.

DictVectorizer is the only class we offer that does this, I think.
(Unless you care to make strings out of your matrices and use
CountVectorizer...).
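
A minimal sketch of the DictVectorizer route, assuming each row is first
turned into a dict with one string-valued entry per column (DictVectorizer
one-hot encodes string values and passes numeric values through unchanged).
The helper name rows_to_dicts and the "f0", "f1", ... keys are made up for
illustration, and the second sample row is invented:

```python
from sklearn.feature_extraction import DictVectorizer


def rows_to_dicts(X):
    """Map each row to {"f<column>": "<value>"} (hypothetical helper).

    Values are cast to str so that DictVectorizer treats them as
    categories instead of as numeric feature values.
    """
    return [{"f%d" % j: str(v) for j, v in enumerate(row)} for row in X]


# Two samples with four integer categorical features each.
X = [[100, 1, 5, 10],
     [100, 2, 5, 7]]

vec = DictVectorizer(sparse=True)
X_onehot = vec.fit_transform(rows_to_dicts(X))  # scipy.sparse matrix

print(vec.feature_names_)   # one "f<column>=<value>" name per indicator column
print(X_onehot.toarray())
```

Each row ends up with exactly one nonzero entry per original feature, and
the matrix stays sparse, which matters when the bin indices range widely.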

> If not, do you think this kind of encoding is common enough to
> be included in sklearn?

It's been requested on the mailing list over and over. At one point, I had a PR
for a OneHotTransformer
(https://github.com/scikit-learn/scikit-learn/pull/242) but that
hasn't been updated in quite a while and I'm using DictVectorizer
myself now. Feel free to pick it up if you need it, or start afresh.

-- 
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
