2012/10/19 Andreas Mueller <[email protected]>:
> I'd like to convert an array of integer categorical features to a sparse
> indicator matrix.
> So my data points look like
> x = [100, 1, 5, 10]
> These are indices for feature bins which don't really have an ordering.
> Therefore I want to convert them to a one-hot encoding per feature.
>
> What is the best way in sklearn to achieve this? This looks a bit
> like the DictVectorizer, I think.
DictVectorizer is the only class we offer that does this, I think
(unless you care to make strings out of your matrices and use
CountVectorizer...).

> If not, do you think this kind of encoding is common enough to
> be included in sklearn?

It's been requested on the mailing list over and over. At one point, I
had a PR for a OneHotTransformer
(https://github.com/scikit-learn/scikit-learn/pull/242), but that hasn't
been updated in quite a while and I'm using DictVectorizer myself now.
Feel free to pick it up if you need it, or start afresh.

-- 
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam
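
A minimal sketch of the DictVectorizer route, not a definitive recipe. It assumes each integer is converted to a string first, since DictVectorizer one-hot encodes string values but passes numeric values through as plain numbers. The helper name rows_to_dicts, the feature names f0, f1, ... and the sample data beyond the first row are made up for illustration.

```python
import numpy as np
from sklearn.feature_extraction import DictVectorizer

# Three data points with four integer categorical features each.
# The first row is the example from the question; the rest are invented.
X = np.array([[100, 1, 5, 10],
              [100, 2, 5, 3],
              [7, 1, 0, 10]])


def rows_to_dicts(X):
    """Turn each row into {"f<column>": "<value>"} (hypothetical helper).

    The values are strings on purpose: DictVectorizer creates one
    indicator column per distinct (feature, string value) pair, whereas
    a numeric value would be stored as-is in a single column.
    """
    return [{"f%d" % j: str(v) for j, v in enumerate(row)} for row in X]


vec = DictVectorizer(sparse=True)
X_onehot = vec.fit_transform(rows_to_dicts(X))  # scipy.sparse matrix

# vec.vocabulary_ maps names such as "f0=100" to column indices.
print(X_onehot.shape)
print(sorted(vec.vocabulary_))
```

Every row of the result has exactly one nonzero entry per original feature, so no ordering is imposed on the bin indices.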
