On 19.10.2012 14:13, [email protected] wrote:
> On Fri, Oct 19, 2012 at 8:01 AM, <[email protected]> wrote:
>> On Fri, Oct 19, 2012 at 7:22 AM, Lars Buitinck <[email protected]> wrote:
>>> 2012/10/19 Peter Prettenhofer <[email protected]>:
>>>> BTW: Has anybody of you looked into patsy [1]? They have plenty of
>>>> functionality for this kind of encodings (they call it treatment
>>>> coding [2]).
>>> It doesn't seem to use scipy.sparse, which for me would be a
>>> requirement for a OneHotTransformer.
>> This should be helpful:
>> http://mail.scipy.org/pipermail/scipy-user/2011-November/031092.html
>>
>> I have it somewhere, but I'm not sure where.
> https://github.com/statsmodels/statsmodels/blob/master/statsmodels/tools/grouputils.py#L121
>
> added for something that is still unfinished, internal use only (no
> argument checking)

Thanks for sharing. My implementation looks like this:

import numpy as np
from scipy import sparse

def bins_to_binary(bins, max_inds):
    # One-hot encode per-feature bin indices into a sparse binary matrix.
    # bins: (n_samples, n_features) array of integer bin indices.
    # max_inds: number of distinct bins for each feature.
    n_samples, n_features = bins.shape
    # Column offset for each feature (copy to avoid mutating the caller's list).
    add = np.cumsum([0] + list(max_inds))[:-1]
    features = bins + add
    column_indices = features.ravel()
    # Row index of each entry: each sample contributes n_features ones.
    row_indices = np.repeat(np.arange(n_samples), n_features)
    data = np.ones(n_samples * n_features)
    return sparse.coo_matrix((data, (row_indices, column_indices)))
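For illustration, here is a minimal self-contained sketch of the same encoding idea (per-feature column offsets via cumsum, then a COO matrix of ones). The example data and the `n_bins_per_feature` name are my own, not from the thread:

```python
import numpy as np
from scipy import sparse

# Two samples, two features; feature 0 takes bins {0,1,2}, feature 1 takes {0,1}.
bins = np.array([[0, 1],
                 [2, 0]])
n_bins_per_feature = [3, 2]

n_samples, n_features = bins.shape
# Column offset where each feature's block of indicator columns starts: [0, 3].
offsets = np.cumsum([0] + n_bins_per_feature)[:-1]
cols = (bins + offsets).ravel()
rows = np.repeat(np.arange(n_samples), n_features)
data = np.ones(n_samples * n_features)
X = sparse.coo_matrix((data, (rows, cols)),
                      shape=(n_samples, sum(n_bins_per_feature)))
print(X.toarray())
```

Passing an explicit shape guards against trailing all-zero columns being dropped when the highest bin of the last feature never occurs.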
Maybe I'll wrap a sklearn estimator around it on the weekend.
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
