On 19.10.2012 14:13, [email protected] wrote:
> On Fri, Oct 19, 2012 at 8:01 AM,  <[email protected]> wrote:
>> On Fri, Oct 19, 2012 at 7:22 AM, Lars Buitinck <[email protected]> wrote:
>>> 2012/10/19 Peter Prettenhofer <[email protected]>:
>>>> BTW: Has any of you looked into patsy [1]? It has plenty of
>>>> functionality for this kind of encoding (they call it treatment
>>>> coding [2]).
>>> It doesn't seem to use scipy.sparse, which for me would be a
>>> requirement for a OneHotTransformer.
>> This should be helpful
>> http://mail.scipy.org/pipermail/scipy-user/2011-November/031092.html
>>
>> I have it somewhere, but I'm not sure where.
> https://github.com/statsmodels/statsmodels/blob/master/statsmodels/tools/grouputils.py#L121
>
> added for something that is still unfinished, internal use only (no
> argument checking)
>
Thanks for sharing.
My implementation looks like this:

import numpy as np
from scipy import sparse

def bins_to_binary(bins, max_inds):
    """One-hot encode integer bins; column j of `bins` takes values
    in 0..max_inds[j] - 1."""
    n_samples, n_features = bins.shape
    # Offset each feature's bin indices into its own column block
    # (build the cumsum on a copy so the caller's list is not mutated).
    add = np.cumsum([0] + list(max_inds))[:-1]
    column_indices = (bins + add).ravel()
    row_indices = np.repeat(np.arange(n_samples), n_features)
    data = np.ones(n_samples * n_features)
    return sparse.coo_matrix((data, (row_indices, column_indices)))

(The original had `leaf_indices + add`, but `leaf_indices` was undefined;
from context it should be `bins`.)

Maybe I'll wrap a sklearn estimator around it on the weekend.
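For illustration, here is a small self-contained run of the same cumsum-offset trick on toy data (assuming the undefined `leaf_indices` in the snippet above was meant to be `bins`):

```python
import numpy as np
from scipy import sparse

# Two samples, two categorical features:
# feature 0 has 3 possible bins, feature 1 has 2.
bins = np.array([[0, 1],
                 [2, 0]])
max_inds = [3, 2]

# Shift each feature's bin indices into its own block of columns.
add = np.cumsum([0] + max_inds)[:-1]              # [0, 3]
cols = (bins + add).ravel()                       # [0, 4, 2, 3]
rows = np.repeat(np.arange(bins.shape[0]), bins.shape[1])
X = sparse.coo_matrix((np.ones(cols.size), (rows, cols)))

print(X.toarray())
# [[1. 0. 0. 0. 1.]
#  [0. 0. 1. 1. 0.]]
```

Each sample ends up with exactly one active column per feature, and the total number of columns is sum(max_inds).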


_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general