So:
* There may not be an issue with RFE after all.
* You probably want to use OneHotEncoder directly rather than LabelEncoder
in that example.
* It seems that the output indicated by feature_indices_ is then masked by
active_features_, which registers exactly those feature columns active in
training. So the actual shape of OneHotEncoder's transformed output should
match your assumptions about the feature indices...

I can't say I understand this behaviour from the description of the
n_values='auto' setting ('determine value range from training data'), as
opposed to what it actually does ('output features corresponding to values
seen in training'). (@Andy, is this right?) But what it does seems to fit
what you were hoping for?
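If it helps, here is a minimal numpy sketch of how I understand the
n_values='auto' masking (toy data; this is my reading of the behaviour,
not the actual scikit-learn implementation):

```python
import numpy as np

# Toy integer-coded feature matrix: 2 samples, 2 features
X = np.array([[0, 3],
              [2, 7]])

# feature_indices_: cumulative offsets, one slot per possible value 0..max
n_values = X.max(axis=0) + 1                                  # [3, 8]
feature_indices = np.concatenate(([0], np.cumsum(n_values)))  # [0, 3, 11]

# Column of each one-hot output before masking: value + feature offset
columns = (X + feature_indices[:-1]).ravel()
rows = np.repeat(np.arange(X.shape[0]), X.shape[1])
full = np.zeros((X.shape[0], feature_indices[-1]), dtype=int)
full[rows, columns] = 1

# active_features_: keep only columns for values actually seen in training
active_features = np.flatnonzero(full.sum(axis=0))            # [0, 2, 6, 10]
X_ohe = full[:, active_features]
```

Note that the gaps in feature_indices depend on the largest value in each
column, not on the number of rows, so they can easily exceed n_samples.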


On Tue, Jul 9, 2013 at 4:54 PM, Maheshakya Wijewardena <
[email protected]> wrote:

> Can you have a look at this? This is what I have done. I think there is
> something going on with OneHotEncoder.
>
> from sklearn.datasets import make_classification
> from sklearn.feature_selection import RFE
> from sklearn.svm import SVR
> from sklearn import preprocessing
>
> X, y = make_classification(n_samples=50, n_features=10, random_state=10)
>
> # Integer-encode the features
> encoder = preprocessing.LabelEncoder()
> encoder.fit(X)
> X = encoder.transform(X)
> print X
> print X.shape
>
> # One-hot encode the integer codes
> encoder = preprocessing.OneHotEncoder()
> encoder.fit(X)
> X = encoder.transform(X)
>
> print encoder.feature_indices_
>
> estimator = SVR(kernel="linear")
> selector = RFE(estimator, 100, step=1)
> selector = selector.fit(X, y)
>
> After using LabelEncoder on X I got an array of shape (50, 10) (which is
> expected). But after one-hot encoding, the feature indices I get are as
> follows:
>
> [   0  499  987 1487 1968 2459 2957 3401 3886 4379 4868]
>
>
> As far as I know, the maximum gap between two consecutive indices should be
> at most the number of rows, shouldn't it? That is 50 here, but instead I get
> gaps of around 500. Have I misunderstood one-hot encoding, or is there some
> other issue with the OneHotEncoder function?
>
>
>
> Thank you
>
>
>
>
> On Tue, Jul 9, 2013 at 10:09 AM, Joel Nothman <
> [email protected]> wrote:
>
>> Hi Maheshakya,
>>
>> It's hard to know what you mean by the rankings not making sense. What
>> parameters do you pass to the RFE constructor? In particular, what
>> underlying estimator are you using, and what step?
>>
>> I'll take a guess at the problem:
>> In each iteration, RFE discards the `step` worst features until
>> `n_features_to_select` features remain. The worst features are determined
>> by the magnitude of their coefficients in the model learnt from the
>> remaining features at each step. If your estimator produces a sparse model
>> (e.g. L1 or strong L2 regularisation), many of your features are likely to
>> have coefficients of exactly 0, and yet only an arbitrary selection of
>> `step` zero-coefficient features will be removed in each iteration. This is
>> unlikely to be a problem if you haven't one-hot-encoded your categorical
>> data.
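>> A quick numpy sketch of that tie-breaking effect (hypothetical
>> coefficients, chosen to have many exact zeros):

```python
import numpy as np

# Hypothetical coefficient vector from a sparse model: mostly exact zeros
coef = np.array([0.0, 0.9, 0.0, 0.0, 0.3, 0.0, 0.0, 0.0])

# One RFE-style elimination step: drop the `step` features with the
# smallest |coef|. Among the tied zero coefficients, argsort's ordering
# is arbitrary, so which zero-weight features survive a given step
# carries no information.
step = 3
order = np.argsort(np.abs(coef))
eliminated = order[:step]
```

>> All `step` eliminated features here have identical zero coefficients,
>> so the relative ranking RFE assigns among them is meaningless.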
>>
>> Does that help?
>>
>> - Joel
>>
>>
>> On Tue, Jul 9, 2013 at 2:15 PM, Maheshakya Wijewardena <
>> [email protected]> wrote:
>>
>>> Hi,
>>> I'm trying to apply Recursive Feature Elimination to a data set (it's a
>>> very large matrix after performing one-hot encoding).
>>> Suppose the one-hot encoded matrix is "X" (with targets in y):
>>>
>>> rfe = RFE(some parameters)
>>> rfe.fit(X, y)
>>>
>>> After this I can get the indices of the selected features from ranking_
>>> (or the mask from support_).
>>>
>>> I want to know what values we get from the above when the one-hot encoded
>>> matrix is considered (as it is a binary sparse matrix). The indices I get
>>> from ranking_ (where the ranking is 1, or the mask is true) don't make any
>>> sense when the one-hot encoded data matrix is considered.
>>> Can someone explain how to solve this?
>>>
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> See everything from the browser to the database with AppDynamics
>>> Get end-to-end visibility with application monitoring from AppDynamics
>>> Isolate bottlenecks and diagnose root cause in seconds.
>>> Start your free trial of AppDynamics Pro today!
>>>
>>> http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk
>>> _______________________________________________
>>> Scikit-learn-general mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>
>>>
>>
>>
>
>