Can you have a look at this? This is what I have done. I think there is
something going on with the one-hot encoder.
from sklearn import preprocessing
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.svm import SVR
X, y = make_classification(n_samples=50, n_features=10, random_state=10)
encoder = preprocessing.LabelEncoder()
encoder.fit(X)
X = encoder.transform(X)
print(X)
print(X.shape)
encoder = preprocessing.OneHotEncoder()
encoder.fit(X)
X = encoder.transform(X)
print(encoder.feature_indices_)
estimator = SVR(kernel="linear")
selector = RFE(estimator, 100, step=1)
selector = selector.fit(X, y)
After using the label encoder on X I got an array of shape (50, 10), as
expected. But after one-hot encoding, the feature indices I get are as
follows:
[ 0 499 987 1487 1968 2459 2957 3401 3886 4379 4868]
As far as I know, the difference between two consecutive indices should be
less than or equal to the number of rows, which is 50, shouldn't it? But here
I get about 500 instead of 50. Have I misunderstood one-hot encoding,
or is there some other issue with the OneHotEncoder function?
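
For what it's worth, here is a small pure-Python sketch of what I suspect is happening. It is not scikit-learn's code: `global_label_encode` and `one_hot_offsets` are hypothetical helpers, and the sketch assumes that (a) label-encoding the whole matrix numbers every distinct cell value across all columns, and (b) the one-hot encoder gives each feature a block of (largest integer value + 1) columns. Under those assumptions the block widths are bounded by the number of distinct values in the matrix (up to 50 * 10 = 500), not by the number of rows.

```python
import random
from itertools import accumulate


def global_label_encode(rows):
    """Replace each cell by its rank among all distinct values in the matrix."""
    classes = sorted({v for row in rows for v in row})
    index = {v: i for i, v in enumerate(classes)}
    return [[index[v] for v in row] for row in rows], len(classes)


def one_hot_offsets(int_rows):
    """Start offset of each feature's block, assuming width = max value + 1."""
    widths = [max(col) + 1 for col in zip(*int_rows)]
    return [0] + list(accumulate(widths))


random.seed(10)
# 50 rows x 10 continuous features, similar in shape to the example above.
rows = [[random.gauss(0, 1) for _ in range(10)] for _ in range(50)]

encoded, n_classes = global_label_encode(rows)
offsets = one_hot_offsets(encoded)
widths = [b - a for a, b in zip(offsets, offsets[1:])]

print(n_classes)  # number of distinct values in the whole matrix
print(offsets)    # analogous to feature_indices_ under the assumptions above
print(widths)     # each width is far larger than the 50 rows
```

If this reading is right, the encoder is being given continuous values that were never categorical in the first place.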
Thank you
On Tue, Jul 9, 2013 at 10:09 AM, Joel Nothman
<[email protected]> wrote:
> Hi Maheshakya,
>
> It's hard to know what you mean by the rankings not making sense. What
> parameters do you pass to the RFE constructor? In particular, what
> underlying estimator are you using, and what step?
>
> I'll take a guess at the problem:
> In each iteration, RFE discards the `step` worst features until
> `n_features_to_select` features remain. The worst features are determined from
> the magnitude of each feature's coefficients in the model learnt from the
> remaining features at that step. If your estimator produces a sparse model
> (e.g. L1 or strong L2 regularisation), many of your features are likely to
> have coefficients equal to 0, and yet only an arbitrary selection of `step`
> 0-coefficient features will be removed in each iteration. This is unlikely
> to be a problem if you haven't one-hot-encoded your categorical data.
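
To check my understanding of the procedure described above, here is a minimal sketch of the elimination loop in plain Python. `toy_coefs` is a hypothetical stand-in for the real estimator (it scores each column by its covariance with y instead of refitting a model), and the ranking bookkeeping is my assumption about how eliminated features are numbered. It is meant only to show how columns with tied zero weights end up with different ranks.

```python
def toy_coefs(X, y, cols):
    """Hypothetical stand-in for a fitted linear model: one weight per column.

    Each weight is the covariance between that column and y. A real estimator
    would be refit on the remaining columns at every iteration.
    """
    n = len(y)
    y_mean = sum(y) / n
    coefs = []
    for c in cols:
        x_mean = sum(row[c] for row in X) / n
        cov = sum((row[c] - x_mean) * (t - y_mean) for row, t in zip(X, y)) / n
        coefs.append(cov)
    return coefs


def rfe_sketch(X, y, n_features_to_select, step=1):
    """Drop the `step` smallest-|weight| columns until enough remain."""
    n_total = len(X[0])
    remaining = list(range(n_total))
    ranking = [1] * n_total
    while len(remaining) > n_features_to_select:
        coefs = toy_coefs(X, y, remaining)
        n_drop = min(step, len(remaining) - n_features_to_select)
        # Stable sort: among tied weights, the earlier column is dropped first.
        order = sorted(range(len(remaining)), key=lambda i: abs(coefs[i]))
        dropped = {remaining[i] for i in order[:n_drop]}
        remaining = [c for c in remaining if c not in dropped]
        kept = set(remaining)
        # Every eliminated column moves one rank further from 1.
        for c in range(n_total):
            if c not in kept:
                ranking[c] += 1
    kept = set(remaining)
    support = [c in kept for c in range(n_total)]
    return support, ranking


y = [0, 1, 0, 1, 0, 1]
X = [  # column 0 copies y, columns 1 and 2 are constant, column 3 is noisy
    [0, 1, 0, 1],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 1, 0, 0],
    [1, 1, 0, 1],
]
support, ranking = rfe_sketch(X, y, n_features_to_select=2, step=1)
print(support)
print(ranking)
```

Columns 1 and 2 carry the same (zero) weight here, yet they receive different ranks purely because of the order in which ties are broken.
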
>
> Does that help?
>
> - Joel
>
>
> On Tue, Jul 9, 2013 at 2:15 PM, Maheshakya Wijewardena <
> [email protected]> wrote:
>
>> Hi,
>> I'm trying to apply Recursive Feature Elimination to a data set (it's a
>> very large matrix after performing one-hot encoding).
>> Suppose the one-hot encoded matrix is "X" (we have targets in y):
>> rfe = RFE(some parameters)
>> rfe.fit(X,y)
>>
>> After this I can get the indices of the selected features from ranking_ (or
>> a mask from support_).
>>
>> I want to know what these values mean when a one-hot encoded matrix is
>> used (as it is a binary sparse matrix). The indices I get from ranking_
>> (where the ranking is 1 or the mask is true) don't make any sense
>> when the one-hot encoded matrix is considered.
>> Can someone explain how to solve this?
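
One way to make sense of a selected column index is to map it back to the original feature and the integer value it encodes. The sketch below assumes that original feature i occupies encoded columns offsets[i] up to offsets[i + 1] - 1, which is how I read the feature_indices_ values printed earlier. `locate` and the `selected` columns are hypothetical names for illustration. If the encoder drops columns that never occur in the training data (as far as I know, older versions expose `active_features_` for this), the output column has to be translated through that array first, which the optional `active` argument stands in for.

```python
from bisect import bisect_right


def locate(col, offsets, active=None):
    """Map an encoded column index to (original feature, encoded integer value)."""
    if active is not None:
        # Translate a column of the reduced output back to the full layout.
        col = active[col]
    feature = bisect_right(offsets, col) - 1
    return feature, col - offsets[feature]


# Offsets copied from the feature indices printed earlier in this thread.
offsets = [0, 499, 987, 1487, 1968, 2459, 2957, 3401, 3886, 4379, 4868]

selected = [0, 500, 4867]  # hypothetical columns where support_ is True
for col in selected:
    print(col, locate(col, offsets))
```
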
>>
>>
>>
>> ------------------------------------------------------------------------------
>> See everything from the browser to the database with AppDynamics
>> Get end-to-end visibility with application monitoring from AppDynamics
>> Isolate bottlenecks and diagnose root cause in seconds.
>> Start your free trial of AppDynamics Pro today!
>>
>> http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk
>> _______________________________________________
>> Scikit-learn-general mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>
>>