Hi Sklearners,

I was trying out several of scikit-learn's feature selection methods on the
Arcene dataset [1], and it occurred to me that despite the numerous examples
[2] in the docs, most of them only plot or print the most relevant features.

What is missing, IMHO, is a simple example of how to actually transform the
dataset after the initial feature selection!

I'm thinking of something really simple that I couldn't find anywhere, like:

"""
clf = GradientBoostingClassifier()
clf.fit(X,y)

feats_mask = [ i > 1e-3 for i in clf.feature_importances_ ]
X = X.compress(feats_mask, axis=1)
clf.fit(X,y) # again, since we now operate only on selected features
"""

I think such numpy array techniques can be a bit of a pain for new users to
find, and since we are a user-friendly project we should incorporate such
simple techniques into the docs.

If you agree, what should I do:
- make a PR with a new example, perhaps more practically oriented?
- or append a sample code snippet (like X = X.compress(mask, axis=1)) to the
feature selection narrative docs?

I personally prefer the latter option.

NB: the document classification example has some sample code, but only for
Chi2/SelectKBest:
http://scikit-learn.org/dev/auto_examples/document_classification_20newsgroups.html

NB2: I don't know whether X.compress is supported or recommended for sparse matrices?
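On that point, a sketch of a sparse-friendly alternative: scipy.sparse matrices don't have a .compress method, but CSR/CSC matrices do support column selection with an integer index array, so converting the boolean mask to indices works. The importance values below are made up purely for illustration:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical importances; in practice these come from a fitted estimator.
importances = np.array([0.5, 1e-5, 0.2, 1e-6])
mask = importances > 1e-3

X_sparse = csr_matrix(np.arange(12).reshape(3, 4))

# .compress is not available on sparse matrices, but integer
# column indexing is; np.flatnonzero turns the mask into indices.
X_selected = X_sparse[:, np.flatnonzero(mask)]
print(X_selected.shape)  # (3, 2): columns 0 and 2 were kept
```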

Eustache

[1] http://archive.ics.uci.edu/ml/datasets/Arcene
[2] http://scikit-learn.org/dev/modules/feature_selection.html
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general