Behind some of the discussion at
https://github.com/scikit-learn/scikit-learn/issues/1963 is an underlying
issue of how to deal with transform methods that require a parameter other
than X.
I notice that there are some remnants of a former scikit-learn that allowed
parameters to fit(...) other than X and y, but these have been removed in
favour of object-level parameters.
Should the same be the case with transform? In particular, Pipeline and
FeatureUnion only support 1-ary transform, and any parameter that can only
be set as an additional argument to transform can't be varied in a [grid]
search. That is, a transform that takes more than one argument seems a poor
fit for the API.
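
To make the mismatch concrete, here is a minimal sketch in plain Python. The
classes Scale and ToyPipeline and the parameter factor are hypothetical
stand-ins that only mimic the calling convention; they are not scikit-learn
classes. The sketch assumes the pipeline calls transform(X) with a single
argument, as described above, so only parameters stored on the object can be
varied by a search loop.

```python
# Toy stand-ins, NOT scikit-learn classes: they only mimic the calling convention.
class Scale:
    """Transformer whose only knob is a constructor (object-level) parameter."""

    def __init__(self, factor=1.0):
        self.factor = factor

    def set_params(self, **params):
        # Store each parameter as an attribute, then return self for chaining.
        for name, value in params.items():
            setattr(self, name, value)
        return self

    def transform(self, X):
        return [[self.factor * v for v in row] for row in X]


class ToyPipeline:
    """Chains steps by calling transform(X) with exactly one argument."""

    def __init__(self, steps):
        self.steps = steps

    def transform(self, X):
        for _, step in self.steps:
            X = step.transform(X)  # no way to forward extra arguments here
        return X


pipe = ToyPipeline([("scale", Scale())])
X = [[1.0, 2.0], [3.0, 4.0]]

# A search loop can vary the parameter because it lives on the object.
results = {}
for factor in (1.0, 2.0):
    pipe.steps[0][1].set_params(factor=factor)
    results[factor] = pipe.transform(X)
```

Had factor been an extra argument to transform instead, the loop would have
no way to reach it through the pipeline.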
So what non-X arguments do transform methods take?
- copy=True in various, such as feature_extraction.text.TfidfTransformer
- y=None in various
- ridge_alpha=None in decomposition.sparse_pca.SparsePCA
- threshold=None in feature_selection.selector_mixin.SelectorMixin
- pooling_func=np.mean in
  cluster._feature_agglomeration.AgglomerationTransform
- Y=None in pls._PLS, pls.PLSSVD
copy is not really a problem. It's not something you want to vary in a
parameter search, and although the pipeline could perhaps take advantage of
it, it does not need to.
y is the subject of the first issue I mentioned.
In the case of ridge_alpha, it looks like this argument should be
deprecated (or left in as harmless?) as the code already backs off to an
object property.
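
One way the deprecation route could look, as a rough sketch only: the class
ToySparseCoder and its placeholder arithmetic are hypothetical and do not
reproduce SparsePCA. The only part meant to carry over is the argument
handling, where a value passed to transform triggers a warning and None falls
back to the object-level parameter.

```python
import warnings


class ToySparseCoder:
    """Hypothetical estimator; only the argument handling is of interest."""

    def __init__(self, ridge_alpha=0.01):
        self.ridge_alpha = ridge_alpha

    def transform(self, X, ridge_alpha=None):
        if ridge_alpha is not None:
            # Still honoured for now, but callers are steered to the constructor.
            warnings.warn(
                "Passing ridge_alpha to transform is deprecated; "
                "set it in the constructor instead.",
                DeprecationWarning,
                stacklevel=2,
            )
        else:
            # Back off to the object-level parameter.
            ridge_alpha = self.ridge_alpha
        # Placeholder computation so the effect of ridge_alpha is visible.
        return [[v / (1.0 + ridge_alpha) for v in row] for row in X]
```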
For threshold, this additional parameter also makes it harder to adopt the
feature selection mixin. I propose that instead such estimators should have
a to_selector/to_transformer method that returns a feature selector where a
threshold parameter (or lower and upper threshold limits; see
https://github.com/scikit-learn/scikit-learn/pull/1939) can be played with
directly through the common scikit-learn parameters API (and therefore
[grid] search).
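
A rough sketch of what I have in mind, in plain Python. Everything here is
hypothetical: ToyModel, ToySelector, the feature_scores_ attribute, and the
rule that a feature is kept when its score is at least the threshold are all
assumptions made for illustration, not existing scikit-learn API.

```python
class ToySelector:
    """Hypothetical selector: keeps columns whose score reaches `threshold`."""

    def __init__(self, scores, threshold=0.0):
        self.scores = scores
        self.threshold = threshold

    def get_params(self):
        return {"threshold": self.threshold}

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self

    def transform(self, X):
        # Assumed rule: keep feature i when scores[i] >= threshold.
        keep = [i for i, s in enumerate(self.scores) if s >= self.threshold]
        return [[row[i] for i in keep] for row in X]


class ToyModel:
    """Hypothetical fitted estimator exposing per-feature scores."""

    def __init__(self, feature_scores):
        self.feature_scores_ = feature_scores

    def to_selector(self, threshold=0.0):
        # Hand back a separate object whose threshold is an ordinary parameter.
        return ToySelector(self.feature_scores_, threshold=threshold)


model = ToyModel([0.1, 0.5, 0.9])
selector = model.to_selector(threshold=0.4)
X = [[1, 2, 3], [4, 5, 6]]
reduced = selector.transform(X)  # keeps the last two columns
stricter = selector.set_params(threshold=0.8).transform(X)  # keeps only the last
```

Because threshold is set through set_params rather than passed to transform,
the selector keeps a 1-ary transform and a search can vary the threshold.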
I haven't tried to understand pooling_func or Y, so I don't know whether
they're necessary.
Thoughts?
- Joel