2014-02-27 23:37 GMT+01:00 Joel Nothman <joel.noth...@gmail.com>:
> I think it would be nice if the FeatureUnion makes it easy to extract
> only certain parts of the input for each transformer.
> https://github.com/scikit-learn/scikit-learn/issues/2034 intends to
> cover this issue, but we haven't resolved a clean API.
>
> Suggestions are welcome!

I hope you don't mind me replying here: I think this can be resolved
by custom transformers that pass through a user-specified set of
columns. My preferred way of implementing that would be a generic,
stateless transformer class that just runs a function on X in
transform and returns the result. If this transformer doesn't do input
validation, you could make a union

    make_pipeline(FunctionTransformer(extract_description_terms),
                  TfidfTransformer())
    ∪
    make_pipeline(FunctionTransformer(extract_portrait_pixels), PCA())

and feed this filenames, or dicts, or whatever. The original problem
of letting through only some columns is then

    import numpy as np

    def even_columns(X, *args):
        # Keep every second column, starting with the first.
        X = np.asarray(X)
        return X[:, ::2]

    FunctionTransformer(even_columns)

And of course, these things are more generally useful for inserting a
simple function in the middle of a pipeline.
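
A minimal sketch of such a stateless transformer might look like the
following. The class body, its method set, and the pure-Python
even_columns are assumptions made for illustration (plain lists stand
in for arrays so the example runs without NumPy); this is not
scikit-learn's implementation.

```python
class FunctionTransformer:
    """Hypothetical sketch: a stateless transformer that applies func to X."""

    def __init__(self, func):
        self.func = func

    def fit(self, X, y=None):
        # Stateless: there is nothing to learn, so fit only returns self.
        return self

    def transform(self, X):
        # No input validation here, so X may be filenames, dicts, or rows.
        return self.func(X)

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


def even_columns(X):
    # Pure-Python stand-in for X[:, ::2]: keep every second column.
    return [row[::2] for row in X]


selector = FunctionTransformer(even_columns)
result = selector.fit_transform([[1, 2, 3, 4], [5, 6, 7, 8]])
```

Because fit does nothing and transform only delegates to the wrapped
function, the same object can sit anywhere in a pipeline or a union
without caring what type X is.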

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
