2014-02-27 23:37 GMT+01:00 Joel Nothman <joel.noth...@gmail.com>:
> I think it would be nice if the FeatureUnion makes it easy to extract
> only certain parts of the input for each transformer.
> https://github.com/scikit-learn/scikit-learn/issues/2034 intends to
> cover this issue, but we haven't resolved a clean API.
>
> Suggestions are welcome!

I hope you don't mind me replying here: I think this can be resolved
by custom transformers that pass through a user-specified set of
columns. My preferred way of implementing that would be a generic,
stateless transformer class that just runs a function on X in
transform and returns the result. If this transformer doesn't do input
validation, you could make a union

    make_pipeline(FunctionTransformer(extract_description_terms),
                  TfidfTransformer())
    ∪
    make_pipeline(FunctionTransformer(extract_portrait_pixels),
                  PCA())

and feed it filenames, or dicts, or whatever. The original problem
of letting through only some columns is then

    import numpy as np

    def even_columns(X, *args):
        # Keep every second column, starting with column 0.
        X = np.asarray(X)
        return X[:, ::2]

    FunctionTransformer(even_columns)

And of course, these things are more generally useful for inserting a
simple function in the middle of a pipeline.

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
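
To make the stateless function transformer from the reply concrete, here is a
minimal pure-Python sketch. It is an illustration of the proposal only, not
scikit-learn's implementation: the class body, the fit/fit_transform methods,
the list-based even_columns stand-in and the extract_description helper are
all assumptions made for this example.

```python
class FunctionTransformer:
    """Sketch of a stateless transformer: transform(X) returns func(X)."""

    def __init__(self, func):
        self.func = func

    def fit(self, X, y=None):
        # Nothing is learned; return self so the object can sit in a pipeline.
        return self

    def transform(self, X):
        # No input validation, so X may be rows, dicts, filenames, ...
        return self.func(X)

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


def even_columns(rows):
    # Pure-Python stand-in for X[:, ::2]: keep columns 0, 2, 4, ...
    return [row[::2] for row in rows]


def extract_description(records):
    # Hypothetical helper: pull one field out of a list of dicts.
    return [record["description"] for record in records]


X = [[1, 2, 3, 4], [5, 6, 7, 8]]
even = FunctionTransformer(even_columns).fit_transform(X)

records = [
    {"description": "red shoes", "price": 10},
    {"description": "blue hat", "price": 7},
]
descriptions = FunctionTransformer(extract_description).fit_transform(records)
```

Because transform does no validation, the same class handles both the
column-selection case and the "feed it dicts" case; whatever comes next in
the pipeline decides what input shape it accepts.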