2013/8/22 Sean Violante <[email protected]>:
> a) no problem with data copy: the executable loads data from file (you don't
> need to keep in sklearn)
Quite the contrary. What if only raw data (text files, JSON, etc.) is on disk, and you still need to do feature extraction on it? Then you need a pipeline of a feature extraction script and a learner, so you're copying the raw data from disk into the feature extraction script, then into kernel buffers, and finally into the learning program. What about feature selection? Is that an extra script with two additional copies?

> b) most ML algos are available from command line with text file input.

Python is a great tool for controlling external programs, but it's still a hard problem because the command-line interfaces to those programs are usually poorly defined. Error handling in particular can be very difficult, and installation, deployment, and testing code must be rewritten for each program.

-- 
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
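
To illustrate the error-handling point in the message above, here is a minimal sketch of wrapping one external program from Python. The names `run_tool`, `ExternalToolError`, and `FAKE_LEARNER` are hypothetical, and the "learner" is only a child Python process that counts input lines. The sketch assumes the tool signals failure through its exit status and stderr, which a real command-line learner may not do consistently.

```python
import subprocess
import sys


class ExternalToolError(RuntimeError):
    """Raised when the wrapped command-line program fails."""


def run_tool(argv, input_text="", timeout=60):
    """Run an external program, feed it text on stdin, and return its stdout.

    The only failure signals assumed here are the exit status and stderr;
    a tool that reports errors some other way needs its own handling.
    """
    try:
        result = subprocess.run(
            argv,
            input=input_text,      # copied into a pipe, i.e. a kernel buffer
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except FileNotFoundError as exc:
        raise ExternalToolError(f"program not installed: {argv[0]}") from exc
    except subprocess.TimeoutExpired as exc:
        raise ExternalToolError(f"timed out after {timeout}s") from exc
    if result.returncode != 0:
        raise ExternalToolError(
            f"exit status {result.returncode}: {result.stderr.strip()}"
        )
    return result.stdout


# Stand-in for a real learner: a child Python process that counts input lines.
FAKE_LEARNER = [
    sys.executable,
    "-c",
    "import sys; print(len(sys.stdin.readlines()))",
]

if __name__ == "__main__":
    print(run_tool(FAKE_LEARNER, "1 0.5 0.2\n0 0.1 0.9\n"))
```

Even in this small case the wrapper has to cover a missing installation, a hang, and a nonzero exit separately, and each additional external program would need its own variant of these checks.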
