2013/8/22 Sean Violante <[email protected]>:
> a) no problem with data copy: the executable loads data from file (you don't
> need to keep in sklearn)

Quite the contrary. What if only raw data (text files, JSON, etc.) is
on disk, and you still need to do feature extraction on it? Then you
need a pipeline of a feature extraction script and a learner, so
you're copying the raw data from disk into the feature extraction
script, then into kernel buffers, and finally into the learning
program. And what about feature selection: is that an extra script
with two additional copies?
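
For comparison, here is a rough sketch of the in-process alternative,
assuming a reasonably recent scikit-learn. The documents, labels, and
step names are made up for illustration, and the particular estimators
are just one possible choice. Extraction, selection, and learning are
chained in one Python process, so each step hands its output to the
next in memory rather than through files or pipes.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy raw documents and labels (hypothetical; stand-ins for text read from disk).
raw_docs = [
    "cheap pills buy now",
    "buy cheap watches now",
    "meeting agenda for monday",
    "monday project meeting notes",
]
labels = [1, 1, 0, 0]

# Feature extraction, feature selection, and the learner as one object.
pipeline = Pipeline([
    ("extract", TfidfVectorizer()),        # raw text -> sparse tf-idf matrix
    ("select", SelectKBest(chi2, k=4)),    # keep the 4 highest-scoring features
    ("learn", LogisticRegression()),       # classifier trained on selected features
])

# Raw text goes in; every intermediate matrix stays inside this process.
pipeline.fit(raw_docs, labels)
predictions = pipeline.predict(["buy cheap pills", "project meeting"])
print(predictions)
```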

> b) most ML algos are available from command line with text file input.

Python is a great tool for controlling external programs, but it's
still a hard problem because the command-line interfaces of those
programs are usually poorly defined. Error handling in particular can
be very difficult, and installation, deployment, and testing code must
be rewritten for each program.
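
To make the error-handling point concrete, here is a minimal sketch of
the kind of wrapper you end up writing, using only the standard
library (Python 3.7+). The helper name run_learner and the stand-in
"learner" commands are hypothetical; a real tool may signal failure
through its exit code, its stderr, or a malformed output file, and
this check would have to be adapted for each one.

```python
import subprocess
import sys


def run_learner(argv, input_text):
    """Run an external program with input_text on stdin; return its stdout.

    Assumes the program reports failure via a non-zero exit code, which is
    not guaranteed for every command-line tool.
    """
    proc = subprocess.run(argv, input=input_text, capture_output=True, text=True)
    if proc.returncode != 0:
        raise RuntimeError(
            "learner failed with exit code %d: %s"
            % (proc.returncode, proc.stderr.strip())
        )
    return proc.stdout


# Stand-in for a real learner: a child Python process that counts input lines.
fake_learner = [
    sys.executable, "-c",
    "import sys; print(len(sys.stdin.read().splitlines()))",
]
output = run_learner(fake_learner, "1 0.5 0.2\n0 0.1 0.9\n")

# Stand-in for a failing learner: exits with status 3 and a message on stderr.
broken_learner = [
    sys.executable, "-c",
    "import sys; sys.stderr.write('bad input'); sys.exit(3)",
]
try:
    run_learner(broken_learner, "")
    failure_message = None
except RuntimeError as exc:
    failure_message = str(exc)
```

Note that the training data is also serialized to text and pushed
through a pipe here, which is the extra copying discussed above.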

-- 
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general