2013/7/12 Lars Buitinck <[email protected]>:
> 2013/7/11 Tom Fawcett <[email protected]>:
>>> On Sun, Jul 7, 2013 at 6:58 AM, Joel Nothman <[email protected]>
>>> wrote:
>>> (But I'm also not convinced that NLTK is the right tool for a lot of
>>> large-scale feature extraction jobs.)
>>
>> I’m curious – why?
>
> I guess because it's terribly slow. I recently tried to cluster a
> sample of Wikipedia text at the word level. I found that about 75% of
> the time was spent in MiniBatchKMeans.fit, while the rest of it was
> spent inside nltk.word_tokenize (!)
That does not sound that bad to me.

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
