Re: [Scikit-learn-general] Text processing using nltk, sklearn and pandas

Olivier Grisel Fri, 12 Jul 2013 10:02:24 -0700

2013/7/12 Lars Buitinck <[email protected]>:
> 2013/7/11 Tom Fawcett <[email protected]>:
>>> On Sun, Jul 7, 2013 at 6:58 AM, Joel Nothman <[email protected]> 
>>> wrote:
>>> (But I'm also not convinced that NLTK is the right tool for a lot of 
>>> large-scale feature extraction jobs.)
>>
>> I’m curious – why?
>
> I guess because it's terribly slow. I recently tried to cluster a
> sample of Wikipedia text at the word level. I found that about 75% of
> the time was spent in MiniBatchKMeans.fit, while the rest of it was
> spent inside nltk.word_tokenize (!)


That does not sound that bad to me.

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general