On 12 July 2013 09:48, Lars Buitinck <[email protected]> wrote:
> 2013/7/11 Tom Fawcett <[email protected]>:
> [...]
>
> I guess because it's terribly slow. I recently tried to cluster a
> sample of Wikipedia text at the word level.
What kind of results did you get? I did some work recently clustering
short-form text and was generally unimpressed with the results.
> I found that about 75% of
> the time was spent in MiniBatchKMeans.fit, while the rest of it was
> spent inside nltk.word_tokenize (!)
>
How does that compare to naively using Python's built-in str.split()?
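For concreteness, something along these lines (a rough sketch, not the code
from your Wikipedia run; the 20 newsgroups data is just a stand-in) would show
the gap between the two tokenizers directly:

    from time import time

    from sklearn.datasets import fetch_20newsgroups
    from sklearn.feature_extraction.text import CountVectorizer
    from nltk import word_tokenize

    # small sample, enough to see the relative cost of tokenization
    docs = fetch_20newsgroups(subset="train").data[:2000]

    for name, tokenizer in [("str.split", str.split),
                            ("nltk.word_tokenize", word_tokenize)]:
        vect = CountVectorizer(tokenizer=tokenizer)
        t0 = time()
        vect.fit_transform(docs)
        print("%s: %.1fs" % (name, time() - t0))

If whitespace splitting turns out to be good enough for the clustering
quality you need, that would shave off most of the tokenization time you
measured.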
>
> --
> Lars Buitinck
> Scientific programmer, ILPS
> University of Amsterdam
>
>