The tfidf transformer is the slow part - I've done a number of speed tests
with scikit-learn classifiers, and adding tfidf always slowed things down
significantly. It also didn't seem to help much with accuracy.
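
One way to check which step dominates on a particular setup is to push the same documents through each fitted step separately and time it. The following is only a minimal sketch: the toy documents and labels and the helper name `time_pipeline_steps` are made up for illustration, and the step names simply mirror the pipeline quoted further down. Real timings depend heavily on the vectorizer settings and the data.

```python
import time

from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC


def time_pipeline_steps(pipeline, docs):
    """Return {step_name: seconds} for one pass of docs through a fitted pipeline."""
    timings = {}
    data = docs
    # Every step except the last one is a transformer.
    for name, step in pipeline.steps[:-1]:
        start = time.perf_counter()
        data = step.transform(data)
        timings[name] = time.perf_counter() - start
    # The last step is the classifier; time its predict() call.
    name, clf = pipeline.steps[-1]
    start = time.perf_counter()
    clf.predict(data)
    timings[name] = time.perf_counter() - start
    return timings


# Toy training data, purely illustrative.
train_docs = ["cheap pills online", "meeting at noon",
              "win money now", "lunch tomorrow?"]
train_labels = [1, 0, 1, 0]

pipeline = Pipeline([
    ("vect", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("clf", LinearSVC()),
])
pipeline.fit(train_docs, train_labels)

# Time one batch of 4000 short documents, step by step.
timings = time_pipeline_steps(pipeline, train_docs * 1000)
for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f}s")
```

If one step accounts for most of the total, that is the one worth tuning or dropping; passing all documents to the pipeline in a single call, as above, also avoids per-call overhead.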

Jacob
---
http://streamhacker.com/
http://text-processing.com/
http://twitter.com/japerk


On Wed, Jan 16, 2013 at 12:38 AM, <[email protected]> wrote:
>
> Date: Wed, 16 Jan 2013 14:00:12 +0530
> From: JAGANADH G <[email protected]>
> Subject: [Scikit-learn-general] Saved Classifier Model slow in
>         prediction
> To: [email protected]
> Message-ID:
>         <
> cam4qvdhfkx-dv242gnbccv4bnk5zgrk71pjd+vsz92rjuwa...@mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Hi All,
>
> I trained a LinearSVC() classifier for text classification. The total training
> set is about 1 lakh (100,000) documents (mostly single-line). The saved model
> is about 3.5 MB in size.
> When I use the model in my Python script, it takes too much time to perform
> the prediction (close to one minute to predict; it took 20 minutes to classify
> 3000 documents).
>
> My pipeline is
>  classifier = Pipeline([('vect', vectorizer), ('tfidf', transformer), ('clf', LinearSVC())])
> Is there any way to make it faster?
> --
> **********************************
> JAGANADH G
> http://jaganadhg.in
> *ILUGCBE*
> http://ilugcbe.org.in
>
> ------------------------------
>
> Message: 7
> Date: Wed, 16 Jan 2013 09:38:34 +0100
> From: Andreas Mueller <[email protected]>
> Subject: Re: [Scikit-learn-general] Saved Classifier Model slow in
>         prediction
> To: [email protected]
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> On 01/16/2013 09:30 AM, JAGANADH G wrote:
> > Hi All,
> > I trained a LinearSVC() classifier for text classification. The total
> > training set is about 1 lakh (100,000) documents (mostly single-line). The
> > saved model is about 3.5 MB in size.
> > When I use the model in my Python script, it takes too much time to
> > perform the prediction (close to one minute to predict; it took 20 minutes
> > to classify 3000 documents).
> > My pipeline is
> > classifier = Pipeline([('vect', vectorizer), ('tfidf', transformer), ('clf', LinearSVC())])
> > Is there any way to make it faster?
> Probably.
> I'm a bit surprised that it took so long. I would imagine it is the
> vectorizer?
> Though that should be O(tokens) afaik.
> Can you find out which of the steps in the pipeline takes so long?
> Cheers,
> Andy
>
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
>
> End of Scikit-learn-general Digest, Vol 36, Issue 37
> ****************************************************
>