The tfidf transformer is the slow part - I've done a number of speed tests with scikit-learn classifiers, and adding tfidf always slowed things down significantly. It also didn't seem to help much with accuracy.
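One way to check which stage dominates on a given machine is to time each step of the fitted pipeline separately. The following is only a rough sketch: `time_pipeline_steps` and `demo` are hypothetical helper names, the toy corpus is made up, and it assumes the `vectorizer` and `transformer` in the pipeline are a `CountVectorizer` and a `TfidfTransformer`, which the original post does not state.

```python
import time


def time_pipeline_steps(steps, docs):
    """Time each stage of a fitted pipeline separately.

    `steps` is a list of (name, estimator) pairs such as `Pipeline.steps`.
    Every stage but the last must offer transform(); the last must offer
    predict(). Returns (timings, predictions), where timings maps each
    step name to the seconds it took.
    """
    timings = {}
    data = docs
    for name, step in steps[:-1]:
        start = time.perf_counter()
        data = step.transform(data)
        timings[name] = time.perf_counter() - start

    name, clf = steps[-1]
    start = time.perf_counter()
    predictions = clf.predict(data)
    timings[name] = time.perf_counter() - start
    return timings, predictions


def demo():
    """Toy run; needs scikit-learn installed. The corpus is invented."""
    from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
    from sklearn.pipeline import Pipeline
    from sklearn.svm import LinearSVC

    train_docs = ["good movie", "great film", "bad movie", "awful film"] * 50
    train_labels = ["pos", "pos", "neg", "neg"] * 50

    # Assumed stand-ins for the `vectorizer` and `transformer` in the post.
    classifier = Pipeline([
        ("vect", CountVectorizer()),
        ("tfidf", TfidfTransformer()),
        ("clf", LinearSVC()),
    ])
    classifier.fit(train_docs, train_labels)

    # All documents go through in a single call, not one call per document.
    timings, _ = time_pipeline_steps(classifier.steps, ["great movie"] * 3000)
    for name, seconds in timings.items():
        print(f"{name}: {seconds:.4f}s")
```

Calling `demo()` prints one timing line per step. With a saved model, the same helper can be pointed at the unpickled pipeline's `steps` and the real documents, which should show whether the vectorizer, the tfidf transformer, or the classifier accounts for most of the time.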
Jacob
---
http://streamhacker.com/
http://text-processing.com/
http://twitter.com/japerk

On Wed, Jan 16, 2013 at 12:38 AM, <[email protected]> wrote:

>
> Date: Wed, 16 Jan 2013 14:00:12 +0530
> From: JAGANADH G <[email protected]>
> Subject: [Scikit-learn-general] Saved Classifier Model slow in
>         prediction
> To: [email protected]
> Message-ID:
>         <cam4qvdhfkx-dv242gnbccv4bnk5zgrk71pjd+vsz92rjuwa...@mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Hi All,
>
> I trained a LinearSVC() classifier for Text Classification. Total training
> set is about 1 Lakh documents (mostly single line). The saved model is about
> 3.5 MB in size.
> When I used the model in my python script it takes too much time to perform
> the prediction (near to one min to predict took 20 minutes to classify 3000
> documents).
>
> My pipeline is
> classifier = Pipeline([('vect',vectorizer),('tfidf',transformer), ('clf',
> LinearSVC())])
> Is there any way to make it faster.
> --
> **********************************
> JAGANADH G
> http://jaganadhg.in
> *ILUGCBE*
> http://ilugcbe.org.in
> -------------- next part --------------
> An HTML attachment was scrubbed...
>
> ------------------------------
>
> Message: 7
> Date: Wed, 16 Jan 2013 09:38:34 +0100
> From: Andreas Mueller <[email protected]>
> Subject: Re: [Scikit-learn-general] Saved Classifier Model slow in
>         prediction
> To: [email protected]
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> On 01/16/2013 09:30 AM, JAGANADH G wrote:
> > Hi All,
> > I trained a LinearSVC() classifier for Text Classification. Total
> > training set is about 1 Lakh documents (mostly single line). The saved
> > model is about 3.5 MB in size.
> > When I used the model in my python script it takes too much time to
> > perform the prediction (near to one min to predict took 20 minutes to
> > classify 3000 documents).
> > My pipeline is
> > classifier = Pipeline([('vect',vectorizer),('tfidf',transformer),
> > ('clf',LinearSVC())])
> > Is there any way to make it faster.
> Probably.
> I'm a bit surprised that it took so long. I would imagine it is the
> vectorizer?
> Though that should be O(tokens) afaik.
> Can you find out which of the steps in the pipeline takes so long?
> Cheers,
> Andy
>
> ------------------------------
>
> ------------------------------------------------------------------------------
> Master Java SE, Java EE, Eclipse, Spring, Hibernate, JavaScript, jQuery
> and much more. Keep your Java skills current with LearnJavaNow -
> 200+ hours of step-by-step video tutorials by Java experts.
> SALE $49.99 this month only -- learn more at:
> http://p.sf.net/sfu/learnmore_122612
>
> ------------------------------
>
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
>
> End of Scikit-learn-general Digest, Vol 36, Issue 37
> ****************************************************
>
