Hi Olivier,

Here is the output as requested.
sklearn version - '0.12.1'
Python 2.7
OS: Ubuntu 11.04

Trace:
In [3]: from sklearn.datasets import load_files
In [4]: categ = ['pos','neg']
In [5]: dataset = load_files('data_n',categories=categ,shuffle=False)
In [6]: from sklearn.feature_extraction.text import CountVectorizer
In [7]: from sklearn.feature_extraction.text import TfidfTransformer
In [8]: %time X = CountVectorizer(charset_error='ignore').fit_transform(dataset.data)
CPU times: user 16.68 s, sys: 0.99 s, total: 17.67 s
Wall time: 18.55 s
In [9]: print(repr(X))
<200000x44935 sparse matrix of type '<type 'numpy.int64'>'
        with 2254625 stored elements in COOrdinate format>
In [10]: %time X_tfidf = TfidfTransformer().fit_transform(X)
CPU times: user 0.96 s, sys: 0.06 s, total: 1.01 s
Wall time: 1.17 s
In [11]: print(repr(X_tfidf))
<200000x44935 sparse matrix of type '<type 'numpy.float64'>'
        with 2254625 stored elements in Compressed Sparse Row format>


The classifier which I saved (with TF-IDF) gives the following prediction times:

In [13]: from sklearn.externals import joblib
In [15]: from an import my_analyzer
In [16]: clf = joblib.load("Tw_SVM_LI_Bv_16_1_d.model")
In [17]: %time clf.predict(["This is a good sentence for debugging"])
CPU times: user 4.01 s, sys: 0.06 s, total: 4.06 s
Wall time: 4.08 s
Out[17]: array([1], dtype=int32)
In [18]: %time clf.predict(["This is avery good sign of improvement"])
CPU times: user 3.93 s, sys: 0.00 s, total: 3.93 s
Wall time: 3.94 s
Out[18]: array([1], dtype=int32)
In [19]: %time clf.predict(["This is avery good sign of improvement"])
CPU times: user 3.81 s, sys: 0.00 s, total: 3.81 s
Wall time: 3.80 s
Out[19]: array([1], dtype=int32)



Classifier without TF-IDF (CountVectorizer only):

In [21]: from an import tweet_analyzer
In [22]: nclf = joblib.load("SVC_CV_only.model")
In [23]: %time nclf.predict(["This is a good sentence for debugging"])
CPU times: user 0.01 s, sys: 0.00 s, total: 0.01 s
Wall time: 0.01 s
Out[23]: array([1], dtype=int32)
In [24]: %time nclf.predict(["This is avery good sign of improvement"])
CPU times: user 0.01 s, sys: 0.00 s, total: 0.01 s
Wall time: 0.01 s
Out[24]: array([1], dtype=int32)
In [25]: %time nclf.predict(["This is avery good sign of improvement"])
CPU times: user 0.01 s, sys: 0.00 s, total: 0.01 s
Wall time: 0.01 s
Out[25]: array([1], dtype=int32)
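
To narrow down where the ~4 s goes in the first classifier, each stage could be timed separately. Below is a minimal sketch, not something I have run on these models. It assumes the saved object is a sklearn Pipeline (or anything exposing `.steps` as a list of (name, estimator) pairs), which the trace above does not confirm; `time_pipeline_steps` is a hypothetical helper name.

```python
from timeit import default_timer


def time_pipeline_steps(pipeline, docs):
    """Return [(step_name, seconds)] for one pass of docs through pipeline.

    Assumption: `pipeline.steps` is a list of (name, estimator) pairs, as on
    sklearn.pipeline.Pipeline. Every step but the last must provide
    transform(); the last step must provide predict().
    """
    timings = []
    data = docs
    # Run the intermediate steps one by one, timing each transform() call.
    for name, step in pipeline.steps[:-1]:
        start = default_timer()
        data = step.transform(data)
        timings.append((name, default_timer() - start))
    # Time the final estimator's predict() on the transformed data.
    name, final = pipeline.steps[-1]
    start = default_timer()
    final.predict(data)
    timings.append((name, default_timer() - start))
    return timings
```

With the model loaded as above, something like `time_pipeline_steps(clf, ["This is a good sentence for debugging"])` should show whether the time is spent in the vectorizer, the TF-IDF step, or the SVM itself.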

Hope this helps.

Best regards

-- 
**********************************
JAGANADH G
http://jaganadhg.in
*ILUGCBE*
http://ilugcbe.org.in
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general