Hello,
   I am dumping the dataset vectorized with TfidfVectorizer, the target array, and
the classifier OneVsRestClassifier(SGDClassifier(loss='log', n_iter=50,
alpha=0.00001)), since I want to add it to a package. I use the joblib library from
sklearn.externals to dump the vectors. Peak memory while training the
classifier is about 12 GB; however, when the program starts dumping the classifier,
usage jumps to 38 GB (which I assume is due to some internal copy?). I have about
32 GB of RAM, so is there a better way to store the classifier than
joblib.dump(compress=9)? [I tried compress=3, 5, 7, and 9, and always get a
memory error.] Uncompressed, the vectors total about 11 GB.
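
For reference, a minimal sketch of the dump/load step I mean (the filename is
illustrative, a small random array stands in for the real fitted classifier, and
I import joblib directly rather than via sklearn.externals so the snippet is
self-contained):

```python
import os
import tempfile

import numpy as np
import joblib  # newer releases ship joblib standalone; older code used sklearn.externals.joblib

# A small random array stands in for the real fitted classifier / TF-IDF matrix.
data = np.random.rand(1000, 50)

path = os.path.join(tempfile.mkdtemp(), "clf.joblib")

# compress is an integer from 0 to 9: higher values compress harder but cost
# more CPU and, for very large objects, more working memory while pickling.
joblib.dump(data, path, compress=3)

restored = joblib.load(path)
assert np.array_equal(data, restored)
```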
Thanks 


_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
