Can any of the algorithms implemented in scikit-learn be trained
incrementally?
I am interested in three things in particular: text classification,
unsupervised clustering of texts, and hierarchical clustering of texts.
But my set of texts is too big to load into memory all at once, even with
a sparse representation. I can't train the classifier or apply the
clustering methods without a MemoryError exception being thrown, even when
working with a fraction of the texts (I tried Multinomial Naive Bayes,
the Linear SVM and some of the clustering algorithms).
Does anybody have any tips on what I can do before going all the way to
using things like Hadoop? Can any of the algorithms be trained
incrementally?
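To make "incrementally" concrete, here is a minimal sketch of the kind of
batch-wise loop the question is about. It is only an illustration under
assumptions: it uses `SGDClassifier.partial_fit` and a stateless
`HashingVectorizer`, neither of which is among the estimators mentioned
above, and the toy texts, labels and the `iter_batches` helper are
hypothetical stand-ins for a corpus streamed from disk. Whether the same
pattern is available for the clustering methods is exactly the open question.

```python
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier


def iter_batches(texts, labels, batch_size):
    """Hypothetical helper: yield (texts, labels) chunks of at most batch_size.

    A real version would read each chunk from disk instead of slicing lists.
    """
    for start in range(0, len(texts), batch_size):
        stop = start + batch_size
        yield texts[start:stop], labels[start:stop]


# Toy stand-in for a corpus that would not fit in memory (assumption).
texts = ["good movie", "bad movie", "great film", "terrible film"] * 5
labels = [1, 0, 1, 0] * 5

# Hashing needs no vocabulary, so each batch can be vectorized on its own.
vectorizer = HashingVectorizer(n_features=2 ** 18)
clf = SGDClassifier(random_state=0)

# partial_fit must be told every possible class on the first call.
classes = [0, 1]

for batch_texts, batch_labels in iter_batches(texts, labels, batch_size=4):
    X = vectorizer.transform(batch_texts)  # sparse matrix for this batch only
    clf.partial_fit(X, batch_labels, classes=classes)

predictions = clf.predict(vectorizer.transform(["great movie", "terrible movie"]))
```

Only one batch is held in memory at a time, so peak memory depends on
`batch_size` and `n_features` rather than on the size of the corpus.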
Thanks,
---
Rafael Calsaverini
Dep. de Física Geral, Sala 336
Instituto de Física - Universidade de São Paulo
[email protected]
http://stoa.usp.br/calsaverini/weblog
CEL: (11) 7525-6222
USP: (11) 3091-6803
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general