Hi Rafael,

Incremental learning is supported via ``partial_fit``. For supervised
learning, however, only ``SGDClassifier`` [1] currently supports it (it
should be easy to add to ``MultinomialNB`` as well [2]).
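
To make the ``partial_fit`` pattern concrete, here is a minimal sketch of
out-of-core text classification. It rests on assumptions of mine that are not
part of the advice above: ``iter_batches`` is a hypothetical stand-in for
whatever reads your texts from disk in chunks, the toy texts and labels are
made up, and ``HashingVectorizer`` is used only because it is stateless and so
never needs the whole corpus in memory.

```python
import numpy as np
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier


def iter_batches():
    """Hypothetical reader: yields (texts, labels) chunks that fit in memory."""
    yield ["cheap pills online", "meeting moved to friday"], [1, 0]
    yield ["win money now", "lunch at noon tomorrow"], [1, 0]
    yield ["cheap money pills", "friday meeting agenda"], [1, 0]


# Stateless vectorizer (an assumption of this sketch): no fit step, so no
# vocabulary has to be built from the full corpus.
vectorizer = HashingVectorizer(n_features=2 ** 18)
clf = SGDClassifier(random_state=0)

# partial_fit needs the full set of labels up front, because a single batch
# may not contain every class.
all_classes = np.array([0, 1])

for texts, labels in iter_batches():
    X = vectorizer.transform(texts)  # sparse matrix for this batch only
    clf.partial_fit(X, labels, classes=all_classes)

predictions = clf.predict(vectorizer.transform(["cheap pills", "friday lunch"]))
```

Only one batch is held in memory at a time, so the size of the corpus is
bounded by disk rather than RAM.
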
For clustering, have a look at ``MiniBatchKMeans`` [3], which also
supports ``partial_fit``; an example can be found here [4].
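
A rough sketch of the same idea for clustering follows. As before, the batch
reader ``iter_text_batches`` and the toy texts are hypothetical, and pairing
``MiniBatchKMeans`` with ``HashingVectorizer`` is my assumption rather than
what the linked example [4] necessarily does.

```python
from sklearn.cluster import MiniBatchKMeans
from sklearn.feature_extraction.text import HashingVectorizer


def iter_text_batches():
    """Hypothetical reader: yields lists of texts small enough for memory."""
    yield ["stock market falls", "football match tonight",
           "market shares rise", "football cup final"]
    yield ["bank shares and stock", "match ends in draw",
           "stock exchange news", "cup final football"]


# Stateless vectorizer (an assumption of this sketch): no global fit required.
vectorizer = HashingVectorizer(n_features=2 ** 18)

# n_init is set explicitly so the sketch does not rely on a version-dependent
# default.
km = MiniBatchKMeans(n_clusters=2, random_state=0, n_init=3)

for texts in iter_text_batches():
    # The first batch must hold at least n_clusters samples, since the
    # centers are initialised from it.
    km.partial_fit(vectorizer.transform(texts))

labels = km.predict(vectorizer.transform(["stock news", "football final"]))
```

Each call to ``partial_fit`` updates the cluster centers using only the batch
it is given, so the full document matrix is never materialised.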

You should also consider Vowpal Wabbit [5], which is designed for online,
out-of-core learning.

best,
 Peter

[1] http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html#sklearn.linear_model.SGDClassifier.partial_fit
[2] http://scikit-learn.org/stable/modules/naive_bayes.html
[3] http://scikit-learn.org/stable/modules/generated/sklearn.cluster.MiniBatchKMeans.html#sklearn.cluster.MiniBatchKMeans.partial_fit
[4] http://scikit-learn.org/stable/auto_examples/document_clustering.html#example-document-clustering-py
[5] http://hunch.net/~vw/

2012/5/11 Rafael Calsaverini <[email protected]>:
> Can any of the algorithms implemented in scikit-learn be trained
> incrementally?
>
> Three particular things are interesting to me: classifying texts,
> unsupervised clustering analysis of texts and hierarchical clustering
> analysis of texts. But my set of texts is just too big to load in memory all
> at once even with a sparse representation. I can't train the classifier or
> apply the clustering methods without having a MemoryError exception thrown,
> even when working with a fraction of the texts (I tried the Multinomial
> Naive Bayes, the Linear SVM and some of the clustering algorithms).
>
> Does anybody have any tips on what I can do before going all the way to
> using things like Hadoop? Can any of the algorithms be trained
> incrementally?
>
> Thanks,
> ---
> Rafael Calsaverini
> Dep. de Física Geral, Sala 336
> Instituto de Física - Universidade de São Paulo
>
> [email protected]
> http://stoa.usp.br/calsaverini/weblog
> CEL: (11) 7525-6222
> USP: (11) 3091-6803
>
>
> ------------------------------------------------------------------------------
> Live Security Virtual Conference
> Exclusive live event will cover all the ways today's security and
> threat landscape has changed and how IT managers can respond. Discussions
> will include endpoint security, mobile security and the latest in malware
> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>



-- 
Peter Prettenhofer
