Training can be of arbitrary size; there is no limit. Classification needs to load the model into memory, so that is where you are limited. You can prune low-frequency words to greatly reduce the model size without affecting precision much.
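To make the pruning idea concrete, here is a rough sketch in plain Python. It is not Mahout's or LingPipe's API: the function names `count_terms` and `prune_vocabulary`, the `min_count` threshold, and the toy documents are all made up for illustration, and tokenization is assumed to be a simple whitespace split.

```python
from collections import Counter


def count_terms(docs):
    """Count total occurrences of each whitespace-separated token."""
    counts = Counter()
    for text in docs:
        counts.update(text.lower().split())
    return counts


def prune_vocabulary(counts, min_count):
    """Keep only the terms that occur at least min_count times (hypothetical cutoff)."""
    return {term for term, n in counts.items() if n >= min_count}


# Toy corpus, purely illustrative.
docs = [
    "mahout trains naive bayes on hadoop",
    "naive bayes scales to large training sets",
    "lingpipe ran out of heap on dmoz",
]

counts = count_terms(docs)
vocab = prune_vocabulary(counts, min_count=2)
print(len(counts), "terms before pruning,", len(vocab), "after")
```

Only the surviving terms would need per-class weights in the in-memory model, which is where the size reduction comes from. The right cutoff depends on the corpus, so it is worth checking precision on held-out data after pruning.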
Robin

On Mon, Aug 30, 2010 at 1:01 PM, Ted Dunning <[email protected]> wrote:

> With Naive Bayes, you should be able to train with a nearly arbitrarily
> large data set. The only limit will be keeping a list of the unique words
> in memory.
>
> On Mon, Aug 30, 2010 at 12:21 AM, jun li <[email protected]> wrote:
>
> > I ever train a naive bayes classifier by a large training size. like
> > dmoz , using lingpipe package.
> > but out of memory. i.e., exceed limit of java heap size.
> >
> > I want to know does any one tried a big training size to train a
> > mahout bayes classifier for text ?
> > thanks.
> >
> > --
> > Li Jun
