The training set can be of arbitrary size; there is no limit. Classification
has to load its data into memory, so that is where you are limited. You can
prune low-frequency words to greatly reduce the model size without affecting
precision much.
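
To illustrate the pruning idea, here is a minimal sketch in plain Python. It
does not use Mahout's own API; the function names, the toy documents, and the
min_count threshold are all made up for the example. It counts every token in
one pass over the training data and keeps only the words seen at least
min_count times.

```python
from collections import Counter


def prune_vocabulary(docs, min_count=2):
    """Return the set of words that occur at least min_count times in docs.

    docs is any iterable of token lists, so it can be streamed from disk
    instead of being held in memory all at once.
    """
    counts = Counter()
    for tokens in docs:
        counts.update(tokens)
    return {word for word, n in counts.items() if n >= min_count}


def filter_tokens(tokens, vocabulary):
    """Drop every token that did not survive pruning."""
    return [t for t in tokens if t in vocabulary]


# Toy training documents (hypothetical data, already tokenized).
docs = [
    ["mahout", "bayes", "classifier"],
    ["bayes", "text", "classifier"],
    ["lingpipe", "bayes"],
]

vocab = prune_vocabulary(docs, min_count=2)
pruned = filter_tokens(["mahout", "bayes"], vocab)
```

Only the words kept in vocab would contribute features to the model, so the
rare words that make up most of a large vocabulary never reach memory at
classification time. A suitable threshold depends on the corpus.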

Robin

On Mon, Aug 30, 2010 at 1:01 PM, Ted Dunning <[email protected]> wrote:

> With Naive Bayes, you should be able to train with a nearly arbitrarily
> large data set.  The only limit will be keeping a list of the unique words
> in memory.
>
> On Mon, Aug 30, 2010 at 12:21 AM, jun li <[email protected]> wrote:
>
> > I once tried to train a naive Bayes classifier on a large training set,
> > such as dmoz, using the LingPipe package,
> > but it ran out of memory, i.e., it exceeded the Java heap size limit.
> >
> > Has anyone tried training a Mahout Bayes classifier for text on a large
> > training set?
> > Thanks.
> >
> >
> > --
> > Li Jun
> >
>
