Mani,

You are sounding more and more like the poster child for an on-line classifier.
The idea would be that you give your classified docs to the system first for testing, then again for incremental training. You can use the results of the test to adjust the learning rate for the incremental learning.

See the work I have started with MAHOUT-228 for the beginnings of this. Let me know where it should go to help with your needs (i.e., what entry points you would need).

On Mon, Dec 28, 2009 at 1:33 PM, Mani Kumar <[email protected]> wrote:

> lets talk about bigger numbers e.g. i have more than 1 million docs and i
> get 10k new docs every day out of which 6k is already classified.
>
> Monitoring performance is good but it can be done weekly instead of daily
> just to reduce cost.
>
> I actually wanted to avoid the retraining as much as possible because it
> comes with huge cost for large dataset.

--
Ted Dunning, CTO
DeepDyve
