Could we make the Maven test phase download this dataset once, for all tests? Is that possible?
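
If it helps, here is a rough sketch of the "download once" part in plain Java (not a Maven configuration). DatasetCache, fetchOnce, and the cache directory are names I made up for illustration, and the dataset URL is left to the caller since no dataset has been picked yet. The idea is that the first caller copies the file into a local cache directory and every later caller reuses that copy.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper: fetches a dataset into a local cache at most once. */
public final class DatasetCache {

  private DatasetCache() {}

  /**
   * Returns the cached copy of the dataset, downloading it only if it is missing.
   * The synchronized keyword only guards callers inside one JVM.
   */
  public static synchronized Path fetchOnce(URL source, Path cacheDir, String fileName)
      throws IOException {
    Path target = cacheDir.resolve(fileName);
    if (Files.isRegularFile(target)) {
      return target; // already downloaded by an earlier test or build
    }
    Files.createDirectories(cacheDir);
    // Download to a temporary file first so an interrupted transfer
    // never leaves a half-written file under the final name.
    Path partial = Files.createTempFile(cacheDir, fileName, ".part");
    try {
      try (InputStream in = source.openStream()) {
        Files.copy(in, partial, StandardCopyOption.REPLACE_EXISTING);
      }
      Files.move(partial, target, StandardCopyOption.REPLACE_EXISTING);
    } finally {
      Files.deleteIfExists(partial); // no-op after a successful move
    }
    return target;
  }
}
```

A test class could call this from a JUnit @BeforeClass method, so the network is hit at most once per cache directory. Whether that fits Mahout's build is still an open question.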

On Tue, Feb 9, 2010 at 7:43 PM, Sean <sro...@gmail.com> wrote:

> I don't, but can offer alternatives --
>
> Just have the user download the data set. I don't think this is a big
> burden.
> Download the data set automatically.
>
> These are free of legal and tarball-size problems.
>
> On Tue, Feb 9, 2010 at 2:11 PM, Robin Anil <robin.a...@gmail.com> wrote:
> > I feel a need to check in a set of text documents to mahout. maybe 3-4
> > categories of documents 10 each.
> > can be used in clustering classification, vectorizer collocation testing and
> > even frequent pattern generation
> >
> > And instead doing artificial tests each of it can use this to test against a
> > reference implementation written in the testclass like what kmeans does.
> >
> > Plus we will have a baseline with which we can see improvements in these
> > algorithms. Any idea of some good(legally sound :)) dataset which we can
> > use?
> >
> > Same idea can be extended to CF also
> >
> > Robin