The repository at http://fimi.cs.helsinki.fi/data/ is worth a look, specifically the retail and accidents datasets. I am using a modified (comma-separated) version of them for FPGrowth testing. The webdocs dataset looks good enough for parallel FPGrowth testing.
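For reference, the comma-separated conversion could look roughly like the sketch below. It assumes the FIMI files hold one transaction per line with item ids separated by whitespace; the function names are hypothetical and this is not necessarily the exact conversion used for the FPGrowth tests.

```python
from typing import Iterable, Iterator


def convert_line(line: str) -> str:
    """Turn one whitespace-separated transaction into a comma-separated one."""
    # str.split() with no arguments splits on any run of whitespace and
    # drops leading/trailing whitespace, including the newline.
    return ",".join(line.split())


def convert_lines(lines: Iterable[str]) -> Iterator[str]:
    """Convert every non-empty transaction line; blank lines are skipped."""
    for line in lines:
        converted = convert_line(line)
        if converted:
            yield converted


if __name__ == "__main__":
    # Tiny in-memory sample standing in for a downloaded dataset file.
    sample = ["1 2 3\n", "\n", "4 5\n"]
    for row in convert_lines(sample):
        print(row)
```

Because the conversion is a line-by-line transform, it can run either at fetch time or once ahead of time, whichever hosting option is chosen.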
The question is: should I fetch them from the URL and then convert them to the required format, or keep the converted files in a repo such as people.apache.org/~robinanil/datasets/ or something dedicated to Mahout?

On Thu, Oct 8, 2009 at 5:39 PM, Robin Anil <robin.a...@gmail.com> wrote:
> We need a central place for all sample datasets used for examples and unit
> tests? I am against putting it in the repo
> Any suggestions?
>
> Robin