On Fri, Feb 26, 2010 at 1:36 AM, Robin Anil <robin.a...@gmail.com> wrote:
>
> My mind was wandering and was thinking of giving the record attempt a better
> purpose than just creating junk ngram data(its good enough for a record
> attempt)
> There are a couple of datasets we can explore, like the genome dataset.

Another interesting dataset is the Wikipedia page traffic statistics dataset:

http://www.datawrangling.com/wikipedia-page-traffic-statistics-dataset

I wonder if there's something interesting that can be done with that and the
frequent pattern mining code. One advantage is that it's already on EC2.