Grant, public_p_r.tar seems to be missing? Is that intentional? Maybe some super-secret project inside there :)
Mike

On Thu, Oct 14, 2010 at 12:05 PM, Grant Ingersoll <gsing...@apache.org> wrote:
> Hi ORPers,
>
> I put up the complete ASF public mail archives as of about 3 weeks ago on
> Amazon's S3 and have made them public (let me know if I messed up, it is the
> first time I've done this). I also intend, in the coming weeks, to convert
> them into Mahout files (if anyone wants to help let me know).
>
> There are 5 files:
> https://s3.amazonaws.com/asf-mail-archives/public_a_d.tar
> https://s3.amazonaws.com/asf-mail-archives/public_e_k.tar
> https://s3.amazonaws.com/asf-mail-archives/public_l_o.tar
> https://s3.amazonaws.com/asf-mail-archives/public_s_t.tar
> https://s3.amazonaws.com/asf-mail-archives/public_u_z.tar
>
> The tarballs are organized by Top Level Project name (i.e. Mahout is in the
> public_l_o.tar file). The tarballs contain GZIP files by date, I believe. I
> believe the total uncompressed file size is somewhere in the 80-100GB range.
> That should be sufficient to drive some semi-interesting things in terms of
> scale, even if it is towards the smaller end of things.
>
> As the ASF has very clear public mailing list archive policies, it is my
> belief that this data set is completely unencumbered.
>
> From an ORP standpoint, this might make for a first data set for evaluation
> once we have the evaluator framework in place.
>
> Cheers,
> Grant
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com