On Nov 2, 2011, at 1:01 PM, Jake Mannix wrote:

> On Wed, Nov 2, 2011 at 5:36 AM, Grant Ingersoll <gsing...@apache.org> wrote:
> 
>> 
>> Alternatively, the ASF email data is license free.  We could take and use
>> a chunk of that.  You can pretty much have as much or as little as you
>> want.  Since it's broken down by project, it has the rough look and feel of
>> 20newsgroups at much bigger scale.
>> 
> 
> I like it.  Where does that data live, can I download it easily?

Whole Enchilada on EC2: http://aws.amazon.com/datasets/7791434387204566

Small subset at: 
http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout

You can also log into people.apache.org with your ASF creds and grab some of
the public sets there. I think there is a link somewhere else to download the
actual mbox files, but I can't find it at the moment. I can send you the
location off-list if you want.
