Aha, as I suspected. Look at the end of the URL: the [0-5] is a regex. There are 6 chunks.
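
To make that concrete, here is a rough sketch of expanding the trailing [0-5] into the six chunk URLs. It assumes the chunks are named with the suffixes 0 through 5, as the pattern suggests; the helper name `expand_chunks` is made up for illustration, and the script only builds the URL strings without fetching anything.

```python
import re

# URL pattern as posted in the thread; the trailing "[0-5]" is a character range,
# not part of a literal object name.
PATTERN = "http://mahout-wikipedia.s3.amazonaws.com/wikipedia-jan-2010-seqfile-deflate-chunk-[0-5]"


def expand_chunks(pattern):
    """Expand a trailing single-digit range like "[0-5]" into concrete URLs.

    Hypothetical helper: assumes one "[lo-hi]" digit range at the end of the string.
    """
    match = re.search(r"\[(\d)-(\d)\]$", pattern)
    if match is None:
        # No range suffix found, so treat the input as a single literal URL.
        return [pattern]
    low, high = int(match.group(1)), int(match.group(2))
    prefix = pattern[:match.start()]
    return [f"{prefix}{i}" for i in range(low, high + 1)]


urls = expand_chunks(PATTERN)
for url in urls:
    print(url)
```

Each printed URL can then be requested individually instead of the bracketed form.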

On Sun, Feb 28, 2010 at 3:04 AM, Jake Mannix <jake.man...@gmail.com> wrote:

> Er, the one you posted!
>
> http://mahout-wikipedia.s3.amazonaws.com/wikipedia-jan-2010-seqfile-deflate-chunk-[0-5]
>
> On Sat, Feb 27, 2010 at 1:30 PM, Robin Anil <robin.a...@gmail.com> wrote:
>
> > Can you give the URL you tried?
> >
> > On Sun, Feb 28, 2010 at 2:59 AM, Jake Mannix <jake.man...@gmail.com> wrote:
> >
> > > Hey Robin, that http URL gives me a permission denied response... I'm not
> > > too S3 savvy, not sure if I'm checking on it right...
> > >
> > > On Sat, Feb 27, 2010 at 12:40 PM, Robin Anil <robin.a...@gmail.com> wrote:
> > >
> > > > It's uploaded here and it's public. I will monitor usage, and if my
> > > > credits run out too quickly I will take it down and wait for Amazon
> > > > to give me more credits.
> > > >
> > > > It's Wikipedia docid => wikitext. You can run the vectorizer over this.
> > > > Use either the Wikipedia analyzer or the standard analyzer.
> > > >
> > > > http://mahout-wikipedia.s3.amazonaws.com/wikipedia-jan-2010-seqfile-deflate-chunk-[0-5]
> > > > s3://mahout-wikipedia/wikipedia-jan-2010-seqfile-deflate-chunk-[0-5]
> > > >
> > > > Robin