It's uploaded here, and it's public. I will monitor usage to see whether my credits run out too quickly. If they do, I will take it down and wait for Amazon to give me more credits.
It's a Wikipedia sequence file of docid => wikitext. You can run the vectorizer over it, using either the Wikipedia analyzer or the standard analyzer.

http://mahout-wikipedia.s3.amazonaws.com/wikipedia-jan-2010-seqfile-deflate-chunk-[0-5]
s3://mahout-wikipedia/wikipedia-jan-2010-seqfile-deflate-chunk-[0-5]

Robin
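The `[0-5]` in these paths is a glob-style range, which presumably stands for six chunk files numbered 0 through 5. If your download tool doesn't expand such ranges itself, a minimal Python sketch like this one can list the individual chunk URLs. The helper name `chunk_urls` is hypothetical, and the only assumption is that each chunk name is the shared prefix followed by its number.

```python
def chunk_urls(prefix, first=0, last=5):
    """Expand a '...chunk-[0-5]' style range into one URL per chunk (hypothetical helper)."""
    # range() excludes its end, so add 1 to include the last chunk number.
    return [f"{prefix}{i}" for i in range(first, last + 1)]

# Prefixes copied from the paths above, with the [0-5] range removed.
HTTP_PREFIX = "http://mahout-wikipedia.s3.amazonaws.com/wikipedia-jan-2010-seqfile-deflate-chunk-"
S3_PREFIX = "s3://mahout-wikipedia/wikipedia-jan-2010-seqfile-deflate-chunk-"

if __name__ == "__main__":
    for url in chunk_urls(HTTP_PREFIX):
        print(url)
```

You can pipe the printed list into any HTTP client, or swap in `S3_PREFIX` for S3-aware tools.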