Can't that be sharded? I'm accustomed to any input file being a series of files: *-00000, *-00001, etc. Sharding also helps distribute input to mappers, regardless of size caps.
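
For illustration, here is a minimal sketch of that sharding idea in Python. It only groups already-serialized records into numbered shards under a size cap; it does not read or write real Hadoop SequenceFiles. The function name `shard_records`, the `max_shard_bytes` parameter, and the `wikipedia` prefix are all hypothetical, chosen for this example.

```python
from typing import Iterable, Iterator, List, Tuple


def shard_records(
    records: Iterable[bytes], max_shard_bytes: int, prefix: str = "wikipedia"
) -> Iterator[Tuple[str, List[bytes]]]:
    """Group records into shards of at most max_shard_bytes each.

    Yields (shard_name, records) pairs named prefix-00000, prefix-00001, ...
    A single record larger than the cap still gets a shard of its own.
    """
    shard: List[bytes] = []
    shard_bytes = 0
    index = 0
    for record in records:
        # Close the current shard when adding this record would exceed the cap.
        if shard and shard_bytes + len(record) > max_shard_bytes:
            yield f"{prefix}-{index:05d}", shard
            shard, shard_bytes = [], 0
            index += 1
        shard.append(record)
        shard_bytes += len(record)
    # Emit the final, possibly partial, shard.
    if shard:
        yield f"{prefix}-{index:05d}", shard


if __name__ == "__main__":
    # Toy data: three 4-byte records with an 8-byte cap per shard.
    for name, recs in shard_records([b"aaaa", b"bbbb", b"cccc"], 8):
        print(name, sum(len(r) for r in recs), "bytes")
```

With a cap set below the storage limit, every shard can be uploaded as its own object, and each one can be handed to a separate mapper.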
Can't you just claim the "mahout" bucket?

On Sat, Feb 27, 2010 at 4:53 PM, Robin Anil <robin.a...@gmail.com> wrote:
> Bah! Humbug! S3 has a 5Gb limit and wikipedia compressed seq file is 5.9GB
>
> On Sat, Feb 27, 2010 at 10:08 PM, Robin Anil <robin.a...@gmail.com> wrote:
>
>> Just curious. Was trying to put my wikipedia seqfiles for public
>> consumption. I can put it on robinanil bucket. mahout would have been nicer.
>>
>>
>> Robin
>>
>