Hey Robin,

A couple of questions: what are the contents of this sequence file? Is it the output of SparseVectorsFromSequenceFiles? Do you know the number of key-value pairs and the cardinality of the rows? Or are these just the raw-contents <Text,Text> sequence files?
Also - how do we get access to this bucket if we want to use it too?

-jake

On Sat, Feb 27, 2010 at 11:30 AM, Robin Anil <robin.a...@gmail.com> wrote:

> split the data into 6 chunks of 1.1 GB using deflate compression (GZIP needs
> native lib, so not sure about EMR). Uploading them to mahout-wikipedia bucket
>
> @Grant can you ask the rep to make Amazon not charge me :) on this bucket.
> My gsoc credits expire in april beginning. I don't want to upload it again.
> More credits or an extension would also be fine. I am expecting plenty of
> traffic on this one
>
> Robin