Hey Robin,

  A couple of questions: what are the contents of this sequence file? Is this
the output of SparseVectorsFromSequenceFiles? Do you know the number
of key-value pairs and the cardinality of the rows? Or are these just the
raw-content <Text,Text> sequence files?

  Also - how do we get access to this bucket if we want to use it too?

  -jake

On Sat, Feb 27, 2010 at 11:30 AM, Robin Anil <robin.a...@gmail.com> wrote:

> split the data into 6 chunks of 1.1 GB using deflate compression (GZIP needs
> native lib, so not sure about EMR). Uploading them to mahout-wikipedia bucket
>
> @Grant can you ask the rep to make Amazon not charge me :) on this bucket.
> My gsoc credits expire in april beginning. I don't want to upload it again.
> More credits or an extension would also be fine. I am expecting plenty of
> traffic on this one
>
> Robin
>
