Split the data into 6 chunks of 1.1 GB each using deflate compression (GZIP needs the native lib, so I'm not sure whether it would work on EMR). Uploading them to the mahout-wikipedia bucket.
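For anyone who wants to reproduce this kind of split, here is a rough sketch in Python. It is not the exact procedure used for the upload. The function name `split_and_deflate`, the `chunk-NNNN.deflate` file naming, and the choice to split on line boundaries by uncompressed size are all assumptions made for illustration. It writes zlib-wrapped deflate streams via the standard `zlib` module; whether that matches what a given Hadoop codec expects should be checked before relying on it.

```python
import zlib
from pathlib import Path


def split_and_deflate(src, out_dir, chunk_bytes, prefix="chunk"):
    """Split src on line boundaries into pieces of roughly chunk_bytes
    (measured uncompressed) and write each piece as a zlib/deflate stream.

    Hypothetical helper: the names and the file layout are illustrative only.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []   # paths of the chunk files produced so far
    out = None     # currently open output file, if any
    comp = None    # streaming compressor for the current chunk
    size = 0       # uncompressed bytes written to the current chunk

    with open(src, "rb") as fin:
        for line in fin:
            if out is None:
                # Start a new chunk with a fresh compressor.
                path = out_dir / f"{prefix}-{len(written):04d}.deflate"
                out = open(path, "wb")
                comp = zlib.compressobj()
                written.append(path)
                size = 0
            out.write(comp.compress(line))
            size += len(line)
            if size >= chunk_bytes:
                # Close the chunk once it reaches the target size.
                out.write(comp.flush())
                out.close()
                out = None

    if out is not None:
        # Finish the last, possibly smaller, chunk.
        out.write(comp.flush())
        out.close()
    return written
```

Each chunk is an independent stream, so the pieces can be decompressed separately, and concatenating the decompressed chunks in order gives back the original file.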
@Grant, can you ask the rep to make Amazon not charge me :) for this bucket? My GSoC credits expire at the beginning of April, and I don't want to upload it again. More credits or an extension would also be fine. I am expecting plenty of traffic on this one.

Robin