Hi All,
I am planning to run the AMPLab benchmark suite to evaluate the performance of our 
cluster. The benchmark page at https://amplab.cs.berkeley.edu/benchmark/ says the 
data is available at:
s3n://big-data-benchmark/pavlo/[text|text-deflate|sequence|sequence-snappy]/[suffix]
where /tiny/, /1node/ and /5nodes/ are the options for suffix. However, I am not 
able to download these datasets directly. I read that they can be used directly 
by calling sc.textFile(s3:/....), but I wanted to make sure that my understanding 
is correct. Here is what I see at 
http://s3.amazonaws.com/big-data-benchmark/
I do not see anything for sequence or text-deflate.
I see the sequence-snappy dataset:
<Contents><Key>pavlo/sequence-snappy/5nodes/crawl/000738_0</Key><LastModified>2013-05-27T21:26:40.000Z</LastModified><ETag>"a978d18721d5a533d38a88f558461644"</ETag><Size>42958735</Size><StorageClass>STANDARD</StorageClass></Contents>
For text, I get the following error:
<Error><Code>NoSuchKey</Code><Message>The specified key does not exist.</Message><Key>pavlo/text/1node/crawl</Key><RequestId>166D239D38399526</RequestId><HostId>4Bg8BHomWqJ6BXOkx/3fQZhN5Uw1TtCn01uQzm+1qYffx2s/oPV+9sGoAWV2thCI</HostId></Error>
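
In case it helps to narrow this down, here is a minimal sketch of how I understand the bucket listing could be restricted to one prefix and scanned for keys. It assumes the bucket accepts the standard S3 listing parameters `prefix` and `max-keys`, and that a single listing response is capped (at 1000 keys, as far as I know), which may be why some formats do not show up on the first page. My guess is also that the NoSuchKey error just means pavlo/text/1node/crawl is a prefix rather than a single object. The function names are my own and not part of the benchmark.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Bucket URL taken from the listing I looked at in the browser.
BUCKET_URL = "http://s3.amazonaws.com/big-data-benchmark/"


def listing_url(prefix, max_keys=1000):
    """Build a bucket-listing URL restricted to keys under `prefix`.

    Assumption: the bucket honours the standard S3 `prefix` and
    `max-keys` query parameters.
    """
    return BUCKET_URL + "?" + urlencode({"prefix": prefix, "max-keys": max_keys})


def keys_in_listing(xml_text):
    """Return every <Key> value found in a listing response or fragment."""
    root = ET.fromstring(xml_text)
    # Full responses carry an XML namespace; strip it before comparing tag names.
    return [el.text for el in root.iter() if el.tag.rsplit("}", 1)[-1] == "Key"]
```

Fetching `listing_url("pavlo/text/1node/")` and passing the response body to `keys_in_listing` should then show whether any objects exist under that prefix.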

Please let me know if there is a way to readily download the dataset and view it.
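
For concreteness, this is the kind of direct read I have in mind, as a minimal sketch. It assumes an existing SparkContext `sc` with S3 filesystem support and credentials already configured, and that `crawl` (the table name visible in the keys I listed) is a valid directory under each suffix. The helper names are hypothetical, and I have only sketched the plain-text case.

```python
# Options as listed on the benchmark page.
FORMATS = ("text", "text-deflate", "sequence", "sequence-snappy")
SUFFIXES = ("tiny", "1node", "5nodes")


def dataset_path(fmt, suffix, table="crawl", scheme="s3n"):
    """Build the dataset location following the pattern on the benchmark page.

    `table` defaults to "crawl" only because that name appears in the
    bucket keys I saw; other table names are not assumed here.
    """
    if fmt not in FORMATS:
        raise ValueError("unknown format: %s" % fmt)
    if suffix not in SUFFIXES:
        raise ValueError("unknown suffix: %s" % suffix)
    return "%s://big-data-benchmark/pavlo/%s/%s/%s" % (scheme, fmt, suffix, table)


def read_text_dataset(sc, suffix, table="crawl"):
    """Open the plain-text variant of a table as an RDD of lines.

    `sc` is an already-created SparkContext; textFile is lazy, so nothing
    is read from S3 until an action such as take() is called.
    """
    return sc.textFile(dataset_path("text", suffix, table))
```

If this is the intended usage, something like `read_text_dataset(sc, "tiny").take(5)` would be enough for me to view a few records.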
