A presentation I gave on Hadoop and AWS in London last month includes a
case study with some numbers, which you may find interesting:
http://skillsmatter.com/custom/presentations/ec2-talk.pdf

tim robertson <[EMAIL PROTECTED]> wrote:
> For these small
> datasets, you might find it useful - let me know if I should spend
> time finishing it (Or submit help?) - it is really very simple.

This sounds very useful. Please consider creating a Jira issue and
submitting the code (even if it's not "finished", folks might like to
see it). Thanks.

Tom

>
> Cheers
>
> Tim
>
>
>
> On Tue, Sep 2, 2008 at 2:22 PM, Ryan LeCompte <[EMAIL PROTECTED]> wrote:
>> Hi Tim,
>>
>> Are you mostly just processing/parsing textual log files? How many
>> maps/reduces did you configure in your hadoop-ec2-env.sh file? How
>> many did you configure in your JobConf? Just trying to get an idea of
>> what to expect in terms of performance. I'm noticing that it takes
>> about 16 minutes to transfer about 15GB of textual uncompressed data
>> from S3 into HDFS after the cluster has started with 15 nodes. I was
>> expecting this to take a shorter amount of time, but maybe I'm
>> incorrect in my assumptions. I am also noticing that it takes about 15
>> minutes to parse through the 15GB of data with a 15 node cluster.
>>
>> Thanks,
>> Ryan
>>
>>
>> On Tue, Sep 2, 2008 at 3:29 AM, tim robertson <[EMAIL PROTECTED]> wrote:
>>> I have been processing only 100s of GBs on EC2, not 1000s, using 20
>>> nodes, and I am really only in the exploration and testing phase right now.
>>>
>>>
>>> On Tue, Sep 2, 2008 at 8:44 AM, Andrew Hitchcock <[EMAIL PROTECTED]> wrote:
>>>> Hi Ryan,
>>>>
>>>> Just a heads up, if you require more than the 20 node limit, Amazon
>>>> provides a form to request a higher limit:
>>>>
>>>> http://www.amazon.com/gp/html-forms-controller/ec2-request
>>>>
>>>> Andrew
>>>>
>>>> On Mon, Sep 1, 2008 at 10:43 PM, Ryan LeCompte <[EMAIL PROTECTED]> wrote:
>>>>> Hello all,
>>>>>
>>>>> I'm curious to see how many people are using EC2 to execute their
>>>>> Hadoop cluster and map/reduce programs, and how many are using
>>>>> home-grown datacenters. It seems like the 20 node limit with EC2 is a
>>>>> bit crippling when one wants to process many gigabytes of data. Has
>>>>> anyone found this to be the case? How much data are people processing
>>>>> with their 20 node limit on EC2? Curious what the thoughts are...
>>>>>
>>>>> Thanks,
>>>>> Ryan
>>>>>
>>>>
>>>
>>
>
