[ 
https://issues.apache.org/jira/browse/HADOOP-4382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White updated HADOOP-4382:
------------------------------

    Attachment: hadoop-4382.patch

The attached patch adds a script that:

1. Launches a cluster on EC2
2. Waits for the cluster and Hadoop daemons to start
3. Runs a small sort job to warm up the cluster
4. Runs a sort job and emits the job duration
5. Terminates the cluster
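
The five steps above can be sketched roughly as follows. This is only an 
illustration of the control flow, not the contents of the patch: launch, 
wait_ready, run_sort and terminate are hypothetical callables standing in for 
the real EC2 and Hadoop commands, and the try/finally is an assumption about 
how to avoid leaving a cluster running when a step fails.

```python
import time


def run_benchmark(launch, wait_ready, run_sort, terminate, clock=time.monotonic):
    """Run the five steps in order; return the timed sort duration in seconds.

    All four callables are hypothetical stand-ins for the real commands.
    """
    cluster = launch()                      # 1. launch the cluster
    try:
        wait_ready(cluster)                 # 2. wait for nodes and daemons
        run_sort(cluster, warmup=True)      # 3. small warm-up sort, not timed
        start = clock()
        run_sort(cluster, warmup=False)     # 4. the measured sort
        return clock() - start
    finally:
        terminate(cluster)                  # 5. terminate, even after a failure
```

Only step 4 falls between the two clock readings, so cluster start-up and the 
warm-up job do not count towards the reported duration.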

On an 8-node cluster, sorting 32GB of data took 2742 seconds (about 46 minutes) 
with the default hadoop-site.xml that the EC2 scripts use. Tuning those 
settings should improve on this.
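
To put that figure in throughput terms, here is a small calculation. It 
assumes 1GB = 1024MB and, for the per-node number, that all 8 nodes take an 
equal share of the work; neither assumption is stated above, so treat the 
result as a rough guide only.

```python
def throughput_mb_per_s(data_gb, seconds):
    """Aggregate sort throughput in MB/s, assuming 1 GB = 1024 MB."""
    return data_gb * 1024 / seconds


total = throughput_mb_per_s(32, 2742)
per_node = total / 8  # assumes all 8 nodes share the work equally

print(f"{total:.2f} MB/s aggregate, {per_node:.2f} MB/s per node")
```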

Several improvements could be made to the script, in particular in detecting 
when the cluster is ready to go: the current script waits until 90% of the 
nodes are up, then waits 1 minute for Hadoop to start. There are more ideas 
here: 
http://www.nabble.com/Auto-shutdown-for-EC2-clusters-td20132561.html

It would also be good to do multiple runs, discard the first, and compute an 
average of the rest.
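
A minimal sketch of the readiness rule and the averaging idea, for discussion. 
The function names, the count_running callable and the 10 second poll interval 
are assumptions made for illustration; only the 90% threshold and the 1 minute 
wait come from the current script.

```python
import time


def wait_until_ready(count_running, expected, threshold=0.9,
                     settle_seconds=60, poll_seconds=10, sleep=time.sleep):
    """Block until at least `threshold` of the expected nodes are running,
    then wait a fixed settle period for the Hadoop daemons to start.

    count_running is a hypothetical callable returning the number of nodes
    currently up.
    """
    while count_running() < threshold * expected:
        sleep(poll_seconds)
    sleep(settle_seconds)


def mean_excluding_first(durations):
    """Average the run durations after discarding the first run."""
    if len(durations) < 2:
        raise ValueError("need at least two runs to discard the first")
    rest = durations[1:]
    return sum(rest) / len(rest)
```

The fixed settle period is the weak point: it does not check that the daemons 
have actually started, so replacing it with a real check on the daemons would 
be the natural next step.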

This should be a good basis for running a regular EC2 benchmark from Hudson.

Comments welcome.

> Run Hadoop sort benchmark on Amazon EC2
> ---------------------------------------
>
>                 Key: HADOOP-4382
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4382
>             Project: Hadoop Core
>          Issue Type: Test
>          Components: contrib/ec2
>            Reporter: Tom White
>            Assignee: Tom White
>         Attachments: hadoop-4382.patch
>
>
> By running a benchmark on EC2 we can see how well Hadoop performs, how to 
> tune it, and how performance changes between releases.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
