[ 
https://issues.apache.org/jira/browse/HADOOP-2409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White updated HADOOP-2409:
------------------------------

    Attachment: hadoop-2409.patch

I've written a patch to make it easy to install Hadoop on startup, while still 
supporting builds of AMIs that have it pre-installed.

* The default behavior is to download the specified version of Hadoop from S3 
and install it on startup. (Hadoop releases will be mirrored in a publicly 
readable S3 bucket.) The download takes around 1 second, so there is no 
performance overhead in doing this. This makes it easy for people to use 
patched versions of Hadoop by uploading them to their own S3 bucket and 
changing a config setting.
* The scripts for creating an AMI retain the ability to install Hadoop so you 
don't have to install it on start up. I propose that the publicly available 
AMIs do not have Hadoop installed, and install it on start up, so they don't 
have to be regenerated on each Hadoop release. This also makes it feasible to 
install other Hadoop software in the future, and avoid the combinatorial 
explosion of version numbers.
* I've created AMIs that work with these changes 
(hadoop-images/hadoop-base-20090210-i386.manifest.xml and 
hadoop-images/hadoop-base-20090210-x86_64.manifest.xml)
* I have also fixed HADOOP-5006 as a part of this change, which allowed me to 
see Ganglia pages.


> Make EC2 image independent of Hadoop version
> --------------------------------------------
>
>                 Key: HADOOP-2409
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2409
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/ec2
>            Reporter: Tom White
>         Attachments: hadoop-2409.patch, HADOOP-2409.patch
>
>
> Instead of building a new image for each released version of Hadoop, install 
> Hadoop on instance start up. Since it is a small download this would not add 
> significantly to startup time. Hadoop releases would be mirrored on S3 for 
> scalability (and to avoid bandwidth costs). The version to install would be 
> found from the instance metadata - this would be a download URL. 
> More generally, the instance could retrieve a script to run on start up from 
> a URL specified in the metadata. The script would install and configure 
> Hadoop, but it could be extended to do cluster-specific set up.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to