James Moore wrote:
On Thu, Sep 11, 2008 at 5:46 AM, Allen Wittenauer <[EMAIL PROTECTED]> wrote:
On 9/11/08 2:39 AM, "Alex Loddengaard" <[EMAIL PROTECTED]> wrote:
I've never dealt with a large cluster, though I'd imagine it is managed the
same way as small clusters:
   Maybe. :)

Depends how often you like to be paged, doesn't it :)


   Instead, use a real system configuration management package such as
bcfg2, smartfrog, puppet, cfengine, etc.  [Steve, you owe me for the plug.
:) ]
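To make the plug concrete, here is a minimal sketch of what such a managed config might look like in Puppet (one of the tools Allen lists). The file path, module name, and service name are assumptions for illustration, not anything from this thread:

```puppet
# Hypothetical manifest: keep hadoop-site.xml under management and
# restart the datanode whenever the file changes.
class hadoop::datanode {
  file { "/etc/hadoop/hadoop-site.xml":
    source => "puppet:///hadoop/hadoop-site.xml",
    notify => Service["hadoop-datanode"],
  }
  service { "hadoop-datanode":
    ensure => running,
  }
}
```

The point of any of these tools (bcfg2, SmartFrog, cfengine, Puppet) is the same: the config lives in one place, and a change there propagates to every node instead of being hand-edited per machine.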

Yes Allen, I owe you beer at the next apachecon we are both at.
Actually, I think Y! were one of the sponsors at the UK event, so we owe you for that too.


Or on EC2 and its competitors, just build a new image whenever you

need to update Hadoop itself.


1. It's still good to have as much automation of your image build as you can; if you can build new machine images on demand, you can have fun/make a mess of things. Look at http://instalinux.com to see the web GUI, used inside HP, for creating Linux images on demand.

2. When you try to bring up everything from scratch, you have a choreography problem. DNS needs to be up early, then your authentication system, then the management tools, then the other parts of the system. If you have a project where Hadoop is integrated with the front-end site, for example, your app servers have to stay offline until HDFS is live. So it does get complex.
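The choreography usually boils down to "block until the thing I depend on answers". A minimal sketch of such a wait helper, assuming nothing beyond standard shell (the hostnames in the usage comment are hypothetical):

```shell
#!/bin/sh
# Hedged sketch: retry a probe command until it succeeds, so a startup
# script can block until its dependency (DNS, HDFS, ...) is reachable.
wait_for() {
  tries=0
  until "$@" >/dev/null 2>&1; do
    tries=$((tries + 1))
    [ "$tries" -ge 60 ] && return 1   # give up after ~60 attempts
    sleep 1
  done
  return 0
}

# Hypothetical usage in an app-server init script:
#   wait_for host namenode.example.com          # DNS resolvable yet?
#   wait_for nc -z namenode.example.com 8020    # HDFS port listening yet?
wait_for true && echo "dependency up"
```

Real deployments layer timeouts, alerting, and ordering on top of this, which is exactly why a management framework beats ad-hoc scripts at scale.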

3. The Hadoop nodes are good here in that you aren't required to bring up the namenode first; the datanodes will wait, and the same goes for the task trackers and the job tracker. But if you, say, need to point everything at a new hostname for the namenode, well, that's a config change that needs to be pushed out, somehow.
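For reference, the setting in question is the `fs.default.name` property in `hadoop-site.xml`, which every node in the cluster must agree on. A sketch (hostname and port are illustrative):

```xml
<!-- hadoop-site.xml: the namenode URI every datanode, task tracker,
     and client reads. Change the host here and the change must reach
     every node, which is the push-out problem described above. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode.example.com:8020/</value>
  </property>
</configuration>
```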



I'm adding some stuff on different ways to deploy hadoop here:

http://wiki.smartfrog.org/wiki/display/sf/Patterns+of+Hadoop+Deployment

-steve
