Re: How to manage a large cluster?

Steve Loughran Fri, 12 Sep 2008 14:51:41 -0700

James Moore wrote:

On Thu, Sep 11, 2008 at 5:46 AM, Allen Wittenauer <[EMAIL PROTECTED]> wrote:

On 9/11/08 2:39 AM, "Alex Loddengaard" <[EMAIL PROTECTED]> wrote:

I've never dealt with a large cluster, though I'd imagine it is managed the
same way as small clusters:

   Maybe. :)


Depends how often you like to be paged, doesn't it :)

   Instead, use a real system configuration management package such as
bcfg2, smartfrog, puppet, cfengine, etc.  [Steve, you owe me for the plug.
:) ]


Yes Allen, I owe you beer at the next apachecon we are both at.

Actually, I think Y! were one of the sponsors at the UK event, so we owe you for that too.

Or on EC2 and its competitors, just build a new image whenever you
need to update Hadoop itself.

1. It's still good to have as much automation of your image build as you can; if you can build new machine images on demand, you can have fun/make a mess of things. Look at http://instalinux.com to see the web GUI for creating Linux images on demand that is used inside HP.

2. When you try to bring up everything from scratch, you have a choreography problem. DNS needs to be up early, and your authentication system, the management tools, then the other parts of the system. If you have a project where Hadoop is integrated with the front end site, for example, your app servers have to stay offline until HDFS is live. So it does get complex.

3. The Hadoop nodes are good here in that you aren't required to bring up the namenode first; the datanodes will wait; same for the tasktrackers and job tracker. But if you, say, need to point everything at a new hostname for the namenode, well, that's a config change that needs to be pushed out, somehow.
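
The ordering constraint in point 2 can be written down as a dependency graph and sorted. Here is a minimal sketch in Python, not anything a particular management tool does. The service names and the exact dependency edges are assumptions made up for illustration; only the rough order (DNS early, then authentication, then the management tools, with the app servers held back until HDFS is live) comes from the text above. It uses the standard-library graphlib module, which needs Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Hypothetical services and what each one needs to be running first.
# These names and edges are illustrative assumptions, not a real deployment.
DEPENDS_ON = {
    "dns": set(),
    "auth": {"dns"},
    "management": {"dns", "auth"},
    "hdfs": {"dns", "auth", "management"},
    "app_servers": {"hdfs"},  # front end stays offline until HDFS is live
}


def startup_order(depends_on):
    """Return the services in an order that respects every dependency.

    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    for name in startup_order(DEPENDS_ON):
        print("start", name)
```

This only computes an order. A real bring-up would also have to wait for each service to report healthy before starting the next, which is where most of the complexity lives.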




I'm adding some stuff on different ways to deploy hadoop here:

http://wiki.smartfrog.org/wiki/display/sf/Patterns+of+Hadoop+Deployment

-steve
