James Moore wrote:
On Thu, Sep 11, 2008 at 5:46 AM, Allen Wittenauer <[EMAIL PROTECTED]> wrote:
On 9/11/08 2:39 AM, "Alex Loddengaard" <[EMAIL PROTECTED]> wrote:
I've never dealt with a large cluster, though I'd imagine it is managed the
same way as small clusters:
Maybe. :)
Depends how often you like to be paged, doesn't it :)
Instead, use a real system configuration management package such as
bcfg2, smartfrog, puppet, cfengine, etc. [Steve, you owe me for the plug.
:) ]
Yes Allen, I owe you a beer at the next ApacheCon we are both at.
Actually, I think Y! was one of the sponsors at the UK event, so we owe
you for that too.
Or on EC2 and its competitors, just build a new image whenever you
need to update Hadoop itself.
1. It's still good to have as much automation of your image build as you
can; if you can build new machine images on demand you can have
fun/make a mess of things. Look at http://instalinux.com to see the web
GUI for creating Linux images on demand that is used inside HP.
2. When you try to bring up everything from scratch, you have a
choreography problem. DNS needs to be up early, then your authentication
system, the management tools, then the other parts of the system. If you
have a project where Hadoop is integrated with the front end site, for
example, your app servers have to stay offline until HDFS is live. So
it does get complex.
3. The Hadoop nodes are good here in that you aren't required to bring
up the namenode first; the datanodes will wait; same for the task
trackers and job tracker. But if you, say, need to point everything at a
new hostname for the namenode, well, that's a config change that needs
to be pushed out, somehow.
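
To make the choreography in point 2 concrete, here is a rough sketch in
Python that derives a startup order from a dependency map. The service
names and the exact dependencies are made up for illustration; a real
site will have its own. In line with point 3, the datanodes carry no
ordering dependency on the namenode, since they will wait for it. It
assumes Python 3.9+ for the standard-library graphlib module.

```python
from graphlib import TopologicalSorter

# Hypothetical services; each maps to the set of services that must be
# up before it is started. These names are illustrative assumptions.
DEPENDS_ON = {
    "dns": set(),
    "auth": {"dns"},
    "management": {"dns", "auth"},
    # Datanodes do not list the namenode: they wait for it on their own.
    "namenode": {"management"},
    "datanodes": {"management"},
    # App servers stay offline until HDFS (namenode + datanodes) is live.
    "app_servers": {"namenode", "datanodes"},
}


def startup_order(depends_on):
    """Return one valid startup order.

    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    for service in startup_order(DEPENDS_ON):
        print("start", service)
```

This only computes an order; it does not check that each service is
actually healthy before the next one starts, which is the harder part
in practice.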
I'm adding some notes on different ways to deploy Hadoop here:
http://wiki.smartfrog.org/wiki/display/sf/Patterns+of+Hadoop+Deployment
-steve