On 9/11/08 2:39 AM, "Alex Loddengaard" <[EMAIL PROTECTED]> wrote:

> I've never dealt with a large cluster, though I'd imagine it is managed the
> same way as small clusters:
Maybe. :)

> -Use hostnames or ips, whichever is more convenient for you

Use hostnames.  Seriously.  Who are you people using raw IPs for things? :)
Besides, you're going to need them for the eventual support of Kerberos.

> -All the slaves need to go into the slave file

We only put this file on the namenode and secondary namenode to prevent
accidents.

> -You can update software by using bin/hadoop-daemons.sh.  Something like:
> #bin/hadoop-daemons.sh "rsync (mastersrcpath) (localdestpath)"

We don't use that because it doesn't take into consideration down nodes (and
you *will* have down nodes!) or deal with nodes that are outside the grid
(such as our gateways/bastion hosts, data loading machines, etc).  Instead,
use a real system configuration management package such as bcfg2, smartfrog,
puppet, cfengine, etc.  [Steve, you owe me for the plug. :) ]

> I created a wiki page that currently contains one tip for managing large
> clusters.  Could others add to this wiki page?
>
> <http://wiki.apache.org/hadoop/LargeClusterTips>

Quite a bit of what we do is covered in the latter half of
http://tinyurl.com/5foamm .  This is a presentation I did at ApacheCon EU
this past April that included some of the behind-the-scenes of the large
clusters at Y!.  At some point I'll probably do an updated version that
includes more adminy things (such as why we push four different types of
Hadoop configurations per grid) while others talk about core Hadoop stuff.
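To make the down-nodes point concrete, here is a minimal sketch of the kind
of loop a push script needs: walk the slaves file, probe each host, push
only to the ones that answer, and record the rest for follow-up.  The
hostnames, paths, and the is_up stub are made-up illustrations (on a real
grid you'd probe with something like `ssh -o ConnectTimeout=5 "$host" true`
and run the rsync for real); a config management tool does all this and
more for you.

```shell
#!/bin/sh
# Sketch: push a build to live slaves, skipping down nodes instead of
# hanging on them.  All hostnames and paths are illustrative.

# Stand-in slaves file (normally conf/slaves on the namenode).
slaves=$(mktemp)
cat > "$slaves" <<'EOF'
node001.grid.example.com
node002.grid.example.com
node003.grid.example.com
EOF

# Fake reachability check for illustration: pretend node002 is down.
# Replace with a short-timeout ssh probe on a real cluster.
is_up() { [ "$1" != "node002.grid.example.com" ]; }

pushed=0; skipped=0
while read -r host; do
  [ -z "$host" ] && continue
  if is_up "$host"; then
    # On a real grid this would be the actual rsync, not an echo.
    echo "+ rsync -a master:/opt/hadoop/ $host:/opt/hadoop/"
    pushed=$((pushed + 1))
  else
    echo "SKIP $host (down; re-push when it returns)" >&2
    skipped=$((skipped + 1))
  fi
done < "$slaves"

rm -f "$slaves"
```

The point of the explicit skip list is operational: a blind
`hadoop-daemons.sh "rsync ..."` leaves a down node silently stale, and it
will rejoin the grid running old software unless someone re-pushes to it.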