Hi Alex,

I am also doing a little work on HBase. I believe I have heard that a few higher-memory machines with more spindles and cores per machine beat a larger number of smaller machines with similar total capacity (I *guess* this is due to memory buffers and data locality). Please don't take my word for it, though; I recommend posting the same question to the HBase list - http://hadoop.apache.org/hbase/mailing_lists.html#Users.
Cheers,
Tim

On Thu, Oct 15, 2009 at 7:54 PM, Allen Wittenauer <awittena...@linkedin.com> wrote:
> On 10/15/09 9:42 AM, "Steve Loughran" <ste...@apache.org> wrote:
>> It's an interesting Q as to what is better: fewer nodes with more
>> storage/CPU, or more, smaller nodes.
>>
>> Bigger servers
>> * more chance of running code near the data
>> * less data moved over the LAN at shuffle time
>> * RAM consumption can be more agile across tasks
>> * increased chance of disk failure on a node; hadoop handles that very
>>   badly right now (pre 0.20 - the datanode goes offline)
>>
>> Smaller servers
>> * easier to place data redundantly across machines
>> * less RAM taken up by other people's jobs
>> * more nodes stay up when a disk fails (less important on 0.20 onwards)
>> * when a node goes down, less data to re-replicate across the other
>>   machines
>>
>> 1. I would like to hear other people's opinions,
>
> - Don't forget about the more obvious things: if you go with more disks
>   per server, that also likely means fewer controllers doing IO.
>
> - Keep in mind that fewer CPUs/less RAM = fewer task slots available. While
>   your workflow may not be CPU-bound in the traditional sense, if you are
>   spawning 5000 maps, you're going to need quite a few slots to get your
>   work done in a reasonable time.
>
> - To counter that, it seems we can run more tasks-per-node in LI's 2U config
>   than Y!'s 1U config. But this might be an apples/oranges comparison (LI
>   uses Solaris+ZFS, Y! uses Linux+ext3).
>
>> 2. The gridmix 2 benchmarking stuff tries to create synthetic benchmarks
>> from your real data runs. Try that, collect some data, then go to your
>> suppliers.
>
> +1
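
For what it's worth, here is a quick back-of-the-envelope sketch of Allen's slot point. The node count and slots-per-node below are made-up figures for illustration, not numbers from this thread:

// Back-of-the-envelope slot math; all figures below are hypothetical,
// not numbers from this thread.
public class SlotMath {
    public static void main(String[] args) {
        int nodes = 50;             // assumed cluster size
        int mapSlotsPerNode = 8;    // e.g. mapred.tasktracker.map.tasks.maximum
        int mapTasks = 5000;        // the job size Allen mentions

        int totalMapSlots = nodes * mapSlotsPerNode;
        // "Waves" = how many rounds of map tasks the cluster
        // must run before all 5000 maps have completed.
        int waves = (mapTasks + totalMapSlots - 1) / totalMapSlots;

        System.out.printf("%d map slots -> %d waves for %d maps%n",
                          totalMapSlots, waves, mapTasks);
    }
}

Halve the node count at the same slots-per-node and you double the number of waves, so fewer/bigger boxes only keep pace if you can raise the per-node slot count to match.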