The situation Patrick describes below is not uncommon.

The "commodity hardware" talk around MR and BigTable is a bit of a joke -- it
works if you can afford thousands or tens of thousands of custom-assembled
commodity components. Hadoop+HBase users want to do more with less, obviously.
Colocating computation with storage has its price -- either you scale
horizontally across many nodes, or you go vertical enough on each node to
handle the load you are throwing at the cluster you can afford.

Sizing clusters is a black art. 

As for the spec of each individual node, I can share our current generation 
hardware spec:

  CPU: dual 6-core AMD (12 cores total)
  RAM: 32 GB
  DISK: 320 GB x 2 (RAID-1) system disk
        500 GB x 8 (JBOD) data disks for HDFS
  custom 1U chassis

  We give 8 GB of RAM to the HBase region servers. All other Hadoop and HBase 
daemons (DataNode, ZooKeeper, TaskTracker, etc.) use the default of 1 GB. 
Remainder of CPU and RAM is for user tasks (MR).
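
  As a rough illustration of that budget, here is a minimal sketch in Python.
The 32 GB and 8 GB figures come from the spec above. Counting exactly three
1 GB daemons just takes the ones named above (the "etc." may cover more), and
the OS reserve is an assumed placeholder, not a figure from our setup.

```python
# Rough per-node RAM budget for the spec above (all values in GB).
TOTAL_RAM_GB = 32
REGION_SERVER_HEAP_GB = 8

# Assumption: only the three daemons named above, each at the 1 GB default.
OTHER_DAEMONS_GB = {"DataNode": 1, "ZooKeeper": 1, "TaskTracker": 1}

# Assumption: hypothetical reserve for the OS and page cache.
OS_RESERVE_GB = 2


def ram_left_for_tasks(total=TOTAL_RAM_GB,
                       region_server=REGION_SERVER_HEAP_GB,
                       daemons=OTHER_DAEMONS_GB,
                       os_reserve=OS_RESERVE_GB):
    """Return the RAM (GB) left for MR user tasks after daemons and OS reserve."""
    return total - region_server - sum(daemons.values()) - os_reserve


if __name__ == "__main__":
    print(f"RAM left for MR tasks: {ram_left_for_tasks()} GB")
```

  Dividing the result by the per-task heap you configure gives a ceiling on
the number of concurrent task slots per node.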

  Reads are best served from RAM via the block cache.

  The more spindles, the higher I/O parallelism, therefore higher aggregate 
throughput.
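
  To put numbers on that, a back-of-the-envelope sketch in Python. The
8 x 500 GB layout comes from the spec above; the per-disk throughput is a
made-up placeholder to be replaced with a measured value, and the result is an
ideal upper bound that ignores controller, network, and CPU limits.

```python
# Per-node data disk layout from the spec above.
DATA_DISKS = 8
DISK_SIZE_GB = 500

# Assumption: hypothetical sequential throughput per spindle; measure your own.
ASSUMED_MB_PER_SEC_PER_DISK = 80


def raw_capacity_gb(disks=DATA_DISKS, size_gb=DISK_SIZE_GB):
    """Raw HDFS data capacity on one node, before replication overhead."""
    return disks * size_gb


def aggregate_throughput_mb_s(disks=DATA_DISKS,
                              per_disk=ASSUMED_MB_PER_SEC_PER_DISK):
    """Ideal aggregate throughput if every spindle streams in parallel."""
    return disks * per_disk


if __name__ == "__main__":
    print(f"raw capacity: {raw_capacity_gb()} GB")
    print(f"ideal aggregate throughput: {aggregate_throughput_mb_s()} MB/s")
```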

  The above is a good trade-off between horizontal and vertical scaling for us.

Hope that helps.

> From: Patrick Hunt
> Subject: Re: About test/production server configuration
> The ZK servers are sensitive to disk (io) latency. I just troubleshot an
> issue last week where a user was seeing 80-second (second!) latencies.
> Turns out they were running zk server, namenode, tasktracker, and hbase
> region server all on the same box, that box had a single spindle for all
> io activity and was at 100% utilization for long periods of time. If you
> want decent ZK API latencies (<100ms) you really need to ensure that
> there's at least a separate spindle available for the ZK transaction logs.



