Kevin Sweeney wrote:
I really appreciate everyone's input. We've been going back and forth on the
server size issue here. There are a few reasons we shot for the $1k price.
One is that we wanted to be able to compare our datacenter costs with the
cloud costs. Another is that we have spec'd out a fast Intel node with
off-the-shelf parts. We have a hard time justifying the dual-processor
costs and really don't see the need for big-server extras like
out-of-band management and redundancy. This is our proposed config; feel
free to criticize :)
Supermicro 512L-260 Chassis    $90
Supermicro X8SIL               $160
Heatsink                       $22
Intel 3460 Xeon                $350
Samsung 7200 RPM SATA2         2 x $85
2GB Non-ECC DIMM               4 x $65

This totals $1052. Doesn't this seem like a reasonable setup? Isn't the
purpose of a Hadoop cluster to build cheap, fast, replaceable nodes?
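To check the arithmetic, or to re-price the node as quotes change, the parts list can be kept as data and summed. This is a minimal Python sketch: the part names, prices and quantities are the ones quoted above, while the function names and the 20-node cluster size in the usage example are hypothetical.

```python
# Parts list from the proposed config: (part, unit price in USD, quantity).
PARTS = [
    ("Supermicro 512L-260 Chassis", 90, 1),
    ("Supermicro X8SIL", 160, 1),
    ("Heatsink", 22, 1),
    ("Intel 3460 Xeon", 350, 1),
    ("Samsung 7200 RPM SATA2", 85, 2),
    ("2GB Non-ECC DIMM", 65, 4),
]


def node_cost(parts):
    """Total hardware cost of one node in USD (unit price times quantity)."""
    return sum(price * qty for _, price, qty in parts)


def cluster_cost(parts, nodes):
    """Hardware cost of `nodes` identical nodes; ignores volume discounts."""
    return node_cost(parts) * nodes


if __name__ == "__main__":
    print(node_cost(PARTS))         # 1052
    print(cluster_cost(PARTS, 20))  # 21040 (20 nodes is a hypothetical size)
```

Keeping the list as data makes it easy to swap in ECC DIMMs or a bigger disk and see the new per-node and cluster totals.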

Disclaimer 1: I work for a server vendor so may be biased. I will attempt to avoid this by not pointing you at HP DL180 or SL170z servers.

Disclaimer 2: I probably don't know what I'm talking about. As far as Hadoop is concerned, I'm not sure anyone knows what "the right" configuration is.

* I'd consider ECC RAM. On a large cluster, over time, memory errors will occur; you either notice them or propagate their effects.

* Worry about power, cooling and rack weight.

* Include network costs and the power budget: that's your own switch costs, plus bandwidth in and out.

* There are some good arguments in favour of fewer, higher-end machines over many smaller ones: less network traffic, and often higher density.
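Power and network are easy to leave out of a per-node price. A rough per-node yearly cost model can be sketched in Python under explicit assumptions: hardware and switches are amortised straight-line over a fixed number of years, cooling is a simple multiplier on the node's own power draw, switch ports are counted without uplinks, and external bandwidth is a flat yearly bill shared evenly across nodes. Only the $1052 node price comes from the thread; the wattage, electricity price, switch price and bandwidth bill are hypothetical placeholders. Rack weight and space are not modelled.

```python
import math

HOURS_PER_YEAR = 24 * 365


def annual_power_cost(watts, usd_per_kwh, cooling_overhead):
    """Yearly electricity bill for one node.

    cooling_overhead multiplies the node's own draw, e.g. 1.5 means cooling
    and distribution losses add 50% on top of what the server itself uses.
    """
    kwh = watts / 1000 * HOURS_PER_YEAR
    return kwh * usd_per_kwh * cooling_overhead


def per_node_yearly_cost(hw_cost, years, nodes, watts, usd_per_kwh,
                         cooling_overhead, switch_cost, ports_per_switch,
                         bandwidth_per_year):
    """Yearly cost of one node: amortised hardware plus its share of the
    switches, its power bill, and its share of external bandwidth."""
    # Simplification: one port per node, no uplink or spare ports.
    switches = math.ceil(nodes / ports_per_switch)
    network_share = switches * switch_cost / nodes
    capex = (hw_cost + network_share) / years
    power = annual_power_cost(watts, usd_per_kwh, cooling_overhead)
    return capex + power + bandwidth_per_year / nodes


if __name__ == "__main__":
    # Everything except hw_cost=1052 is a hypothetical placeholder.
    cost = per_node_yearly_cost(
        hw_cost=1052, years=3, nodes=20, watts=150, usd_per_kwh=0.10,
        cooling_overhead=1.5, switch_cost=2000, ports_per_switch=48,
        bandwidth_per_year=6000,
    )
    print(f"${cost:.2f} per node per year")
```

With these made-up inputs the running costs (power and bandwidth) come to more than the amortised hardware, which is the point of putting them in the budget.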

The cloud-hosted vs. owned question is an interesting one; I suspect the spreadsheet there is pretty complex.

* Estimate how much data you will want to store over time. On S3, those costs ramp up fast; in your own rack you can maybe plan to stick in an extra 2TB HDD a year from now (space, power, cooling and weight permitting), paying next year's prices for next year's capacity.

* Virtual machine management costs are different from physical management costs, especially if you don't invest time up front in automating your datacentre software provisioning (custom RPMs, PXE preboot, kickstart, etc.). With VMMs you can almost hand-manage an image (naughty, but possible), as long as you only have one or two images to push out. Even then, I'd automate, but at a higher level, creating images on demand as load and availability dictate.
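The storage point can be made concrete with a storage-only model, written as a minimal Python sketch under stated assumptions: data grows by a fixed amount each month; the cloud bills a flat, hypothetical rate per TB-month on everything stored; the owned cluster buys 2TB disks just in time, at a price that falls by a fixed fraction each year, with HDFS's default replication factor of 3. All prices and growth rates are made-up placeholders. Servers, power, cooling and admin time are deliberately left out; they belong in the full spreadsheet.

```python
import math


def cloud_storage_cost(months, tb_per_month, usd_per_tb_month):
    """Cumulative bill when data grows linearly and every stored TB is
    charged each month (usd_per_tb_month is a hypothetical flat rate)."""
    total = 0.0
    stored = 0.0
    for _ in range(months):
        stored += tb_per_month
        total += stored * usd_per_tb_month
    return total


def owned_disk_cost(months, tb_per_month, replication, disk_tb,
                    first_year_disk_price, yearly_price_drop):
    """Cumulative spend on disks bought just in time, each at the price
    of the year it is bought (disks only, no servers or power)."""
    total = 0.0
    disks = 0
    for month in range(1, months + 1):
        raw_tb = month * tb_per_month * replication
        needed = math.ceil(raw_tb / disk_tb)
        year = (month - 1) // 12
        price = first_year_disk_price * (1 - yearly_price_drop) ** year
        total += (needed - disks) * price
        disks = needed
    return total


if __name__ == "__main__":
    # Hypothetical inputs: 0.5 TB of new data per month for two years.
    cloud = cloud_storage_cost(months=24, tb_per_month=0.5, usd_per_tb_month=150)
    owned = owned_disk_cost(months=24, tb_per_month=0.5, replication=3,
                            disk_tb=2, first_year_disk_price=150,
                            yearly_price_drop=0.25)
    print(f"cloud: ${cloud:,.2f}  owned disks: ${owned:,.2f}")
```

With linear data growth the cloud bill grows roughly quadratically with time, because every month pays again for everything already stored; the owned disks are paid for once, at a price that keeps falling.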

-Steve

