Kevin Sweeney wrote:
I really appreciate everyone's input. We've been going back and forth on the
server size issue here. There are a few reasons we shot for the $1k price.
One is that we wanted to be able to compare our datacenter costs vs. the
cloud costs. Another is that we have spec'd out a fast Intel node with
over-the-counter parts. We have a hard time justifying the dual-processor
costs and really don't see the need for the big server extras like
out-of-band management and redundancy. This is our proposed config, feel
free to criticize :)
Supermicro 512L-260 Chassis $90
Supermicro X8SIL $160
Heatsink $22
Intel 3460 Xeon $350
Samsung 7200 RPM SATA2 2x$85
2GB Non-ECC DIMM 4x$65
This totals $1052. Doesn't this seem like a reasonable setup? Isn't the
purpose of a Hadoop cluster to build cheap, fast, replaceable nodes?
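As a sanity check on the arithmetic, here is a minimal Python sketch that
totals the parts list above. The dictionary layout and the node_cost name
are only for illustration; the quantities and prices are the ones quoted.

```python
# Parts list from the proposed config: name -> (quantity, unit price in USD).
parts = {
    "Supermicro 512L-260 Chassis": (1, 90),
    "Supermicro X8SIL": (1, 160),
    "Heatsink": (1, 22),
    "Intel 3460 Xeon": (1, 350),
    "Samsung 7200 RPM SATA2": (2, 85),
    "2GB Non-ECC DIMM": (4, 65),
}


def node_cost(parts):
    """Sum quantity * unit price over every line item."""
    return sum(qty * price for qty, price in parts.values())


total = node_cost(parts)
print(f"Per-node total: ${total}")  # prints "Per-node total: $1052"
```

Swapping in a different DIMM or disk price and rerunning makes it easy to
see how far a change moves the node from the $1k target.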
Disclaimer 1: I work for a server vendor so may be biased. I will
attempt to avoid this by not pointing you at HP DL180 or SL170z servers.
Disclaimer 2: I probably don't know what I'm talking about. As far as
Hadoop is concerned, I'm not sure anyone knows what "the right"
configuration is.
* I'd consider ECC RAM. On a large cluster, over time, errors occur; you
either notice them or propagate the effects.
* Worry about power, cooling and rack weight.
* Include network costs and power budget. That means your own switch
costs, plus bandwidth in and out.
* There are some good arguments in favour of fewer, higher-end machines
over many smaller ones. Less network traffic, often a higher density.
Cloud-hosted vs. owned is an interesting question; I suspect the
spreadsheet there is pretty complex.
* Estimate how much data you will want to store over time. On S3, those
costs ramp up fast; in your own rack you can maybe plan to stick in
an extra 2TB HDD a year from now (space, power, cooling and weight
permitting), paying next year's prices for next year's capacity.
* Virtual machine management costs are different from physical
management costs, especially if you don't invest time upfront on
automating your datacentre software provisioning (custom RPMs, PXE
preboot, kickstart, etc). With VMMs you can almost hand-manage an image
(naughty, but possible), as long as you have only an image or two to
push out. Even then, I'd automate, but at a higher level, creating
images on demand as load/availability dictates.
-Steve