I wouldn't spec the worker nodes just to facilitate cloud cost comparison. There's enough variability out there that you'd still have to account for differences in storage, network bandwidth and I/O. Not to mention that a similarly spec'd virtual cloud server will never perform as well as a physical server, because you don't get data locality, unless you use something like Amazon's EBS, and that jacks up your costs. Also, you shouldn't assume that a 'big server' will include out-of-band management or redundancy.

Also take into account performance per watt. Dual-socket machines do better here. Like you, I wouldn't go with high-GHz ('faster') Intel procs, because they are power hungry and generate a lot of heat for the incremental speed bump you get. (After all, you're not building a gaming rig.) However, you can go dual-socket with lower-speed processors. I think the lowest-GHz Nehalems that support hyper-threading are good value. For example, compare the Xeon 3460 @ 2.8GHz ($360) to the 3440 @ 2.53GHz ($240). That's about a 10% speed bump for a 50% price increase, and that's without factoring in the power consumption. Granted, you need to take into account the cost of the entire server, not just the processor.

On Wed, Sep 30, 2009 at 6:46 PM, Kevin Sweeney <ke...@yieldex.com> wrote:

> I really appreciate everyone's input. We've been going back and forth on the
> server size issue here. There are a few reasons we shot for the $1k price,
> one because we wanted to be able to compare our datacenter costs vs. the
> cloud costs. Another is that we have spec'd out a fast Intel node with
> over-the-counter parts. We have a hard time justifying the dual-processor
> costs and really don't see the need for the big server extras like
> out-of-band management and redundancy. This is our proposed config, feel
> free to criticize :)
>
> Supermicro 512L-260 Chassis $90
> Supermicro X8SIL $160
> Heatsink $22
> Intel 3460 Xeon $350
> Samsung 7200 RPM SATA2 2x$85
> 2GB Non-ECC DIMM 4x$65
>
> This totals $1052. Doesn't this seem like a reasonable setup? Isn't the
> purpose of a hadoop cluster to build cheap, fast, replaceable nodes?
>
> On Wed, Sep 30, 2009 at 9:06 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>
> > 2TB drives are just now dropping to parity with 1TB on a $/GB basis.
> >
> > If you want space rather than speed, this is a good option. If you want
> > speed rather than space, more spindles and smaller disks are better.
> >
> > Ironically, 500GB drives now often cost more than 1TB drives (that is $,
> > not $/GB).
> >
> > On Wed, Sep 30, 2009 at 7:33 AM, Patrick Angeles
> > <patrickange...@gmail.com> wrote:
> >
> > > We went with 2 x Nehalems, 4 x 1TB drives and 24GB RAM. The ram might be
> > > overkill... but it's DDR3 so you get either 12 or 24GB. Each box has 16
> > > virtual cores so 12GB might not have been enough. These boxes are around
> > > $4k each, but can easily outperform any $1K box dollar per dollar (and
> > > performance per watt).
> > >
> > > If you're extremely I/O bound, you can get single-socket configurations
> > > with the same amount of drive spindles for really cheap (~$2k for single
> > > proc, 8-12GB RAM, 4x1TB drives).
> > >
> > > On Wed, Sep 30, 2009 at 10:19 AM, stephen mulcahy
> > > <stephen.mulc...@deri.org> wrote:
> > >
> > > > Todd Lipcon wrote:
> > > >
> > > >> Most people building new clusters at this point seem to be leaning towards
> > > >> dual quad core Nehalem with 4x1TB 7200RPM SATA and at least 8G RAM.
> > > >
> > > > We went with a similar configuration for a recently purchased cluster but
> > > > opted for dual quad core Opterons (Shanghai) rather than Nehalems and
> > > > invested the difference in more memory per node (16GB). Nehalem seem to
> > > > perform very well on some benchmarks but that performance comes at a
> > > > premium. I guess it depends on your planned use of the cluster but in a lot
> > > > of cases more memory may be better spent, especially if you plan on running
> > > > things like HBase on the cluster also (which we do).
> > > >
> > > > -stephen
> > > >
> > > > --
> > > > Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
> > > > NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
> > > > http://di2.deri.ie http://webstar.deri.ie http://sindice.com
> > >
> >
> > --
> > Ted Dunning, CTO
> > DeepDyve
>
> --
> Kevin Sweeney
> Systems Engineer
> Yieldex -- www.yieldex.com
> (303) 999-7045
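
For anyone who wants to redo the 3460 vs. 3440 comparison at the top of this message with their own numbers, here is a rough sketch of the arithmetic. The function name `compare` is made up for illustration, and it assumes clock speed is a fair stand-in for performance, which is only approximately true; power draw and the rest of the server cost are left out.

```python
# Hypothetical helper (not from any library): compares two CPU options
# using clock speed as a crude proxy for performance.
def compare(base_ghz, base_usd, fast_ghz, fast_usd):
    """Return (speed_gain, price_gain) as fractions relative to the base part."""
    speed_gain = fast_ghz / base_ghz - 1.0
    price_gain = fast_usd / base_usd - 1.0
    return speed_gain, price_gain

# Figures quoted in this message:
# Xeon 3440 @ 2.53GHz ($240) vs. Xeon 3460 @ 2.8GHz ($360).
speed_gain, price_gain = compare(2.53, 240.0, 2.8, 360.0)
print(f"clock speed: +{speed_gain:.1%}, price: +{price_gain:.1%}")
# prints "clock speed: +10.7%, price: +50.0%"
```

Swapping in whole-server prices instead of processor prices should shrink the price gap, which is the caveat about total server cost.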