Allen,
Kind of off topic, maybe more appropriately cross topic.
So since we are using Hadoop, HDFS, and HBase, should there be three partitions, one for each system? Or can HBase and HDFS share a partition? There will be a lot of HBase data, and also a lot of HDFS data.
-John
On Nov 3, 2009, at 5:49 PM, Allen Wittenauer wrote:
On 11/3/09 2:29 PM, "John Martyniak" <j...@beforedawnsolutions.com> wrote:

Would you mind telling me the kinds of configured servers that you are running?
Our 'real' grid is comprised of shiny Sun 4275s. But our 'non-real' grid is composed of two types of machines with radically different disk configurations (size *and* number!). Keeping the two different types of machines is a bit of a pain. We're going to be replacing that grid in the next month or so with a homogeneous config and giving those machines back to wherever they came from.
Also, have you had any experience running namenodes or ZooKeeper on a VM? I have a couple of much larger boxes that are being used to run VMs, and was thinking of putting both of those on dedicated VM instances, in order to build redundancy/fault tolerance.
I haven't, but I'll admit I've been thinking about it. Especially for the JobTracker, since it seems to like to fall over if you blow on it. [Of course, I also have higher expectations of my software stack, much to the chagrin of the developers around here. :) ]
In the case of Solaris, we'd use a zone, which makes the IO hit negligible. But a full-blown instance of Xen or VMware or whatever is a bit scarier.
I'm concerned about the typically slow IO that one can encounter when VM'ing a service.
Regarding the dual drive, I wasn't thinking of doing that for upgradeability; it was more for spindle separation: one drive would be for Hadoop/HDFS functions and the other for OS operations, so there would be no contention between the drives, just on the bus.
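If the OS and Hadoop data really do get separate spindles, the mount layout might look something like this (a sketch only; the device names and the /hadoop mount point are assumptions, not anything from this thread):

```
# Hypothetical /etc/fstab: OS on one spindle, Hadoop data on the other
/dev/sda1  /        ext3  defaults          0 1
/dev/sdb1  /hadoop  ext3  defaults,noatime  0 2
```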
This is another spot where "know your workload" comes in. Unless you are doing streaming or taxing memory by paging, I suspect your OS disk is going to be bored.
So I take your point about the drives and Hadoop/HDFS being able to handle what was necessary. Since I don't have a pool, I should make two volumes on one physical drive, something like 750 GB and 750 GB, and dedicate one for HDFS and one for MR.
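Mapping the two volumes onto the 0.20-era Hadoop config would look roughly like this sketch (the /vol1 and /vol2 mount points are hypothetical; `dfs.data.dir` lives in hdfs-site.xml and `mapred.local.dir` in mapred-site.xml):

```xml
<!-- hdfs-site.xml: DataNode block storage on the volume dedicated to HDFS -->
<property>
  <name>dfs.data.dir</name>
  <value>/vol1/hdfs/data</value>  <!-- /vol1 is a hypothetical mount point -->
</property>

<!-- mapred-site.xml: MapReduce intermediate/spill data on the other volume -->
<property>
  <name>mapred.local.dir</name>
  <value>/vol2/mapred/local</value>  <!-- /vol2 is hypothetical -->
</property>
```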
Waaaaaaaaaaaaaaay too much for MR. But that's the idea. We're currently toying with 100GB for MR. Which is still -very- high. [But we really don't know our workload that well..... soooo :) ]