Allen,
Kind of off topic, maybe more appropriately cross topic.
So since we are using Hadoop, HDFS, and HBase, should there be three partitions, one for each system? Or can HBase and HDFS share a partition? There will be a lot of HBase data, and also a lot of HDFS data.
-John
On Nov 3, 2009, at 5:49 PM, Allen Wittenauer wrote:
On 11/3/09 2:29 PM, "John Martyniak" <j...@beforedawnsolutions.com> wrote:

Would you mind telling me the kinds of configured servers that you are running?
Our 'real' grid is comprised of shiny Sun 4275s. But our 'non-real' grid is composed of two types of machines with radically different disk configurations (size *and* number!). Keeping the two different types of machines is a bit of a pain. We're going to be replacing that grid in the next month or so with a homogeneous config and giving those machines back to wherever they came from.
Also, have you had any experience running namenodes or ZooKeeper on a VM? I have a couple of much larger boxes that are being used to run VMs, and was thinking of putting both of those on dedicated VM instances, in order to build redundancy/fault tolerance.
I haven't, but I'll admit I've been thinking about it. Especially for the JobTracker, since it seems to like to fall over if you blow on it. [Of course, I also have higher expectations of my software stack, much to the chagrin of the developers around here. :) ]
In the case of Solaris, we'd use a zone, which makes the IO hit negligible. But a full-blown instance of Xen or VMware or whatever is a bit scarier.
I'm concerned about the typically slow IO that one can encounter when VM'ing a service.
Regarding the dual drive, I wasn't thinking of doing that for upgradeability; it was more for spindle separation: one drive would be for Hadoop/HDFS functions and the other for OS operations, so there would be no contention between the drives, just on the bus.
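If the OS and Hadoop data really do get separate spindles, the mount layout might look something like this (a sketch only; the device names and the /hadoop mount point are assumptions, not anything from this thread):

```
# Hypothetical /etc/fstab: OS on one spindle, Hadoop data on the other
/dev/sda1  /        ext3  defaults          0 1
/dev/sdb1  /hadoop  ext3  defaults,noatime  0 2
```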
This is another spot where "know your workload" comes in. Unless you are doing streaming or taxing memory by paging, I suspect your OS disk is going to be bored.
So I take your point about the drives and Hadoop/HDFS being able to handle what was necessary. Since I don't have a pool, I should make two volumes on one physical drive, something like 750 GB and 750 GB, and dedicate one for HDFS and one for MR.
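Mapping the two volumes onto the 0.20-era Hadoop config would look roughly like this sketch (the /vol1 and /vol2 mount points are hypothetical; `dfs.data.dir` lives in hdfs-site.xml and `mapred.local.dir` in mapred-site.xml):

```xml
<!-- hdfs-site.xml: DataNode block storage on the volume dedicated to HDFS -->
<property>
  <name>dfs.data.dir</name>
  <value>/vol1/hdfs/data</value>  <!-- /vol1 is a hypothetical mount point -->
</property>

<!-- mapred-site.xml: MapReduce intermediate/spill data on the other volume -->
<property>
  <name>mapred.local.dir</name>
  <value>/vol2/mapred/local</value>  <!-- /vol2 is hypothetical -->
</property>
```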
Waaaaaaaaaaaaaaay too much for MR. But that's the idea. We're currently toying with 100GB for MR. Which is still -very- high. [But we really don't know our workload that well..... soooo :) ]