Allen,

Kind of off topic, maybe more appropriately cross topic.

So since we are using Hadoop, HDFS, and HBase, should there be three partitions, one for each system? Or can HBase and HDFS share a partition? There will be a lot of HBase data and also a lot of HDFS data.
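For what it's worth, my understanding is that HBase normally stores its data *in* HDFS anyway, via hbase.rootdir, so the two would end up sharing whatever partition holds the DataNode directories. A sketch of what I mean (the hostname, port, and path here are just placeholders):

```xml
<!-- hbase-site.xml: point HBase at the HDFS instance it should store its data in -->
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://namenode.example.com:9000/hbase</value>
</property>
```

So maybe the question really reduces to how to lay out the local partitions for HDFS itself.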

-John

On Nov 3, 2009, at 5:49 PM, Allen Wittenauer wrote:

> On 11/3/09 2:29 PM, "John Martyniak" <j...@beforedawnsolutions.com> wrote:
>> Would you mind telling me the kinds of server configurations that you
>> are running?

> Our 'real' grid is composed of shiny Sun 4275s. But our 'non-real' grid is
> composed of two types of machines with radically different disk
> configurations (size *and* number!). Maintaining the two different types
> of machines is a bit of a pain. We're going to be replacing that grid in
> the next month or so with a homogeneous config and giving those machines
> back to wherever they came from.

>> Also, have you had any experience running namenodes or zookeeper on a
>> VM? I have a couple of much larger boxes that are being used to run
>> VMs, and was thinking of putting both of those on dedicated VM
>> instances in order to build redundancy/fault tolerance.

> I haven't, but I'll admit I've been thinking about it. Especially for the
> JobTracker, since it seems to like to fall over if you blow on it. [Of
> course, I also have higher expectations of my software stack, much to the
> chagrin of the developers around here. :) ]
>
> In the case of Solaris, we'd use a zone, which makes the IO hit
> negligible. But a full-blown instance of Xen or VMware or whatever is a
> bit scarier. I'm concerned about the typically slow IO that one can
> encounter when VM'ing a service.

>> Regarding the dual drives, I wasn't thinking of doing that for
>> upgradeability; it was more for spindle separation: one drive would be
>> for Hadoop/HDFS etc. functions and the other would be for OS
>> operations, so there would be no contention between the drives, just
>> on the bus.

> This is another spot where "know your workload" comes in. Unless you are
> doing streaming or taxing memory by paging, I suspect your OS disk is
> going to be bored.

>> So I take your point about the drives and Hadoop/HDFS being able to
>> handle what was necessary. Since I don't have a pool, I should make
>> two volumes on one physical drive, something like 750 GB and 750 GB,
>> and dedicate one for HDFS and one for MR.

> Waaaaaaaaaaaaaaay too much for MR. But that's the idea. We're currently
> toying with 100GB for MR. Which is still -very- high. [But we really
> don't know our workload that well..... soooo :) ]
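Concretely, that split is just a matter of where the local directories point in the configs. A sketch, assuming the two volumes are mounted at /data1 and /data2 (those mount points are made up):

```xml
<!-- hdfs-site.xml: DataNode block storage on the first volume -->
<property>
  <name>dfs.data.dir</name>
  <value>/data1/hdfs/data</value>
</property>

<!-- mapred-site.xml: MapReduce intermediate/spill space on the second volume -->
<property>
  <name>mapred.local.dir</name>
  <value>/data2/mapred/local</value>
</property>
```

Both properties take comma-separated lists, so if you ever add more spindles you can just append another directory per disk rather than repartitioning.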

