Hi all! Sorry for duplicating a message from the hadoop list, but I suspect not all of you read that one, and I'd really like your opinion.
I have recently been experimenting with Hadoop and HBase for my company. After some tests we decided to set up a 4-node cluster for our application, with HBase as the persistence layer for our web cache, as a proof of concept, plus some more elaborate usages planned for the future if it works out.

Unfortunately, our IT department would like the whole Hadoop + HBase + ZooKeeper stack to run as a single JVM process, so that they can monitor it easily and, more importantly, cap its maximum memory more accurately. They are afraid that with multiple processes each JVM can drift slightly beyond its limit, as JVMs tend to do, and the node will become less stable (swapping etc.) than it would with one process limited to the sum of all the bounds. We are limited by resources, and the DataNodes will also be running another application, so Hadoop can't use the whole machine, although the hardware is quite generous: 8GB of RAM per node. Is it sensible to combine these usually separate processes, and what could potentially go wrong when all three are started programmatically in one JVM (a rough sketch of what I mean is at the end of this mail)? Are there any obvious contraindications to going this way?

We are planning to set up NameNode failover as described by Cloudera, using the same Linux HA features we currently use for MySQL replication and failover. Unfortunately, again, the NameNode won't have the machine to itself: it will share it with MySQL, though those two nodes are better equipped, with 10GB of RAM each.

Here comes another question: can someone advise me how much memory the NameNode needs, bearing in mind that our data will certainly not exceed ~300GB (again, this is only a proof of concept for now)? My back-of-envelope estimate is below as well.
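For the single-JVM idea, this is roughly what I have in mind. It is an untested sketch: the entry points used below (NameNode.createNameNode, DataNode.createDataNode, LocalHBaseCluster, MiniZooKeeperCluster) exist in the versions I have looked at, but the exact signatures may differ in yours, so please treat it as pseudocode rather than a working program:

import java.io.File;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.LocalHBaseCluster;
import org.apache.hadoop.hbase.zookeeper.MiniZooKeeperCluster;
import org.apache.hadoop.hdfs.server.datanode.DataNode;
import org.apache.hadoop.hdfs.server.namenode.NameNode;

public class SingleJvmNode {
    public static void main(String[] args) throws Exception {
        // ZooKeeper first, in-process; MiniZooKeeperCluster is the helper
        // HBase itself uses when it runs in local mode.
        MiniZooKeeperCluster zk = new MiniZooKeeperCluster();
        int zkPort = zk.startup(new File("/var/tmp/zk"));

        // HDFS daemons in the same heap via their factory methods; note
        // that each still creates its own service threads, so this is
        // cohabitation in one heap rather than a real merge.
        Configuration hdfsConf = new Configuration();
        NameNode nn = NameNode.createNameNode(new String[] {}, hdfsConf);
        DataNode dn = DataNode.createDataNode(new String[] {}, hdfsConf);

        // HBase master + region server as threads, pointed at the
        // in-process ZooKeeper instance started above.
        Configuration hbaseConf = HBaseConfiguration.create();
        hbaseConf.setInt("hbase.zookeeper.property.clientPort", zkPort);
        new LocalHBaseCluster(hbaseConf).startup();
    }
}

One consequence I can already see: this gives IT the single heap they want to cap, but a fatal error or OutOfMemoryError in any one component then takes all of them down together.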
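And for the NameNode sizing question, here is my back-of-envelope so far. It leans on the commonly cited rule of thumb that the NameNode keeps an in-heap object of very roughly 150-200 bytes for every file, directory and block; the one-million-file count is a pure assumption on my part to stress the small-file case of a web cache:

public class NameNodeHeapEstimate {
    public static void main(String[] args) {
        long dataBytes = 300L << 30;   // ~300GB of raw data
        long blockSize = 64L << 20;    // default 64MB HDFS block size
        long files = 1000000L;         // assumed count of small web-cache files
        // Every file occupies at least one block, however small it is.
        long blocks = Math.max(dataBytes / blockSize, files);
        long objects = files + blocks; // directories ignored for simplicity
        long perObject = 200;          // rough upper bound, bytes of heap per object
        double heapGB = objects * perObject / (double) (1L << 30);
        System.out.printf("~%.1f GB of NameNode heap for metadata%n", heapGB);
    }
}

That prints roughly 0.4GB, so if the rule of thumb holds, the metadata itself is tiny at our scale, and the real question is how much headroom to leave for RPC handlers, replication queues and GC on a machine shared with MySQL. Does that match your experience?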
Thanks, Michael.