Hi all! Sorry for duplicating a message from the hadoop list, but I suspect not all of you read that one, and I'd really like your opinion.
I have recently been experimenting with Hadoop and HBase for my company. After some tests we decided to set up a 4-node cluster for our application, with HBase as the persistence layer for our web cache, as a proof of concept, plus some more elaborate usages planned for the future if it works out.

Unfortunately, our IT department would like the whole Hadoop + HBase + ZooKeeper stack to run as a single JVM process, so that they can monitor it easily and, more importantly, cap its maximum memory more accurately. They are afraid that with multiple processes each JVM can drift slightly beyond its limit, as JVMs tend to do, and the node will become less stable (swapping etc.) than it would with one process limited to the sum of all the bounds. We are limited by resources, and the DataNodes will also be running another application, so Hadoop can't use the whole machine, although the hardware is quite generous: 8GB of RAM per node. Is it sensible to combine these usually separate processes, and what could potentially go wrong when all three are started programmatically in one JVM (a rough sketch of what I mean is at the end of this mail)? Are there any obvious contraindications to going this way?

We are planning to set up NameNode failover as described by Cloudera, using the same Linux HA features we currently use for MySQL replication and failover. Unfortunately, again, the NameNode won't have the machine to itself: it will share it with MySQL, though those two nodes are better equipped, with 10GB of RAM each.

Here comes another question: can someone advise me how much memory the NameNode needs, bearing in mind that our data will certainly not exceed ~300GB (again, this is only a proof of concept for now)? My back-of-envelope estimate is below as well.
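For the single-JVM idea, this is roughly what I have in mind. It is an untested sketch: the entry points used below (NameNode.createNameNode, DataNode.createDataNode, LocalHBaseCluster, MiniZooKeeperCluster) exist in the versions I have looked at, but the exact signatures may differ in yours, so please treat it as pseudocode rather than a working program:

import java.io.File;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.LocalHBaseCluster;
import org.apache.hadoop.hbase.zookeeper.MiniZooKeeperCluster;
import org.apache.hadoop.hdfs.server.datanode.DataNode;
import org.apache.hadoop.hdfs.server.namenode.NameNode;

public class SingleJvmNode {
    public static void main(String[] args) throws Exception {
        // ZooKeeper first, in-process; MiniZooKeeperCluster is the helper
        // HBase itself uses when it runs in local mode.
        MiniZooKeeperCluster zk = new MiniZooKeeperCluster();
        int zkPort = zk.startup(new File("/var/tmp/zk"));

        // HDFS daemons in the same heap via their factory methods; note
        // that each still creates its own service threads, so this is
        // cohabitation in one heap rather than a real merge.
        Configuration hdfsConf = new Configuration();
        NameNode nn = NameNode.createNameNode(new String[] {}, hdfsConf);
        DataNode dn = DataNode.createDataNode(new String[] {}, hdfsConf);

        // HBase master + region server as threads, pointed at the
        // in-process ZooKeeper instance started above.
        Configuration hbaseConf = HBaseConfiguration.create();
        hbaseConf.setInt("hbase.zookeeper.property.clientPort", zkPort);
        new LocalHBaseCluster(hbaseConf).startup();
    }
}

One consequence I can already see: this gives IT the single heap they want to cap, but a fatal error or OutOfMemoryError in any one component then takes all of them down together.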
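And for the NameNode sizing question, here is my back-of-envelope so far. It leans on the commonly cited rule of thumb that the NameNode keeps an in-heap object of very roughly 150-200 bytes for every file, directory and block; the one-million-file count is a pure assumption on my part to stress the small-file case of a web cache:

public class NameNodeHeapEstimate {
    public static void main(String[] args) {
        long dataBytes = 300L << 30;   // ~300GB of raw data
        long blockSize = 64L << 20;    // default 64MB HDFS block size
        long files = 1000000L;         // assumed count of small web-cache files
        // Every file occupies at least one block, however small it is.
        long blocks = Math.max(dataBytes / blockSize, files);
        long objects = files + blocks; // directories ignored for simplicity
        long perObject = 200;          // rough upper bound, bytes of heap per object
        double heapGB = objects * perObject / (double) (1L << 30);
        System.out.printf("~%.1f GB of NameNode heap for metadata%n", heapGB);
    }
}

That prints roughly 0.4GB, so if the rule of thumb holds, the metadata itself is tiny at our scale, and the real question is how much headroom to leave for RPC handlers, replication queues and GC on a machine shared with MySQL. Does that match your experience?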
Thanks, Michael.